Open · ZachWong123 opened this issue 11 months ago
Here is the code for accuracy: https://github.com/OpenGVLab/UniFormerV2/blob/722a43440fc5b9662cc2a8f23b86caa205e45ebc/slowfast/utils/metrics.py#L9-L64
The code below was modified by GPT; please try it:
```python
def mean_recall(preds, labels, num_classes):
    """
    Calculate the mean recall given predictions and labels.
    Args:
        preds (Tensor): Predictions from the model. Dimension is N x ClassNum.
        labels (Tensor): True labels. Dimension is N.
        num_classes (int): Number of classes.
    Returns:
        mean_recall (float): The mean recall over all classes.
    """
    assert preds.size(0) == labels.size(0), "Batch dim of predictions and labels must match"
    # Convert predictions to class indices.
    _, predicted_classes = preds.max(dim=1)
    # Initialize true-positive and false-negative counters.
    TP = [0] * num_classes
    FN = [0] * num_classes
    # Count TP and FN for each class.
    for i in range(num_classes):
        TP[i] = ((predicted_classes == i) & (labels == i)).sum().item()
        FN[i] = ((predicted_classes != i) & (labels == i)).sum().item()
    # Per-class recall: TP / (TP + FN); classes absent from the labels count as 0.
    recalls = [TP[i] / (TP[i] + FN[i]) if (TP[i] + FN[i]) > 0 else 0 for i in range(num_classes)]
    # Mean recall over all classes.
    mean_recall = sum(recalls) / num_classes
    return mean_recall
```
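For what it's worth, a minimal usage sketch with random tensors (the sizes here are made up for illustration; in practice `preds` and `labels` come from the test loop):

```python
import torch

# Toy example: 8 clips, 5 classes (hypothetical sizes).
preds = torch.randn(8, 5)            # N x ClassNum scores
labels = torch.randint(0, 5, (8,))   # N ground-truth class indices
print(mean_recall(preds, labels, num_classes=5))
```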
Thank you for your response! I will try it later.
"Hello, may I ask if you could provide the link to BaiduNetdisk again? The dataset I obtained does not match the .csv annotations you provided as the same.The one in the picture is no longer accessible.The code is wrong.
Link: https://pan.baidu.com/s/1T8omLX_HE88CdbFpVupalA  Password: f4vw
Thank you very much for your reply. I have now tested your k400_b16_f8x224 model on the K400 dataset and got 83.38, which is basically consistent with the number you report. I then wanted to test your k400+k710_l14_f64x336 model, but it does not seem to start on my single 3090. Could you give me some advice on how to modify the config files? I am pasting the test.sh and config.yaml I currently use.

test.sh:

```shell
NUM_SHARDS=1
NUM_GPUS=1
BATCH_SIZE=16
BASE_LR=1.5e-5
PYTHONPATH=$PYTHONPATH:./slowfast \
python tools/run_net_multi_node.py \
  --init_method tcp://localhost:10125 \
  --cfg ./exp/k400/k400+k710_l14_f64x336/config.yaml \
  --num_shards $NUM_SHARDS \
  DATA.PATH_TO_DATA_DIR /root/data/UniformerV2/new_k400/kinetics_400/videos_320 \
  DATA.PATH_PREFIX /root/data/UniformerV2/new_k400/kinetics_400/videos_320 \
  DATA.PATH_LABEL_SEPARATOR "," \
  TRAIN.EVAL_PERIOD 1 \
  TRAIN.CHECKPOINT_PERIOD 100 \
  TRAIN.BATCH_SIZE $BATCH_SIZE \
  TRAIN.SAVE_LATEST False \
  NUM_GPUS $NUM_GPUS \
  NUM_SHARDS $NUM_SHARDS \
  SOLVER.MAX_EPOCH 5 \
  SOLVER.BASE_LR $BASE_LR \
  SOLVER.BASE_LR_SCALE_NUM_SHARDS False \
  SOLVER.WARMUP_EPOCHS 1. \
  TRAIN.ENABLE False \
  TEST.NUM_ENSEMBLE_VIEWS 2 \
  TEST.NUM_SPATIAL_CROPS 3 \
  TEST.TEST_BEST True \
  TEST.ADD_SOFTMAX True \
  TEST.BATCH_SIZE 4 \
  RNG_SEED 6666 \
  OUTPUT_DIR .
```
config.yaml:

```yaml
TRAIN:
  ENABLE: True
  DATASET: kinetics_sparse
  BATCH_SIZE: 256
  EVAL_PERIOD: 1
  CHECKPOINT_PERIOD: 5
  AUTO_RESUME: True
DATA:
  USE_OFFSET_SAMPLING: True
  DECODING_BACKEND: decord
  NUM_FRAMES: 64
  SAMPLING_RATE: 16
  TRAIN_JITTER_SCALES: [384, 480]
  TRAIN_CROP_SIZE: 336
  TEST_CROP_SIZE: 336
  INPUT_CHANNEL_NUM: [3]
  TRAIN_JITTER_SCALES_RELATIVE: [0.08, 1.0]
  TRAIN_JITTER_ASPECT_RELATIVE: [0.75, 1.3333]
UNIFORMERV2:
  BACKBONE: 'uniformerv2_l14_336'
  N_LAYERS: 4
  N_DIM: 1024
  N_HEAD: 16
  MLP_FACTOR: 4.0
  BACKBONE_DROP_PATH_RATE: 0.
  DROP_PATH_RATE: 0.
  MLP_DROPOUT: [0.5, 0.5, 0.5, 0.5]
  CLS_DROPOUT: 0.5
  RETURN_LIST: [20, 21, 22, 23]
  NO_LMHRA: True
  TEMPORAL_DOWNSAMPLE: False
  DELETE_SPECIAL_HEAD: True
AUG:
  NUM_SAMPLE: 1
  ENABLE: True
  COLOR_JITTER: 0.4
  AA_TYPE: rand-m7-n4-mstd0.5-inc1
  INTERPOLATION: bicubic
  RE_PROB: 0.
  RE_MODE: pixel
  RE_COUNT: 1
  RE_SPLIT: False
BN:
  USE_PRECISE_STATS: False
  NUM_BATCHES_PRECISE: 200
SOLVER:
  ZERO_WD_1D_PARAM: True
  BASE_LR_SCALE_NUM_SHARDS: True
  BASE_LR: 4e-4
  COSINE_AFTER_WARMUP: True
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 0.
  LR_POLICY: cosine
  MAX_EPOCH: 50
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.05
  OPTIMIZING_METHOD: adamw
MODEL:
  NUM_CLASSES: 400
  ARCH: uniformerv2
  MODEL_NAME: Uniformerv2
  LOSS_FUNC: cross_entropy
  DROPOUT_RATE: 0.5
  USE_CHECKPOINT: True
  CHECKPOINT_NUM: [24]
TEST:
  ENABLE: True
  DATASET: kinetics_sparse
  BATCH_SIZE: 256
  NUM_SPATIAL_CROPS: 1
  NUM_ENSEMBLE_VIEWS: 1
  CHECKPOINT_FILE_PATH: "./exp/k400/k400+k710_l14_f64x336/k400_k710_uniformerv2_l14_64x336.pyth"
DATA_LOADER:
  NUM_WORKERS: 2
  PIN_MEMORY: True
TENSORBOARD:
  ENABLE: False
NUM_GPUS: 8
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: .
```
I sincerely look forward to your reply. Thank you very much.
On a single 3090, 64 frames will probably not fit in GPU memory. You can run 16 or 32 frames instead; the results are about the same. You can also try our new model, which has lower memory overhead and better results.
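A hedged sketch of what that change could look like as command-line overrides in the test.sh above; `DATA.NUM_FRAMES` is the key used in config.yaml, but whether the 64-frame checkpoint also needs a matching `DATA.SAMPLING_RATE` change should be verified against the repo's own f16/f32 configs:

```shell
# Sketch: reuse the existing test.sh but lower the frame count
# (16 or 32 instead of 64) so inference fits on a single 3090.
PYTHONPATH=$PYTHONPATH:./slowfast \
python tools/run_net_multi_node.py \
  --cfg ./exp/k400/k400+k710_l14_f64x336/config.yaml \
  DATA.NUM_FRAMES 32 \
  TEST.BATCH_SIZE 4 \
  TRAIN.ENABLE False \
  NUM_GPUS 1 \
  OUTPUT_DIR .
```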
Wow, thank you so much! I will try your new model later!
Hello, I would now like to apply the model to a different set of action labels, roughly 100 classes. For data preparation, can I follow the K400 dataset: build my own category file plus train and test annotation files in the style of the annotation files you provide (kinetic_categories.txt, train.csv, test.csv), then change NUM_CLASSES in the config and run training and testing? Also, the actions I want to recognize are fairly common ones, such as typing, handwriting, making phone calls, cooking, and holding an umbrella. If I annotate a small amount of data for fine-tuning and testing, should I expect reasonably good results? I sincerely look forward to your reply, thank you very much.
Yes. You can start by annotating a small amount of data, split it into train and val, and try fine-tuning from the K400 checkpoint.
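For illustration, a sketch of what the comma-separated annotation lines might look like, given the `PATH_LABEL_SEPARATOR ","` used in the test.sh above. The paths and label ids here are hypothetical; the exact layout, including that of kinetic_categories.txt, should be copied from the K400 annotation files the authors provide:

```
videos/typing/clip_0001.mp4,0
videos/cooking/clip_0002.mp4,3
videos/phoning/clip_0003.mp4,7
```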
Hello, I noticed that in the K400 task neither LMHRA nor T-Down (temporal downsampling) is used. From the paper, is that because the purely global design already reaches the best performance on K400? If I want to modify the model to gain points on K400, should I start from the global design? I sincerely look forward to your reply.
I've observed a common practice in video understanding models and research papers: the recall rate is essentially never reported. Recently, my teacher assigned me a project that requires both accuracy (acc) and average recall (AR). When I explained to him that papers usually don't include recall, he suggested running the code and modifying it to output the recall rate. However, I'm uncertain how to proceed with this. I would greatly appreciate your prompt response. Thank you!
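In case it helps with the integration question, one possible approach is sketched below: accumulate scores and labels over the test loader, then call the mean_recall function posted above. This assumes each batch yields (inputs, labels); the real test loop in tools/test_net.py additionally ensembles multiple views per video via a TestMeter, which this sketch omits.

```python
import torch

def evaluate_mean_recall(model, loader, num_classes, device="cuda"):
    """Sketch: run a model over a test loader and compute mean recall.

    Assumes each batch yields (inputs, labels); multi-view ensembling,
    as done by the repo's own test loop, is omitted for brevity.
    """
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            scores = model(inputs.to(device))  # N x num_classes
            all_preds.append(scores.cpu())
            all_labels.append(labels.cpu())
    # Reuse the mean_recall function posted earlier in this thread.
    return mean_recall(torch.cat(all_preds), torch.cat(all_labels), num_classes)
```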