gongda0e / FUTR

Future Transformer for Long-term Action Anticipation (CVPR 2022)
47 stars 4 forks

The inconsistency between the result that I reproduced and the result reported in the paper #7

Open Prism-hua opened 9 months ago

Prism-hua commented 9 months ago

I have finished training and testing the FUTR model using the code provided in this repo. However, the results I obtained were lower than those reported in the paper. image

I am wondering if there is any mistake I have made.

Prism-hua commented 9 months ago

Here is the result reported in the paper: image

gongda0e commented 8 months ago

Thank you for reaching out and expressing your concerns about reproducing the performance of our paper. It appears that the performance you achieved using our code is significantly lower than reported in the original paper. To better assist you, could you please share more details about the settings you used? Specifically, did you employ the original code and environment as specified in the paper?

Prism-hua commented 8 months ago

> Thank you for reaching out and expressing your concerns about reproducing the performance of our paper. It appears that the performance you achieved using our code is significantly lower than reported in the original paper. To better assist you, could you please share more details about the settings you used? Specifically, did you employ the original code and environment as specified in the paper?

I sincerely appreciate your response to my question. I would like to provide some additional information regarding my experimental setup and the code used.

I conducted my experiments based primarily on the information provided in this repo. I followed the contents of "futr.yaml" to set up the environment exactly. The code remained unchanged, with the only modification being an adjustment to the "DataParallel" code to enable parallel training across only 2 GPUs. When training on the Breakfast dataset, I used the scripts available in this repo without change, but on 50 Salads I changed the epochs in the script from 70 to 60 and the batch_size from 8 to 16, since these are the values recommended in the paper. One possible reason my results do not match the paper could be the use of different checkpoints: all my results are based on checkpoint 59 from my own training, as I encountered difficulties using the checkpoints provided in the readme file for inference.

Here is the error I encountered when using the checkpoints provided in the readme file: RuntimeError: Error(s) in loading state_dict for DataParallel: Missing key(s) in state_dict: "module.pos_embedding", "module.input_embed.weight", "module.input_embed.bias", "module.transformer.encoder.layers.0.linear1.weight", "module.transformer.encoder.layers.0.linear1.bias", "module.transformer.encoder.layers.0.linear2.weight", "module.transformer.encoder.layers.0.linear2.bias", "module.transformer.encoder.layers.0.norm1.weight", "module.transformer.encoder.layers.0.norm1.bias", "module.transformer.encoder.layers.0.norm2.weight", "module.transformer.encoder.layers.0.norm2.bias", "module.transformer.encoder.layers.0.self_attn.in_proj_weight", "module.transformer.encoder.layers.0.self_attn.in_proj_bias", "module.transformer.encoder.layers.0.self_attn.out_proj.weight", "module.transformer.encoder.layers.0.self_attn.out_proj.bias", "module.transformer.encoder.layers.1.linear1.weight", "module.transformer.encoder.layers.1.linear1.bias", "module.transformer.encoder.layers.1.linear2.weight", "module.transformer.encoder.layers.1.linear2.bias", "module.transformer.encoder.layers.1.norm1.weight", "module.transformer.encoder.layers.1.norm1.bias", "module.transformer.encoder.layers.1.norm2.weight", "module.transformer.encoder.layers.1.norm2.bias", "module.transformer.encoder.layers.1.self_attn.in_proj_weight", "module.transformer.encoder.layers.1.self_attn.in_proj_bias", "module.transformer.encoder.layers.1.self_attn.out_proj.weight", "module.transformer.encoder.layers.1.self_attn.out_proj.bias", "module.transformer.decoder.layers.0.linear1.weight", "module.transformer.decoder.layers.0.linear1.bias", "module.transformer.decoder.layers.0.linear2.weight", "module.transformer.decoder.layers.0.linear2.bias", "module.transformer.decoder.layers.0.norm1.weight", "module.transformer.decoder.layers.0.norm1.bias", "module.transformer.decoder.layers.0.norm2.weight", 
"module.transformer.decoder.layers.0.norm2.bias", "module.transformer.decoder.layers.0.norm3.weight", "module.transformer.decoder.layers.0.norm3.bias", "module.transformer.decoder.layers.0.self_attn.in_proj_weight", "module.transformer.decoder.layers.0.self_attn.in_proj_bias", "module.transformer.decoder.layers.0.self_attn.out_proj.weight", "module.transformer.decoder.layers.0.self_attn.out_proj.bias", "module.transformer.decoder.layers.0.multihead_attn.in_proj_weight", "module.transformer.decoder.layers.0.multihead_attn.in_proj_bias", "module.transformer.decoder.layers.0.multihead_attn.out_proj.weight", "module.transformer.decoder.layers.0.multihead_attn.out_proj.bias", "module.transformer.decoder.layers.1.linear1.weight", "module.transformer.decoder.layers.1.linear1.bias", "module.transformer.decoder.layers.1.linear2.weight", "module.transformer.decoder.layers.1.linear2.bias", "module.transformer.decoder.layers.1.norm1.weight", "module.transformer.decoder.layers.1.norm1.bias", "module.transformer.decoder.layers.1.norm2.weight", "module.transformer.decoder.layers.1.norm2.bias", "module.transformer.decoder.layers.1.norm3.weight", "module.transformer.decoder.layers.1.norm3.bias", "module.transformer.decoder.layers.1.self_attn.in_proj_weight", "module.transformer.decoder.layers.1.self_attn.in_proj_bias", "module.transformer.decoder.layers.1.self_attn.out_proj.weight", "module.transformer.decoder.layers.1.self_attn.out_proj.bias", "module.transformer.decoder.layers.1.multihead_attn.in_proj_weight", "module.transformer.decoder.layers.1.multihead_attn.in_proj_bias", "module.transformer.decoder.layers.1.multihead_attn.out_proj.weight", "module.transformer.decoder.layers.1.multihead_attn.out_proj.bias", "module.transformer.decoder.norm.weight", "module.transformer.decoder.norm.bias", "module.query_embed.weight", "module.fc_seg.weight", "module.fc_seg.bias", "module.fc.weight", "module.fc.bias", "module.fc_len.weight", "module.fc_len.bias", "module.pos_enc.pos_table". 
Unexpected key(s) in state_dict: "pos_embedding", "input_embed.weight", "input_embed.bias", "transformer.encoder.layers.0.linear1.weight", "transformer.encoder.layers.0.linear1.bias", "transformer.encoder.layers.0.linear2.weight", "transformer.encoder.layers.0.linear2.bias", "transformer.encoder.layers.0.norm1.weight", "transformer.encoder.layers.0.norm1.bias", "transformer.encoder.layers.0.norm2.weight", "transformer.encoder.layers.0.norm2.bias", "transformer.encoder.layers.0.self_attn.in_proj_weight", "transformer.encoder.layers.0.self_attn.in_proj_bias", "transformer.encoder.layers.0.self_attn.out_proj.weight", "transformer.encoder.layers.0.self_attn.out_proj.bias", "transformer.encoder.layers.1.linear1.weight", "transformer.encoder.layers.1.linear1.bias", "transformer.encoder.layers.1.linear2.weight", "transformer.encoder.layers.1.linear2.bias", "transformer.encoder.layers.1.norm1.weight", "transformer.encoder.layers.1.norm1.bias", "transformer.encoder.layers.1.norm2.weight", "transformer.encoder.layers.1.norm2.bias", "transformer.encoder.layers.1.self_attn.in_proj_weight", "transformer.encoder.layers.1.self_attn.in_proj_bias", "transformer.encoder.layers.1.self_attn.out_proj.weight", "transformer.encoder.layers.1.self_attn.out_proj.bias", "transformer.decoder.layers.0.linear1.weight", "transformer.decoder.layers.0.linear1.bias", "transformer.decoder.layers.0.linear2.weight", "transformer.decoder.layers.0.linear2.bias", "transformer.decoder.layers.0.norm1.weight", "transformer.decoder.layers.0.norm1.bias", "transformer.decoder.layers.0.norm2.weight", "transformer.decoder.layers.0.norm2.bias", "transformer.decoder.layers.0.norm3.weight", "transformer.decoder.layers.0.norm3.bias", "transformer.decoder.layers.0.self_attn.in_proj_weight", "transformer.decoder.layers.0.self_attn.in_proj_bias", "transformer.decoder.layers.0.self_attn.out_proj.weight", "transformer.decoder.layers.0.self_attn.out_proj.bias", "transformer.decoder.layers.0.multihead_attn.in_proj_weight", 
"transformer.decoder.layers.0.multihead_attn.in_proj_bias", "transformer.decoder.layers.0.multihead_attn.out_proj.weight", "transformer.decoder.layers.0.multihead_attn.out_proj.bias", "transformer.decoder.layers.1.linear1.weight", "transformer.decoder.layers.1.linear1.bias", "transformer.decoder.layers.1.linear2.weight", "transformer.decoder.layers.1.linear2.bias", "transformer.decoder.layers.1.norm1.weight", "transformer.decoder.layers.1.norm1.bias", "transformer.decoder.layers.1.norm2.weight", "transformer.decoder.layers.1.norm2.bias", "transformer.decoder.layers.1.norm3.weight", "transformer.decoder.layers.1.norm3.bias", "transformer.decoder.layers.1.self_attn.in_proj_weight", "transformer.decoder.layers.1.self_attn.in_proj_bias", "transformer.decoder.layers.1.self_attn.out_proj.weight", "transformer.decoder.layers.1.self_attn.out_proj.bias", "transformer.decoder.layers.1.multihead_attn.in_proj_weight", "transformer.decoder.layers.1.multihead_attn.in_proj_bias", "transformer.decoder.layers.1.multihead_attn.out_proj.weight", "transformer.decoder.layers.1.multihead_attn.out_proj.bias", "transformer.decoder.norm.weight", "transformer.decoder.norm.bias", "query_embed.weight", "fc_seg.weight", "fc_seg.bias", "fc.weight", "fc.bias", "fc_len.weight", "fc_len.bias", "pos_enc.pos_table". /etc/profile.d/kde.sh: line 25: [: argument expected

alberto-mate commented 7 months ago

Hi @gongda0e and @Prism-hua, I had the same issue for using the checkpoints provided in the README.md file. What solved my problem was letting the "DataParallel" as it was in the original code. Hope this helps :)

In terms of reproducing the results, I am also having problems, in particular with the 50 Salads dataset. It is true that, with the checkpoints provided in the README.md and using 50s_predict.sh, I can reproduce the results for inference.

However, training the model from scratch is the main problem. Despite following the original code closely, the performance falls short of the paper's results by around 10% (I get similar results to those reported here). It is possible that some training procedures are missing from the GitHub code, such as selecting the best checkpoint or specific optimization techniques.

@gongda0e, any insights you can share on how you determined the best checkpoint or other potentially missing training details would be greatly appreciated.

Prism-hua commented 6 months ago

Suggestion accepted, thanks @alberto-mate. It seems that your idea of reverting the "DataParallel" module to its original form could address my issue. However, due to my currently insufficient computing resources, I am still running into a "CUDA out of memory" error. Please allow me to explore this approach further and then discuss the results with you later.

Prism-hua commented 6 months ago

Hi @gongda0e and @alberto-mate, I am glad to report that, thanks to the help from @alberto-mate, I have completed the inference task using the checkpoints provided in the readme file. The results are as follows, and it can be seen that they generally achieve the performance reported in the paper. image

However, I encountered the following issue during inference on split 1 of 50 Salads, which seems to indicate a problem with the provided checkpoint. Consequently, I did not take the split 1 results into account.

Errors:
Traceback (most recent call last):
  File "main.py", line 142, in <module>
    main()
  File "main.py", line 127, in main
    predict(model, video_test_list, args, obs_p, n_class, actions_dict, device)
  File "/data/hh/FUTR/predict.py", line 42, in predict
    features = np.load(features_file).transpose()
  File "/home/hh/.conda/envs/FUTR/lib/python3.8/site-packages/numpy/lib/npyio.py", line 432, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/home/hh/.conda/envs/FUTR/lib/python3.8/site-packages/numpy/lib/format.py", line 829, in read_array
    array.shape = shape[::-1]
ValueError: cannot reshape array of size 5034208 into shape (11679,2048)
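The ValueError above says the file on disk holds fewer values than its own header declares (5,034,208 instead of 11679 × 2048 = 23,918,592), which usually points to a truncated or corrupted feature-file download rather than a bug in the inference code. A sketch of a pre-flight check (the helper name is hypothetical; it uses NumPy's documented `numpy.lib.format` header readers, and the demo fabricates a small file rather than touching the real features):

```python
import os
import tempfile
import numpy as np
from numpy.lib import format as npy_format

def npy_is_complete(path):
    """Return True if a .npy file's payload is at least as large as
    its header's shape and dtype declare (version 1.0 headers)."""
    with open(path, "rb") as f:
        npy_format.read_magic(f)                      # consume the magic string
        shape, _, dtype = npy_format.read_array_header_1_0(f)
        header_end = f.tell()                         # data starts here
    expected = int(np.prod(shape)) * dtype.itemsize   # bytes the header promises
    actual = os.path.getsize(path) - header_end       # bytes actually on disk
    return actual >= expected

# Demo: a truncated file is caught before np.load can crash on it
path = os.path.join(tempfile.gettempdir(), "features_demo.npy")
np.save(path, np.zeros((100, 2048), dtype=np.float32))
print(npy_is_complete(path))                          # True: intact file
with open(path, "r+b") as f:
    f.truncate(os.path.getsize(path) // 4)            # simulate a bad download
print(npy_is_complete(path))                          # False: truncated file
```

Running such a check over every feature file before launching inference would distinguish a damaged download (re-fetch the file) from a genuine shape mismatch between the features and the checkpoint.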

alberto-mate commented 6 months ago

Hi @Prism-hua,

I'm glad I could be of assistance! It's great to hear that, using the checkpoints, you were able to get results similar to those in the paper.

Just to confirm, I was also able to replicate the results you mentioned, and interestingly, I didn't encounter the issue with the first split of 50Salads.

However, there are definitely some key differences between training a model from scratch and using a checkpoint. Hope that @gongda0e can give us some insights.

gongda0e commented 6 months ago

Hello, thank you for your interest in our work. As requested by @alberto-mate , I will be updating the code to identify the best checkpoint during training. Just to clarify, I strictly adhere to the training recipes provided in the main repository and do not incorporate any additional ones. Given the variability in GPU settings, I recommend exploring alternative hyperparameters to optimize performance in your environment.
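For readers wanting to try this before the code update lands: "identifying the best checkpoint" usually just means tracking a validation metric each epoch and keeping the weights from the best-scoring one. A minimal framework-agnostic sketch (all names and the example scores are hypothetical, not taken from the FUTR code):

```python
class BestCheckpointTracker:
    """Keep the state from the epoch with the highest validation score,
    mimicking the 'save best' pattern of typical training loops."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_epoch = None
        self.best_state = None

    def update(self, epoch, score, state):
        # Strict '>' keeps the earliest epoch on ties.
        if score > self.best_score:
            self.best_score, self.best_epoch, self.best_state = score, epoch, state
        return self.best_epoch

tracker = BestCheckpointTracker()
for epoch, score in [(57, 0.241), (58, 0.263), (59, 0.255)]:
    tracker.update(epoch, score, {"weights": epoch})  # stand-in for a state_dict
print(tracker.best_epoch)  # 58
```

In a real run, `state` would be a (deep-copied or saved-to-disk) model `state_dict`, and `score` the validation metric used in the paper; note this differs from the commenters' setup above, which evaluated the fixed final checkpoint (epoch 59).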

@Prism-hua I will also re-check the checkpoint for the first split of 50 Salads in case it does not work.