Closed davidingram123 closed 1 month ago
Hi @davidingram123, did you use the config `./LRW/video/config/bert-12l-512d_LRW_96_bf16_rrc_WB.yaml`?
@snoop2head yes
@snoop2head In the YAML file you just mentioned, `use_word_boundary` is set to `false`; is that incorrect? I changed it to `true` when running the inference script.
I think you mixed up bert-12l-512d_LRW_96_bf16_rrc_WB.yaml and bert-12l-512d_LRW_96_bf16_rrc_noWB.yaml, so I simply swapped the names of the two files.
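As an aside, whether a given config enables word boundaries can be checked programmatically instead of by eye. The sketch below is a hypothetical helper, assuming the flag appears as a flat `key: value` line in the YAML (no nesting), which matches how it is quoted in this thread:

```python
# Hypothetical check: which config enables word boundaries?
# Assumes `use_word_boundary` appears as a flat `key: value` line,
# as in bert-12l-512d_LRW_96_bf16_rrc_WB.yaml / ..._noWB.yaml.
def uses_word_boundary(path):
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "use_word_boundary":
                return value.strip().lower() == "true"
    return None  # flag not present in this file
```

If the file names were not swapped, the WB config should return `True` and the noWB config `False`.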
@davidingram123
I think mine still works!
wandb: Run data is saved locally in ./wandb/run-20241023_034217-1vpf8nhg
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run vq-transformer_lambda10_bf16_rrc_TimeMaskFixed
wandb: ⭐️ View project at https://wandb.ai/quoqa-nlp/cross-modal-sync
wandb: 🚀 View run at https://wandb.ai/quoqa-nlp/cross-modal-sync/runs/1vpf8nhg
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch.
using x-transformers bert implementation
using x-transformers bert implementation
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------
You are using a CUDA device ('NVIDIA H100 80GB HBM3') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing DataLoader 0: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 261/261 [00:14<00:00, 18.48it/s]
──────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
──────────────────────────────────────────────────────────────
    test/accuracy_top1        0.949720025062561
    test/accuracy_top5        0.9932799935340881
   test/loss_category         0.20327399671077728
     test/loss_total          0.20327399671077728
──────────────────────────────────────────────────────────────
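Incidentally, the Tensor Core warning in the log above can be addressed with the exact call the warning suggests, set once before training. This is standard PyTorch API; the choice between `"medium"` and `"high"` is a speed/precision trade-off:

```python
import torch

# Trade a little float32 matmul precision for Tensor Core throughput,
# as the Lightning warning suggests. "medium" is fastest; "high" stays
# closer to full float32 precision.
torch.set_float32_matmul_precision("high")
```

This only silences the warning and speeds up float32 matmuls; it does not change the reported accuracies meaningfully in most cases.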
Updated the config according to your findings about use_word_boundary
@davidingram123 I refactored the code in commit ed6b97885f321781fd19d1ab4a2d2f9a17768226 with the following changes:
@snoop2head Hello, I regenerated the npy and pkl files with your code and reran the inference, but the result is identical to before. I can't figure out why. The pkl files came out the same both times I generated them, so if the pkl files aren't the problem, what else could it be?
@davidingram123 Hm, I'm not sure for now. What about training and validation accuracy?
@snoop2head I didn't train or validate; I just ran inference with your bert-12l-512d_LRW_96_bf16_rrc_WB.yaml.
@davidingram123 Yeah, I understand that. I asked about train/val accuracy to sanity-check the preprocessing procedure by observing intermediate metrics during training. Loss and validation accuracy should be similar to what we covered in issue #14.
@snoop2head Sorry, I haven't run the training yet. I will let it run for a day or two and then get back to you with the training and validation accuracy. Thank you for your help.
@davidingram123 No problem! I will double-check the preprocessing procedure in the meantime.
@snoop2head Hello, I've been running the training code for some time now. It seems to be performing well, but it doesn't reach the level reported in the paper. Here is the link: wandb.
Additionally, I'm curious why I can't achieve the results you demonstrated using the checkpoint weight-audio-v1/LRW_CKPT_epoch_167_step_213864.ckpt.
Finally, I also have a question about #11. Could you reopen it? I would like to ask some related questions.
@davidingram123 I reopened the issue! Can you make the wandb log public so that I can access it?
@snoop2head Sorry about that; here it is: wandb
@davidingram123 It turns out that I uploaded the wrong version of the preprocessing code, similar to the cause of issue #16. I've double-checked commit da5055ec7f367d4813d65d3daa2cbb1222e5cfc4; please do the following:

1. `git pull` to reflect the updates
2. `python preprocess_pkl.py` to overwrite the previous pkl files with the newly cropped ones
3. `python inference.py ./config/bert-12l-512d_LRW_96_bf16_rrc_WB.yaml`

and you will get the result below:

──────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
──────────────────────────────────────────────────────────────
    test/accuracy_top1        0.948199987411499
    test/accuracy_top5        0.993120014667511
     test/loss_audio          3.2306928634643555
   test/loss_category         0.20880813896656036
     test/loss_total          32.5157470703125
──────────────────────────────────────────────────────────────
I am pretty sure this fix will also resolve the train/validation performance issue. Thank you for the feedback!
Here's a simple visualization before the change and after the change in da5055ec7f367d4813d65d3daa2cbb1222e5cfc4 for your reference.
@snoop2head Thank you for your help; I got the same result.
@davidingram123 Thank you as well! Without your feedback, I wouldn't have discovered those issues.
Hello, I tried to use the LRW_CKPT_epoch_167_step_213864.ckpt you provided for inference, but I only achieved a top-1 accuracy of 70%. What could be the issue? I am currently regenerating the npy and pkl files. I randomly checked over a dozen images and they seem fine, yet the checkpoint still does not reach the performance stated in your paper. I'm not sure whether the problem is with the pkl files. Do you have a good way to check whether the pkl files are correct?
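One way to spot-check pkl files beyond eyeballing cropped images is to compare key names and array shapes between a known-good sample and a regenerated one. This is only a hypothetical sketch; the actual keys depend on what `preprocess_pkl.py` writes, and a shape match is necessary but not sufficient for correctness:

```python
import pickle

def summarize_pkl(path):
    """Return a {key: shape-or-length} summary of one pickled sample.

    Assumes the file holds a dict of arrays/sequences; the key names
    are unknown here and depend on preprocess_pkl.py's actual output.
    """
    with open(path, "rb") as f:
        sample = pickle.load(f)
    summary = {}
    for key, value in sample.items():
        # numpy arrays expose .shape; plain sequences fall back to len()
        shape = getattr(value, "shape", None)
        summary[key] = tuple(shape) if shape is not None else len(value)
    return summary
```

Running `summarize_pkl()` on the same clip from two preprocessing runs should yield identical keys and shapes; a mismatch points at the preprocessing step rather than the checkpoint.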