**Open** · SUMIN080 opened this issue 2 days ago
Dear @SUMIN080,

Thank you for filing an issue! Although I double-checked this myself, I wanted to confirm whether it can be reproduced in other people's development environments.

To begin with, you don't need to use audio as an input (nor as a target) at inference time; audio is for training purposes only.
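To make that split concrete, here is a minimal sketch of what "audio for training only" typically looks like. The function and argument names are illustrative, and the loss names simply mirror the metric keys in the logs in this thread; the repo's actual losses and weighting may differ.

```python
# Illustrative sketch only: audio tokens act as an auxiliary *training*
# target, so inference needs neither audio inputs nor audio targets.
import torch.nn.functional as F

def training_loss(word_logits, word_labels, audio_logits, audio_tokens,
                  audio_weight=1.0):
    """Training: word classification plus an auxiliary audio-token loss."""
    loss_category = F.cross_entropy(word_logits, word_labels)
    loss_audio = F.cross_entropy(audio_logits, audio_tokens)
    return loss_category + audio_weight * loss_audio  # i.e. a loss_total

def inference_loss(word_logits, word_labels):
    """Test/inference: only the visual word classification is evaluated."""
    return F.cross_entropy(word_logits, word_labels)
```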
There are a few questions whose answers will help us understand your situation.
Did you run `preprocess_roi.py` first and then run `preprocess_pkl.py`?
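For reference, the expected order is sketched below; the command-line arguments are intentionally omitted, since the real flags live in each script's argument parser (check `--help`).

```python
# Run the two preprocessing scripts in order; arguments are omitted here
# because they depend on each script's actual argument parser.
import subprocess

# Step 1: crop mouth ROIs from the raw LRW videos.
subprocess.run(["python", "preprocess_roi.py"], check=True)

# Step 2: pack the preprocessed data (the ROI and audio-token files
# discussed in this thread) into .pkl files.
subprocess.run(["python", "preprocess_pkl.py"], check=True)
```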
Please refer to our inference log on the test set, based on the trained model with run name i9umrm1x mentioned in Issue #14:
```
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
using vq neural audio codec
using x-transformers bert implementation
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 4 processes
----------------------------------------------------------------------------------------------------
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Restoring states from the checkpoint path at /root/CMTS-VSR/cross-modal-sync/i9umrm1x/checkpoints/epoch=167-step=213864.ckpt
Testing DataLoader 0: 100%|██████████| 66/66 [00:45<00:00, 1.45it/s]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Loaded model weights from checkpoint at /root/CMTS-VSR/cross-modal-sync/i9umrm1x/checkpoints/epoch=167-step=213864.ckpt
────────────────────────────────────────────────────────
        Test metric             DataLoader 0
────────────────────────────────────────────────────────
    test/accuracy_top1        0.9498400092124939
    test/accuracy_top5        0.9933199882507324
    test/loss_category        0.20326192677021027
────────────────────────────────────────────────────────
```
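For context, a test run matching the log above is typically launched along these lines in PyTorch Lightning. This is a sketch reconstructed from the settings visible in the log (4-process DDP over NCCL, 16-bit native AMP); the `model` and `datamodule` objects are placeholders, not this repo's actual entry point.

```python
import pytorch_lightning as pl

def run_test(model: pl.LightningModule,
             datamodule: pl.LightningDataModule) -> None:
    # Settings read off the log above: 4 processes over NCCL, 16-bit AMP.
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=4,
        strategy="ddp",
        precision=16,
    )
    # Restores model weights from the checkpoint before running the test loop.
    trainer.test(
        model,
        datamodule=datamodule,
        ckpt_path="/root/CMTS-VSR/cross-modal-sync/i9umrm1x/checkpoints/epoch=167-step=213864.ckpt",
    )
```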
I have two types of files:
When I ran `./LRW/src/inference.py` using the second `.pkl` file, the following results were produced:
```
────────────────────────────────────────────────────────
    test/accuracy_top1        0.0
    test/accuracy_top5        0.0
    test/loss_audio           5.454221248626709
    test/loss_category        9.774922370910645
    test/loss_total           64.317138671875
────────────────────────────────────────────────────────
```
These results seem incorrect, and I did not use the audio-token .pkl file at any point in this process. I'm wondering where and how the audio-token file is supposed to be used.
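In case it helps with debugging, a quick sketch for inspecting what each `.pkl` file actually contains (the file names below are placeholders for my two files):

```python
# Print the top-level structure of both .pkl files so the video-token and
# audio-token files can be compared side by side. File names are placeholders.
import pickle

for path in ["video_tokens_sample.pkl", "audio_tokens_sample.pkl"]:
    with open(path, "rb") as f:
        data = pickle.load(f)
    if isinstance(data, dict):
        print(path, "-> dict with keys:", list(data.keys()))
    else:
        print(path, "->", type(data).__name__)
```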