Christinepan881 opened this issue 2 years ago
Hi, have you solved this problem already? I am facing the same problem.
This is due to the torchvision backend used during video decoding. Some people mentioned that building torchvision from source solves this issue; however, I haven't been able to fix it that way yet. This issue already discusses the problem, and a possible solution is to change the video decoding backend to PyAV instead. In the YAML config file, you can add:
DATA:
  DECODING_BACKEND: pyav
to switch to the PyAV backend. However, the PyAV backend introduces another error related to changed data types, caused by a recent commit; this pull request solves that problem. I made the changes from that pull request and am now able to run the framework with the PyAV backend.
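In case it helps, here is a minimal standalone check (not part of SlowFast; the path is a placeholder) that uses PyAV directly to confirm a clip is decodable before switching the backend:

import av  # PyAV: pip install av


def can_decode(path, max_frames=8):
    """Return True if PyAV can open the file and decode a few frames."""
    try:
        with av.open(path) as container:
            stream = container.streams.video[0]
            for i, _frame in enumerate(container.decode(stream)):
                if i + 1 >= max_frames:
                    break
        return True
    except Exception:
        return False


print(can_decode("/path/to/clip.mp4"))  # placeholder path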
Thanks for playing with pysf. You might get the issue fixed if you preprocess the videos to the same format?
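If it helps, a rough way to do that preprocessing is to batch re-encode every clip to the same container and codec with ffmpeg (a sketch, assuming ffmpeg is on the PATH; the directories are placeholders):

import subprocess
from pathlib import Path

SRC = Path("/data/videos_raw")   # placeholder: original clips
DST = Path("/data/videos_mp4")   # placeholder: re-encoded clips

for src in SRC.rglob("*.avi"):
    dst = (DST / src.relative_to(SRC)).with_suffix(".mp4")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Re-encode to H.264 video + AAC audio in an .mp4 container so all clips share one format.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src),
         "-c:v", "libx264", "-pix_fmt", "yuv420p",
         "-c:a", "aac", str(dst)],
        check=True,
    )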
Which torch and torchvision versions are you using? Thanks
The pull request you mentioned did solve the problem.
I ran into another problem: the top-1 error (and also the top-5 error) does not decrease steadily. In one epoch the top-1 error was 37.5%, while in a later epoch it became 50%, and the final top-1 accuracy is 42.14% (top-5 accuracy: 72.81%), which is much lower than reported in the paper, as follows:
I trained X3D on the HMDB51 dataset. Is anything wrong with the training code?
I haven't trained on the HMDB51 dataset yet, but I am assuming two possibilities:
You are right; the Kinetics and AVA datasets are preferred. I referred to another dataset's config file (like the Kinetics one) and adapted it for HMDB51. K400 is still a little larger, so training would take much longer. However, I am now working on K400 and chose about 10% of it for training, which still takes about 3 days.
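In case it is useful to others, one rough way to build such a subset is to sample about 10% of the clips per class from the train list (a sketch, assuming a SlowFast-style train.csv with space-separated "path label" lines; the file names are placeholders):

import random
from collections import defaultdict

random.seed(0)

# Group clips by label so every class keeps roughly 10% of its clips.
by_label = defaultdict(list)
with open("train.csv") as f:               # placeholder: full K400 train list
    for line in f:
        label = line.rstrip("\n").rsplit(" ", 1)[-1]
        by_label[label].append(line)

with open("train_10pct.csv", "w") as out:  # placeholder: 10% subset
    for label, lines in by_label.items():
        keep = max(1, len(lines) // 10)
        out.writelines(random.sample(lines, keep))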
@alpargun Hi, I have trained on the K400 dataset, but the top-1 error and top-5 error seem weird.
As shown in the picture above, at epoch 105 the top-1 error is still 81.25% on certain batches, while on other batches it is 56% or 43%. Most batches within one epoch are near 50%, but there are always some batches at 80% or 70%. The top-5 error also fluctuates but doesn't show such a trend. Have you met this problem before?
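For what it's worth, per-batch top-1 error is quite coarse: if the per-GPU batch size is 16 (an assumption on my side), each misclassified clip moves the error by 100/16 = 6.25%, so values like 37.5%, 50%, and 81.25% are just multiples of that step, and some jumpiness between batches is expected. A quick sketch of how per-batch top-k errors can be computed (not SlowFast's own metric code):

import torch


def topk_errors(logits, labels, ks=(1, 5)):
    """Per-batch top-k error rates in percent (a sketch)."""
    _, pred = logits.topk(max(ks), dim=1)        # (N, max_k) predicted class indices
    correct = pred.eq(labels.view(-1, 1))        # (N, max_k) hit mask
    return [100.0 * (1.0 - correct[:, :k].any(dim=1).float().mean().item()) for k in ks]


# With 16 clips per batch, each wrong clip contributes 6.25%, hence the quantized values.
logits = torch.randn(16, 400)                    # 16 clips, 400 Kinetics classes
labels = torch.randint(0, 400, (16,))
print(topk_errors(logits, labels))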
> Thanks for playing with pysf. You might get the issue fixed if you preprocess the videos to the same format?
I have tried that. Even if I preprocess the videos into the same .mp4 format, the problem still exists.
Hi, you might find the INSTALL.md file in my SlowFast fork useful for updated installation steps. I would suggest PyTorch <= 1.13.1 as I had similar problems with 2.0.
Following the INSTALL.md file, I suggest installing PyTorch together with TorchVision. I recently set up SlowFast on multiple Ubuntu 20.04 machines and a MacBook following this updated INSTALL.md, and had no problems.
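As a quick post-install sanity check, you can also print the installed versions and the active torchvision video backend (a minimal sketch; switching to "pyav" requires the av package):

import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)

# The active video decoding backend: "pyav" or "video_reader".
print("video backend:", torchvision.get_video_backend())

# Optionally switch the backend explicitly (needs the `av` package installed).
torchvision.set_video_backend("pyav")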
I face the same issue with torch==2.0.0 and torchvision==0.15.1. I am using the Kinetics config slowfast_8x8_r50.yaml. How can I fix it without downgrading the torch version? Thanks!
When I use the MViT config to run the code on the K400 dataset, I get the following errors:

...
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 3
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 5
Failed to decode video idx 72108 from /data/k400/train/filling_eyebrows/1m50SSGbG2k_000148_000158.avi; trial 99
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 15
Failed to decode video idx 139676 from /data/k400/train/playing_paintball/coNWv_D7Fyk_000135_000145.avi; trial 95
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 6
Failed to decode video idx 205437 from /data/k400/train/taking_a_shower/U540GFOTF6U_000002_000012.avi; trial 99
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 16
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 7
Failed to decode video idx 154000 from /data/k400/train/punching_bag/BNwpN8GFixE_000010_000020.avi; trial 0
Failed to decode video idx 139676 from /data/k400/train/playing_paintball/coNWv_D7Fyk_000135_000145.avi; trial 96
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 4
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 17
Failed to decode video idx 138602 from /data/k400/train/playing_monopoly/Hn_o3mu9peY_000040_000050.avi; trial 8
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 5
Failed to decode video idx 86337 from /data/k400/train/headbanging/c6JhdcwPHQU_000002_000012.avi; trial 97
Failed to decode video idx 170537 from /data/k400/train/scuba_diving/dQQK-KSp_pE_000044_000054.avi; trial 18
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 6
Failed to decode video idx 204993 from /data/k400/train/tai_chi/qV7j-jQCH3M_000027_000037.avi; trial 0
Failed to decode video idx 31483 from /data/k400/train/changing_oil/csJFMaPl9Og_000370_000380.avi; trial 7
Failed to decode video idx 86337 from /data/k400/train/headbanging/c6JhdcwPHQU_000002_000012.avi; trial 98

Traceback (most recent call last):
  File "tools/run_net.py", line 45, in <module>
    main()
  File "tools/run_net.py", line 26, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=train)
  File "/data/home/SlowFast/slowfast/utils/misc.py", line 296, in launch_job
    torch.multiprocessing.spawn(
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 2 terminated with the following error:
Traceback (most recent call last):
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data/home/SlowFast/slowfast/utils/multiprocessing.py", line 60, in run
    ret = func(cfg)
  File "/data/home/SlowFast/tools/train_net.py", line 708, in train
    train_epoch(
  File "/data/home/SlowFast/tools/train_net.py", line 86, in train_epoch
    for cur_iter, (inputs, labels, index, time, meta) in enumerate(
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/miniconda/envs/test0/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/home/SlowFast/slowfast/datasets/kinetics.py", line 488, in __getitem__
    raise RuntimeError(
RuntimeError: Failed to fetch video idx 168596 from /data/k400/train/salsa_dancing/EY6MSW3zkr8_000048_000058.avi; after 99 trials
I have checked the data paths, and there is no problem with them.
Does anyone know the reason? Thanks!
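In case it helps to narrow this down, one option is to scan the train list offline and record which clips torchvision itself fails to decode (a sketch, assuming a space-separated SlowFast-style train.csv; the list path is a placeholder):

import torchvision.io as io

bad = []
with open("/data/k400/train.csv") as f:   # placeholder: the train list SlowFast reads
    for line in f:
        path = line.rstrip("\n").rsplit(" ", 1)[0]
        try:
            # pts_unit="sec" avoids the deprecated default warning.
            frames, _, _ = io.read_video(path, pts_unit="sec")
            if frames.numel() == 0:
                bad.append(path)
        except Exception:
            bad.append(path)

print(len(bad), "undecodable clips")
for p in bad[:20]:
    print(p)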