StevenLauHKHK / AudioInceptionNeXt

OSError: Unable to synchronously open file (invalid file name) #1

Closed JMcarrot closed 4 months ago

JMcarrot commented 5 months ago

Thanks for open-sourcing your code! I fine-tuned from the VGG-Sound pretrained model, but when I run the command to validate the model, this error occurs. The EPICKITCHENS.AUDIO_DATA_FILE parameter is an absolute path, and the same path works in the train command.

StevenLauHKHK commented 5 months ago

@JMcarrot Thank you for your question. Can you provide the error log so I can check the details? EPICKITCHENS.AUDIO_DATA_FILE is an absolute path pointing to the dataset's source HDF5 file, e.g. "/data_ssd/DATA/EPIC-Kitchens-100-hdf5/EPIC-KITCHENS-100_audio.hdf5". For generating the HDF5 dataset, you can refer to the guideline at https://github.com/epic-kitchens/epic-sounds-annotations/tree/main/src. I hope this helps you solve it.

JMcarrot commented 5 months ago

Thanks for your reply! My error log is:

`[05/16 06:36:25][INFO] checkpoint.py: 156: Loading network weights from /data/CJM/CVPR/AudioInceptionNeXt-main/output/checkpoints/checkpoint_best.pyth.
[05/16 06:36:25][INFO] epicsound.py: 40: Constructing EPIC-SOUND Audio test...
[05/16 06:36:26][INFO] epicsound.py: 74: Constructing epicsound dataloader (size: 40175) from ['EPIC_Sounds_validation.pkl']
[05/16 06:36:26][INFO] test_net.py: 181: Testing model for 1256 iterations
Process SpawnProcess-2:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/audioIN/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/conda/envs/audioIN/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "tools/run_net.py", line 30, in <module>
    main()
  File "tools/run_net.py", line 26, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=test)
  File "/data/CJM/CVPR/AudioInceptionNeXt-main/slowfast/utils/misc.py", line 241, in launch_job
    torch.multiprocessing.spawn(
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data/CJM/CVPR/AudioInceptionNeXt-main/slowfast/utils/multiprocessing.py", line 60, in run
    ret = func(cfg)
  File "/data/CJM/CVPR/AudioInceptionNeXt-main/tools/test_net.py", line 218, in test
    test_meter, preds, preds_clips, labels, metadata = perform_test(test_loader, model, test_meter, cfg, writer)
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/data/CJM/CVPR/AudioInceptionNeXt-main/tools/test_net.py", line 50, in perform_test
    for cur_iter, (inputs, labels, audio_idx, meta) in enumerate(test_loader):
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/CJM/CVPR/AudioInceptionNeXt-main/slowfast/datasets/epicsound.py", line 93, in __getitem__
    self.audio_dataset = h5py.File(self.cfg.EPICSOUND.AUDIO_DATA_FILE, 'r')
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/opt/conda/envs/audioIN/lib/python3.8/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (invalid file name)`

and my command is:

python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioInceptionNeXt.yaml --init_method tcp://localhost:9997 \
    NUM_GPUS 2 \
    OUTPUT_DIR /data/CJM/CVPR/AudioInceptionNeXt-main/test \
    EPICKITCHENS.AUDIO_DATA_FILE /data/CJM/CVPR/AudioInceptionNeXt-main/EPIC_audio.hdf5 \
    EPICKITCHENS.ANNOTATIONS_DIR /data/CJM/CVPR/AudioInceptionNeXt-main/epic-sounds-annotations-main/ \
    TRAIN.ENABLE False \
    TEST.ENABLE True \
    TEST.CHECKPOINT_FILE_PATH /data/CJM/CVPR/AudioInceptionNeXt-main/output/checkpoints/checkpoint_best.pyth

StevenLauHKHK commented 5 months ago

@JMcarrot, I suggest checking two things first. First, I noticed an error coming from the pickle library. Try reading the test.pkl and train.pkl files independently to see whether any errors occur. Second, try reading EPIC_audio.hdf5 independently to see whether the dataset can be read properly.
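
The two checks could be run outside the training code with something like the sketch below. It is only a rough illustration, not part of the repository: `check_pickle` and `check_hdf5` are hypothetical helper names, and the file names in the `__main__` block are placeholders to replace with your own paths. If the annotation files are pickled pandas DataFrames (an assumption), pandas must be importable for `pickle.load` to succeed. It may also be worth passing the helper whatever path the config actually resolves to, since the traceback reads `cfg.EPICSOUND.AUDIO_DATA_FILE` while the command sets `EPICKITCHENS.AUDIO_DATA_FILE`; whether that mismatch matters here is not confirmed in this thread.

```python
import os
import pickle


def check_pickle(path):
    """Try to unpickle `path`; return (ok, detail)."""
    try:
        with open(path, "rb") as f:
            obj = pickle.load(f)
    except Exception as exc:  # e.g. UnpicklingError for a truncated file
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"loaded {type(obj).__name__}"


def check_hdf5(path):
    """Try to open `path` read-only with h5py; return (ok, detail)."""
    if not path:
        return False, "path is empty"
    if not os.path.isfile(path):
        return False, f"no such file: {path}"
    import h5py  # imported lazily so the path checks work without h5py

    try:
        with h5py.File(path, "r") as f:
            return True, f"opened, {len(f.keys())} top-level keys"
    except OSError as exc:
        return False, f"OSError: {exc}"


if __name__ == "__main__":
    # Placeholder paths: substitute the files used in your own setup.
    for p in ["EPIC_Sounds_train.pkl", "EPIC_Sounds_validation.pkl"]:
        print(p, check_pickle(p))
    print(check_hdf5("/data/CJM/CVPR/AudioInceptionNeXt-main/EPIC_audio.hdf5"))
```

If both checks pass when run on their own, the files themselves are probably fine, and the next thing to look at is the path that actually reaches `h5py.File` inside the DataLoader worker.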

JMcarrot commented 5 months ago

I will check it, thank you!!