VIPL-SLP / VAC_CSLR

Visual Alignment Constraint for Continuous Sign Language Recognition (ICCV 2021)
https://openaccess.thecvf.com/content/ICCV2021/html/Min_Visual_Alignment_Constraint_for_Continuous_Sign_Language_Recognition_ICCV_2021_paper.html
Apache License 2.0

IndexError in DataLoader Worker Process with Custom Dataset #48

Open yulrio opened 3 months ago

yulrio commented 3 months ago

Hello,

I'm currently using the code from this repository with my own dataset, but I'm encountering an IndexError during the training phase. Below is the traceback I received:

[ Fri Aug 16 10:18:36 2024 ] Parameters: {'work_dir': './work_dir/baseline_res18/', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '3', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'sclite', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'QSLR2024', 'dataset_info': {'dataset_root': './dataset/QSLR2024', 'dict_path': './preprocess/QSLR2024/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'QSLR2024-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/QSLR2024', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 2, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 30}

0%| | 0/162 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/raid/data/m33221012/VAC_CSLR_QSLR/main.py", line 213, in <module>
    processor.start()
  File "/raid/data/m33221012/VAC_CSLR_QSLR/main.py", line 44, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/raid/data/m33221012/VAC_CSLR_QSLR/seq_scripts.py", line 18, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
    return self._process_data(data)
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
    data.reraise()
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/_utils.py", line 706, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/m33221012/miniconda3/envs/py31012/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 48, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 80, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

It seems the issue occurs in video_augmentation.py when clip[0] is accessed, which suggests the clip list is empty by the time the augmentation runs. I suspect it is related to the data augmentation process or to the structure of my input data.
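In case it helps, here is the rough probe I plan to run to find the offending sample: it iterates the dataset in the main process instead of going through the DataLoader workers. The inputs_list attribute name is only my guess at where the feeder keeps per-sample metadata.

# rough probe, run from inside main.py's Processor after the loaders are built
ds = self.data_loader['train'].dataset            # same feeder the training loop uses
for idx in range(len(ds)):
    try:
        _ = ds[idx]                               # triggers the same normalize/augment path
    except IndexError:
        print('empty clip at dataset index', idx)
        info = getattr(ds, 'inputs_list', None)   # attribute name is a guess
        if info is not None:
            print(info[idx])                      # should show which video/frame folder is empty
        break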

Since I'm using my own dataset, could you please let me know what specific adjustments or preprocessing steps are necessary to ensure compatibility with your code? Additionally, is there a possibility that this error is related to hardware settings, such as GPU configuration or memory limitations?

Any advice on how to resolve this error and properly integrate my dataset would be greatly appreciated.

Thank you in advance for your help!

RafaelAmauri commented 1 month ago

Did you run the preprocessing script on your training data before training? I was having this issue too when using a custom dataset, but after running the pre-processing script it worked out fine.
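As far as I understand it, the preprocessing mainly resizes every frame to 256x256 and writes it under the fullFrame-256x256px folder (besides building the gloss dictionary and the .stm files). If you just want a feel for the resize part, something like the sketch below should do it. This is not the repo's actual script; the src_root/dst_root paths and the *.png pattern are assumptions you would adapt to your own layout.

import cv2, glob, os

# rough sketch of the frame-resize step only, NOT the repo's preprocessing script
src_root = './dataset/QSLR2024/features/fullFrame-original/train'   # your raw frames (placeholder)
dst_root = './dataset/QSLR2024/features/fullFrame-256x256px/train'  # what the dataloader expects

for frame_path in glob.glob(os.path.join(src_root, '*', '*.png')):  # one folder per sample
    img = cv2.imread(frame_path)
    if img is None:                                                  # skip non-image files
        continue
    img = cv2.resize(img, (256, 256), interpolation=cv2.INTER_LANCZOS4)
    out_path = frame_path.replace(src_root, dst_root)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    cv2.imwrite(out_path, img)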

yulrio commented 1 month ago

Thank you for replying to my question. Could you share the configuration you used in the .yaml file? Thanks in advance.

RafaelAmauri commented 1 month ago

I am using the default values. I haven't changed any of the configs.

Onestringlab commented 1 month ago

I just ran the following command:

!python main.py --load-weights resnet18_baseline_dev_23.80_epoch25_model.pt --phase test --device 0

and got the following result:

Loading model finished.
Loading data
train 5671
Apply training transform.

train 5671
Apply testing transform.

dev 540
Apply testing transform.

test 629
Apply testing transform.

Loading data finished.
Working tree is dirty. Patch:
diff --git a/.gitignore b/.gitignore
old mode 100755
new mode 100644

[ Tue Oct  8 22:35:41 2024 ] Model: slr_network.SLRModel.
[ Tue Oct  8 22:35:42 2024 ] Weights: /content/drive/MyDrive/MyResearch/pretrain/resnet18_baseline_dev_23.80_epoch25_model.pt.
100% 68/68 [1:09:24<00:00, 61.24s/it]
/content/drive/MyDrive/MyResearch/VAC_CSLR_ORI_OSL
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-dev-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
Tue Oct 8 11:45:07 PM UTC 2024
Preprocess Finished.
Unexpected error: <class 'AttributeError'>
[ Tue Oct  8 23:45:07 2024 ] Epoch 6667, dev 100.00%
100% 79/79 [1:15:47<00:00, 57.56s/it]
/content/drive/MyDrive/MyResearch/VAC_CSLR_ORI_OSL
preprocess.sh ./work_dir/baseline_res18/output-hypothesis-test-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
Wed Oct 9 01:00:55 AM UTC 2024
Preprocess Finished.
Unexpected error: <class 'AttributeError'>
[ Wed Oct  9 01:00:55 2024 ] Epoch 6667, test 100.00%
[ Wed Oct  9 01:00:55 2024 ] Evaluation Done.

Can you explain why the error Unexpected error: <class 'AttributeError'> occurred and which part of the code needs to be corrected?

Also, why did I get 100% for both dev and test?

Thanks in advance!

RafaelAmauri commented 2 weeks ago

File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 48, in getitem input_data, label = self.normalize(input_data, label) File "/raid/data/m33221012/VAC_CSLR_QSLR/dataset/dataloader_video.py", line 80, in normalize video, label = self.data_aug(video, label, file_id) File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 24, in call image = t(image) File "/raid/data/m33221012/VAC_CSLR_QSLR/utils/video_augmentation.py", line 119, in call if isinstance(clip[0], np.ndarray): IndexError: list index out of range

Just in case anyone else runs into this: the error happens because the dataloader couldn't load the dataset. I just hit it again because my dataset was laid out as dataset/features/{train,test,dev}. I had forgotten to add the 'fullFrame-256x256px' folder right after features, so the dataloader couldn't find the train/test/dev folders. The code is hard-coded to look for a fullFrame-256x256px folder, and when it can't find one, nothing gets loaded.

In short, make sure the directory structure of your custom dataset exactly matches the one in phoenix2014; any deviation can break the training script.
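A quick way to sanity-check this before training is to confirm that the hard-coded path resolves to non-empty sample folders for every split. A minimal check, assuming the phoenix-style layout described above (frames may sit one level deeper, e.g. phoenix has an extra '1' folder, so adjust the pattern if needed):

import glob, os

dataset_root = './dataset/QSLR2024'   # adjust to your dataset root
for split in ('train', 'dev', 'test'):
    pattern = os.path.join(dataset_root, 'features', 'fullFrame-256x256px', split, '*')
    samples = glob.glob(pattern)
    print(split, '->', len(samples), 'sample folders')
    empty = [s for s in samples if not glob.glob(os.path.join(s, '*'))]
    if empty:
        print('  folders with no frames:', empty[:5])

If a split reports 0 sample folders, the dataloader will build empty clips and you get exactly the clip[0] IndexError above.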

RafaelAmauri commented 2 weeks ago

I don't know how to fix the AttributeError, but the 100% WER on the dev and test splits happens when the 'evaluation' folder is missing from the directory that holds the main VAC code. That folder needs to contain the .stm ground-truth files for the dev and test splits.

Luckily, the preprocessing step generates these automatically. After you run the preprocessing step, you should see a new folder created inside the preprocess folder with the name of your dataset. There you will find the .stm files with the groundtruth.

The phoenix dataset comes with this evaluation folder by default with a bunch of different files, not only the .stm files, so I don't know if it's only the .stms that you need or if you need the rest too. What I did was copy the entire 'evaluation' folder from phoenix and just replaced the .stms that come with phoenix with the ones generated by the preprocessing script for my custom dataset.
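What I did boils down to something like this; the phoenix_eval path is just an example, and the .stm location assumes the preprocess/<your dataset> folder mentioned above:

import glob, shutil

phoenix_eval = '/path/to/phoenix2014/evaluation/slr_eval'   # example path, adjust to your phoenix copy
target_eval  = './evaluation/slr_eval'                      # the evaluation_dir the config points to
custom_stms  = './preprocess/QSLR2024/*.stm'                # written by the preprocessing script (folder named after your dataset)

# 1) bring over phoenix's evaluation folder (sclite helper scripts and the rest)
shutil.copytree(phoenix_eval, target_eval, dirs_exist_ok=True)

# 2) overwrite the ground-truth .stm files with the ones for the custom dataset;
#    their names must match the evaluation_prefix set in the config
for stm in glob.glob(custom_stms):
    shutil.copy(stm, target_eval)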

Good luck!

Onestringlab commented 2 weeks ago

Thank you for the answer.

Could you let me know which version of PyTorch you used for these experiments?

Thanks again!

RafaelAmauri commented 2 weeks ago

I'm using Python 3.8.10 and PyTorch 1.13.1.