aimagelab / STAGE_action_detection

Code of the STAGE module for video action detection

Problem with ava_fastercnn_object_train.hdf5 #4

Closed tsminh closed 4 years ago

tsminh commented 4 years ago

Hi. I am trying to follow your repo, but I hit a problem during training: OSError: Unable to open file (truncated file: eof = 376832, sblock->base_addr = 0, stored_eof = 806224)

I tried redownloading the file twice, but I still get the same error.

Thank you for your time.
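
A note on reading this message: in the HDF5 "truncated file" error, eof is the size the library found on disk and stored_eof is the end-of-file address recorded inside the file itself, so the difference is how many bytes are missing. A minimal sketch of that arithmetic, where missing_bytes is a hypothetical helper name and the pattern is an assumption based only on the message quoted above:

```python
import re

# Pattern for the "truncated file" message quoted above (assumed format).
TRUNCATED = re.compile(r"truncated file: eof = (\d+), .*stored_eof = (\d+)")


def missing_bytes(message):
    """Return (size_on_disk, size_expected, bytes_missing), or None if no match."""
    match = TRUNCATED.search(message)
    if match is None:
        return None
    on_disk, expected = int(match.group(1)), int(match.group(2))
    return on_disk, expected, expected - on_disk


msg = ("Unable to open file (truncated file: eof = 376832, "
       "sblock->base_addr = 0, stored_eof = 806224)")
print(missing_bytes(msg))  # (376832, 806224, 429392)
```

Here the file on disk is less than half of its expected length, which points to an incomplete download or extraction rather than a problem in the training code.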

matteot11 commented 4 years ago

Hi! Thanks for playing with our repo!

Could you show me the detailed error log and the line of code where the error occurs? It seems that the file cannot be opened. Please check from a Python console whether the following command throws the same error:

import h5py
h5py.File("<path-to-your-file>","r")

If the problem persists, I will download the file and check it, and upload it again if necessary. Thanks again.

Matteo

tsminh commented 4 years ago

@matteot11 Thanks for your quick response :D

Here is the output when I run train.py:

...
epoch: 1, iter: 7880/30730, lr: 6.25e-05, class_loss: 0.04936007410287857
epoch: 1, iter: 7900/30730, lr: 6.25e-05, class_loss: 0.058304473757743835
epoch: 1, iter: 7920/30730, lr: 6.25e-05, class_loss: 0.03746988624334335
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    main()
  File "train.py", line 73, in main
    for iteration, (actors_features, actors_labels, actors_boxes, actors_filenames, objects_features, objects_boxes, objects_filenames, adj) in enumerate(data_loader_train, start_iter):
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/minhto/STAGE_action_detection/data/ava_dataset.py", line 30, in __getitem__
    hf_actors = h5py.File(filename, 'r')
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/minhto/anaconda3/envs/minh/lib/python3.7/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 376832, sblock->base_addr = 0, stored_eof = 806224)

I also ran this snippet to check the file, but it seems there is no error.

import h5py
h5py.File("<path-to-your-file>","r")

matteot11 commented 4 years ago

The problem seems to be at line 30 of ava_dataset.py:

hf_actors = h5py.File(filename, 'r')

This line reads the actors' features from a .hdf5 file (one per clip). Since the training starts and runs for a while before failing, one of those files is probably corrupted. Could you please replace line 30 of ava_dataset.py with:

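try:
    hf_actors = h5py.File(filename, "r")
except OSError:
    # Report which per-clip file failed, then re-raise so training stops here.
    print(filename)
    raise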

then restart the training and tell me which file triggers the exception? That will make it easier for me to identify the corrupted file. Thanks.
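
Instead of waiting for the training loop to reach the bad file, the same open check can be run over every feature file up front. A minimal sketch, assuming the per-clip features are stored as .hdf5 files somewhere under one directory; opens_cleanly and find_unreadable_hdf5 are hypothetical helper names, not part of the repo:

```python
from pathlib import Path


def opens_cleanly(path):
    """Return None if h5py can open the file, else the error message."""
    import h5py  # imported here so the scan loop itself has no hard dependency

    try:
        with h5py.File(str(path), "r"):
            return None
    except OSError as err:
        return str(err)


def find_unreadable_hdf5(root, check=opens_cleanly):
    """Map each failing .hdf5 file under root to its error message."""
    failures = {}
    for path in sorted(Path(root).rglob("*.hdf5")):
        error = check(path)
        if error is not None:
            failures[str(path)] = error
    return failures
```

Calling find_unreadable_hdf5("<path-to-features-dir>") returns a dict that is empty when every file opens. Note that opening a file only catches problems visible at open time, such as the truncation reported in this issue; it does not read the datasets themselves.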

Matteo

tsminh commented 4 years ago

Hi @matteot11, after redownloading the actors_features, it works like a charm 🤣 tbh, I didn't think the problem was in actors_features_train, because nothing went wrong when I downloaded it. With ava_fasterrcnn_objects_train, on the other hand, I had many problems xD.

Thank you so much for your support !

Minh.

yaru-zhang commented 3 years ago

@tsminh @matteot11 Hi, when I try to download the features I run into the following problem:

Sorry, you can't view or download this file at this time. Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access the file after 24 hours, contact your domain administrator.

Could you share the features [Faster-RCNN_objects_train]? My email is 340565953@qq.com. Thanks in advance.
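
One thing that may be worth checking after a download attempt that ran into this quota page: some download tools can end up saving the error page under the target filename instead of the data. A minimal sketch of a quick sanity check, assuming the file has no HDF5 user block so that the 8-byte HDF5 signature sits at offset 0; looks_like_hdf5 is a hypothetical helper name:

```python
# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

A True result only says the file begins like an HDF5 file; a truncated download would still pass, so opening the file with h5py remains the more complete check.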