amm506 commented 2 years ago

First of all, I apologize if this does not belong here. I am new to ML and PyTorch, so I still have a lot to learn. I am going to paste an image of my stack trace to show what the error says and then explain what I am seeing on my end.

For starters, this issue appears to be triggered by only one sample, or very few. I say that because when I had shuffle=True in my dataloader, I would sometimes make it almost the whole way through the loop and crash towards the end, while other times it would crash within the first few iterations. I have since set shuffle to False so that the failure stops moving around and I can see where it originates. It is probably irrelevant to the problem, but with shuffle off the crash happens on the 6th iteration of the loop.

After doing so, I entered debug mode and began to step into functions. After quite a few steps in and looking at values in the debug window, I don't see an issue so far. (There is definitely a possibility that I am missing something.)

If I step over (rather than into) the line "for video_batch, labels in train_loader:" in the debugger, execution leaves my training function, which contains that line, and stops at the line "if __name__ == '__main__': main()" in my main.py.

It likely shouldn't make a difference, but I am using the UCF11 dataset. My images, file directories, and text files are all formatted as the documentation states.

It looks like empty values are being returned, but as I was in debug mode, I saw which file directory it was looking into, and verified that the images that were supposed to be there were in that directory. They were.

If there is any other information you would like, please let me know and I'd be glad to post it.

RaivoKoot commented 2 years ago

Hey! Thanks for the extra details and sorry for the delay. I am pretty sure this is an issue with a data sample on your disk. Check out the following:

>>> import numpy as np
>>> np.random.randint(0, size=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 748, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1247, in numpy.random._bounded_integers._rand_int64
ValueError: high <= 0

If you call np.random.randint with a single bound equal to zero, that argument is treated as the exclusive upper bound high and this error pops up. Internally, VideoFrameDataset makes the following call, as can be seen in your error trace:

np.random.randint(record.num_frames, size=self.num_segments)

This means that record.num_frames is equal to zero for one of your video samples. With shuffle=False in your dataloader, as you say, the error always pops up at the same iteration, because the video samples are loaded in the same order every time. With shuffle=True, the error pops up at a random iteration, because videos are loaded in a random order.
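To make this kind of failure explain itself, the frame count could be checked before any sampling happens. The following is only a sketch: check_frame_count is a hypothetical helper, not part of VideoFrameDataset, and it assumes you have the record's path and frame count at hand.

```python
def check_frame_count(path, num_frames):
    """Raise a descriptive error if a video sample reports no frames.

    Hypothetical helper: `path` and `num_frames` stand in for the
    record.path and record.num_frames values mentioned in this thread.
    """
    if num_frames <= 0:
        raise ValueError(
            f"Video sample {path!r} reports num_frames={num_frames}; "
            "expected at least 1 frame."
        )


# A sample with frames passes silently; a sample with zero frames
# would raise a ValueError that names the offending path.
check_frame_count("videos/sample_with_frames", 25)
```

Called right before the sampling step, this would replace the bare "high <= 0" message with one that points at the broken sample.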

Basically, if you look at the dataset on your disk, I assume there will be one video folder where there are no RGB frames because you made some mistake in creating them. This has happened to me too before. If you add

print(record.path)

at the start of the function def _sample_indices(self, record): inside of VideoFrameDataset, you will be able to see what video sample is the cause when your program errors out again.
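If you would rather find the broken sample without running training, something like the following could scan the dataset on disk. It is a rough sketch under a few assumptions: frames are stored as image files in the leaf folders of the dataset root, the extensions listed are the ones you use, and find_empty_video_folders and the root path are made-up names for illustration.

```python
import os

# Assumption: frames are image files with one of these extensions.
FRAME_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_empty_video_folders(root):
    """Return the leaf directories under `root` that contain no frame images."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames:
            # Not a leaf folder; frames are assumed to live in leaf folders.
            continue
        has_frames = any(
            name.lower().endswith(FRAME_EXTENSIONS) for name in filenames
        )
        if not has_frames:
            empty.append(dirpath)
    return sorted(empty)


if __name__ == "__main__":
    # Hypothetical path; replace with your own dataset root.
    for folder in find_empty_video_folders("path/to/dataset_root"):
        print(folder)
```

Any folder it prints is a candidate for the sample with zero frames. It only looks at the files on disk, so it will not catch a frame count that is wrong in the annotation file itself.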

RaivoKoot commented 2 years ago

#14

This might also help.

amm506 commented 2 years ago

That was very helpful. Thank you very much. I will close this issue and then reopen it if I am still having issues. Everything appears to be working correctly now. Thank you for being so helpful and responsive.

RaivoKoot commented 2 years ago

Glad it's fixed and no problem!!

RaivoKoot / Video-Dataset-Loading-Pytorch

Assigning values from data loader in training loop causing error #13
