Closed · suyuan945 closed this issue 2 years ago
This is most likely an out-of-memory issue, since the script tries to load all of the training data into RAM; I was typically allocating around 70 GB.
You can bypass it by loading one sample at a time instead of loading everything up front, in a fashion closer to traditional DL pipelines on large datasets.
Here is some code to replace the training, eval, and loading scripts (not thoroughly tested): https://gist.github.com/gafniguy/5a66f471ad227d022aed96944432adea
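The idea in the gist can be sketched roughly as follows: read only the lightweight JSON metadata at construction time, and open each image file from disk inside `__getitem__`, so RAM usage stays flat regardless of dataset size. This is a minimal stdlib-only illustration, not the gist's actual code; the `transforms.json` filename and the `frames` / `file_path` / `transform_matrix` keys are assumptions based on the usual NeRF-style layout, and real code would decode the image (e.g. with PIL) and subclass `torch.utils.data.Dataset`.

```python
import json
import os


class LazyImageDataset:
    """Reads per-frame JSON metadata eagerly, but defers reading image
    files until a sample is actually requested (hypothetical sketch)."""

    def __init__(self, root):
        # Only the small JSON file is parsed up front -- no images yet.
        with open(os.path.join(root, "transforms.json")) as f:
            meta = json.load(f)
        self.root = root
        # Assumed layout: a list of {"file_path": ..., "transform_matrix": ...}
        self.frames = meta["frames"]

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        frame = self.frames[idx]
        # The image is read from disk only now, one sample at a time.
        path = os.path.join(self.root, frame["file_path"])
        with open(path, "rb") as f:
            image_bytes = f.read()
        return image_bytes, frame["transform_matrix"]
```

Wrapped as a `torch.utils.data.Dataset` and fed through a `DataLoader` with a few workers, this keeps startup fast (only the JSON is read) while the disk reads overlap with training.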
Thank you for your reply. I'll try it.
I found that data loading takes ~1 hr on my Linux machine. The available RAM is 80 GB. I tried reducing the number of images in the training and test sets. Unfortunately, it still happens. Any ideas?
Did you try the torch Dataset class I attached here? Instead of hogging everything in RAM, it loads the images one at a time. It should be quick to start, since it only reads the JSONs, not the images.
Will try it later. Thank you.
This error occurred when I used the dataset class you provided. What hardware environment was the code developed on? My PC: GPU: 2080 Ti, 16 GB; CPU memory: 32 GB.