Lyken17 / Efficient-PyTorch

My best practices for training on large datasets with PyTorch.
1.08k stars · 139 forks

Final lmdb file for ImageNet? #11

Closed he-y closed 4 years ago

he-y commented 5 years ago

Great work. Could you provide the final lmdb file for ImageNet?

Lyken17 commented 5 years ago

It might be hard. No cloud drive provides a fast yet stable download for a ~100 GB file.

If you have original imagenet available locally, you can generate the lmdb via

```
python folder2lmdb.py --folder $IMAGENET_ROOT --name train
```
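What that dump step roughly does can be pictured like this (a minimal stdlib sketch, not the repo's actual code: `dump_samples` is a hypothetical helper, and `pickle` stands in for the msgpack/pyarrow serializer the repo uses):

```python
# Sketch of the folder2lmdb idea: serialize each (raw_image_bytes,
# label) pair under an index key, plus a __len__ entry, into a
# key-value store. pickle is a stand-in serializer; dump_samples is
# a hypothetical helper, not part of the repo.
import pickle

def dump_samples(put, samples):
    """put(key: bytes, value: bytes) -- e.g. an lmdb transaction's txn.put."""
    for idx, (img_bytes, label) in enumerate(samples):
        put(str(idx).encode(), pickle.dumps((img_bytes, label)))
    put(b"__len__", pickle.dumps(len(samples)))

# Usage with a plain dict standing in for a real LMDB transaction:
store = {}
dump_samples(store.__setitem__, [(b"\xff\xd8 jpeg...", 0), (b"\x89PNG...", 7)])
```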
he-y commented 5 years ago

Thanks. I generated the lmdb file with the above command.

But during training, I hit the msgpack error. After solving that problem following the reply, I hit another error:

```
File "pruning_imagenet_lmdb.py", line 741, in <module>
  main()
File "pruning_imagenet_lmdb.py", line 177, in main
  num_workers=args.workers, pin_memory=True, sampler=None)
File "/data/yahe/anaconda3/envs/pt1.2/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 213, in __init__
  sampler = RandomSampler(dataset)
File "/data/yahe/anaconda3/envs/pt1.2/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 92, in __init__
  if not isinstance(self.num_samples, int) or self.num_samples <= 0:
File "/data/yahe/anaconda3/envs/pt1.2/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 100, in num_samples
  return len(self.data_source)
TypeError: 'bytes' object cannot be interpreted as an integer
```

I'm not sure if the LMDB is correct. Do you have any idea about this? System: PyTorch 1.2, torchvision 0.4.
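For what it's worth, that final `TypeError` is what `len()` raises when a dataset's `__len__` returns the raw serialized bytes read from the LMDB instead of a decoded integer. A minimal sketch of the failure mode (illustrative class names; `pickle` stands in for the msgpack serializer):

```python
# Sketch of the TypeError above: len(dataset) requires __len__ to
# return an int; returning the undecoded bytes stored in the LMDB
# reproduces "'bytes' object cannot be interpreted as an integer".
# pickle is a stand-in serializer; both classes are hypothetical.
import pickle

class BrokenDataset:
    def __init__(self):
        self._len_bytes = pickle.dumps(1281167)  # value as read from LMDB

    def __len__(self):
        return self._len_bytes        # forgot to decode -> TypeError in len()

class FixedDataset(BrokenDataset):
    def __len__(self):
        return pickle.loads(self._len_bytes)  # decode back to a plain int

try:
    len(BrokenDataset())
except TypeError as e:
    print(e)  # 'bytes' object cannot be interpreted as an integer
assert len(FixedDataset()) == 1281167
```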

Lyken17 commented 5 years ago

That's weird. I'm not sure what the exact cause is. Can you try tensorpack's way of dumping lmdb?

he-y commented 5 years ago

Thanks. Should the raw images be processed by the Facebook script first, or just extracted directly from the source tar file?

Lyken17 commented 5 years ago

The original source images are dumped into the lmdb, so that different preprocessing pipelines remain possible.
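That flexibility can be sketched like this (illustrative only; `pickle` stands in for the repo's serializer, and `load` is a hypothetical helper):

```python
# Because the LMDB holds the untouched encoded bytes, each consumer
# decodes them with its own pipeline: e.g. PIL.Image.open for
# torchvision transforms, or cv2.imdecode for OpenCV preprocessing.
# pickle is a stand-in serializer; load is a hypothetical helper.
import io
import pickle

record = pickle.dumps((b"\xff\xd8...raw jpeg bytes...", 42))  # (image, label)

def load(raw, decode):
    img_bytes, label = pickle.loads(raw)
    return decode(io.BytesIO(img_bytes)), label

# e.g. img, label = load(record, PIL.Image.open)
```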

Lyken17 commented 5 years ago

Does tensorpack's script solve your problem?

he-y commented 5 years ago

The Facebook script creates a list of directories and moves images of the same class into one folder, which might be the reason for the error. I will try the original images without that preprocessing later.

Lyken17 commented 5 years ago

My script is based on torchvision.datasets.ImageFolder, which requires the validation set to be pre-processed by the fb scripts.

Fangyh09 commented 4 years ago

@he-y You can try https://github.com/Fangyh09/Image2LMDB, which is based on this project and fixes some small problems.

MorningstarZhang commented 4 years ago

Cool! This really works, but I'm not sure whether the error I get when I set num_workers > 0 is normal. I can only run it with num_workers=0.

Lyken17 commented 4 years ago

@MorningstarZhang Can you share the detailed error log?

@he-y @Fangyh09 I have uploaded the final lmdbs to Academic Torrents. You are welcome to try them and share feedback.

- Training lmdb
- Val lmdb

MorningstarZhang commented 4 years ago

@Lyken17 Thanks. My problem was solved by initializing ImageFolderLMDB.env in __getitem__() instead of __init__() as the code used to. I'm not sure if it was caused by some modification I made; it seems no one else has hit the same trouble. Here is the error log in case it helps.
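That lazy-initialization fix is the usual workaround for lmdb plus multi-process DataLoaders. A stdlib-only sketch of the pattern (the class is hypothetical, and an unpicklable `threading.Lock` stands in for the unpicklable `lmdb.Environment`):

```python
# Sketch of the fix: hold no unpicklable handle in __init__, open it
# on first access in __getitem__. On Windows, DataLoader workers are
# spawned, so the Dataset must survive pickling; an open
# lmdb.Environment does not. threading.Lock() stands in for it here.
import pickle
import threading

class LazyEnvDataset:
    def __init__(self, path):
        self.path = path
        self._env = None                 # nothing unpicklable held yet

    def _open(self):
        # real fix: return lmdb.open(self.path, readonly=True, lock=False)
        return threading.Lock()          # stand-in unpicklable object

    def __getitem__(self, index):
        if self._env is None:            # first access, inside the worker
            self._env = self._open()
        return index                     # placeholder for the actual read

# The dataset round-trips through pickle before any item is fetched:
clone = pickle.loads(pickle.dumps(LazyEnvDataset("train.lmdb")))
```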

```
C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\python.exe D:/Users/Administrator/PycharmProjects/Image2LMDB-master/test_lmdb.py
Traceback (most recent call last):
  File "D:/Users/Administrator/PycharmProjects/Image2LMDB-master/test_lmdb.py", line 59, in <module>
    for index, data in enumerate(trainloader):
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\site-packages\torch\utils\data\dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\site-packages\torch\utils\data\dataloader.py", line 682, in __init__
    w.start()
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Administrator\AppData\Local\conda\conda\envs\my_root\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle Environment objects
```

Lyken17 commented 4 years ago

@MorningstarZhang Your problem seems related to _MultiProcessingDataLoaderIter. This is a known issue on the Windows platform. Can you try setting num_workers to 0?

MorningstarZhang commented 4 years ago

@Lyken17 Nothing goes wrong if num_workers is set to 0.

Lyken17 commented 4 years ago

@MorningstarZhang Then the issue should be related to sharing state across worker processes. I am afraid I cannot help much, because the issue is on the Python/Windows side.

MorningstarZhang commented 4 years ago

@Lyken17 That's OK. I have made some changes to folder2lmdb.py so that it runs without error even when num_workers is set to more than 0, and at least it meets my needs for now.

Lyken17 commented 4 years ago

Closing this issue since the problem was solved. Feel free to re-open if necessary.