Lyken17 / Efficient-PyTorch

My best practice of training large dataset using PyTorch.

Speed is a bit slower after using lmdb. #4

Closed Fangyh09 closed 5 years ago

Fangyh09 commented 5 years ago

Loading is a bit slower after switching to LMDB. The dataset is 30k images of size 1000x1000, stored on an SSD. Does LMDB use locks that could slow reads down?

Lyken17 commented 5 years ago

Your resolution is much larger than ImageNet's, which is what I benchmarked my code on. In your case I suspect the bottleneck is the CPU.
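A rough back-of-the-envelope sketch of why resolution matters: the number of pixels to decode grows with width times height. The 500x375 comparison size below is an assumption chosen for illustration, not a figure from this thread, and real decode time depends on the image format and decoder, so treat the ratio only as an indication.

```python
def decoded_rgb_bytes(width, height, channels=3):
    """Bytes occupied by one decoded 8-bit image with the given shape."""
    return width * height * channels


# Image size reported in this issue.
large = decoded_rgb_bytes(1000, 1000)

# ASSUMPTION: a hypothetical ImageNet-like image size, for comparison only.
small = decoded_rgb_bytes(500, 375)

# How many times more pixel data each large image produces after decoding.
ratio = large / small

# Total decoded size of the 30k-image dataset described in the issue.
dataset_bytes = 30_000 * large

print(f"per image: {large} vs {small} bytes, ratio {ratio:.2f}")
print(f"decoded dataset: {dataset_bytes / 1e9:.0f} GB")
```

Under that assumption each image yields roughly five times more decoded data per sample, all of which has to pass through the CPU on every epoch.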

My loader opens the database read-only and turns off locking (https://github.com/Lyken17/Efficient-PyTorch/blob/master/tools/folder2lmdb.py#L27), so I don't think the LMDB library itself is slowing things down.

Can you share your disk I/O and CPU utilization during loading?

Fangyh09 commented 5 years ago

The server is shared by multiple users, so I have to wait for a quiet moment to measure. I wonder whether the post-processing step slows loading down:

img = Image.open(buf).convert('RGB')

Lyken17 commented 5 years ago

Usually not for ImageNet data, but in your case I think it might. Have you tried pillow-simd (https://github.com/uploadcare/pillow-simd)?
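One way to check whether decoding, rather than the LMDB read, dominates is to time the two stages separately. The sketch below is a minimal illustration: `time_stages`, `fake_store`, and the stand-in `fetch`/`decode` callables are hypothetical names introduced here, not part of the repository.

```python
import time


def time_stages(keys, fetch, decode):
    """Return (fetch_seconds, decode_seconds) accumulated over all keys."""
    fetch_s = 0.0
    decode_s = 0.0
    for key in keys:
        t0 = time.perf_counter()
        raw = fetch(key)       # e.g. reading the encoded bytes from storage
        t1 = time.perf_counter()
        decode(raw)            # e.g. decoding the bytes into an RGB image
        t2 = time.perf_counter()
        fetch_s += t1 - t0
        decode_s += t2 - t1
    return fetch_s, decode_s


# Stand-in data so the sketch runs without LMDB or Pillow installed.
fake_store = {i: bytes(1000) for i in range(100)}

fetch_s, decode_s = time_stages(
    keys=list(fake_store),
    fetch=fake_store.__getitem__,     # stand-in for the storage read
    decode=lambda raw: list(raw),     # stand-in for the image decode
)
print(f"fetch: {fetch_s:.6f}s  decode: {decode_s:.6f}s")
```

In a real run, `fetch` would wrap the LMDB read and `decode` would wrap `Image.open(buf).convert('RGB')`. Because `Image.open` is lazy, the `convert` call is where the actual decoding cost shows up. If the decode time dominates, a faster decoder or more loader workers is likely to help more than changing the storage format.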

Lyken17 commented 5 years ago

Closing after a week of inactivity. Feel free to reopen if necessary.