Lyken17 / Efficient-PyTorch

My best practices for training on large datasets with PyTorch.

About the transform in dataset #6

Closed · Solacex closed this issue 5 years ago

Solacex commented 5 years ago

Hello!

Your code for accelerating training is really helpful, thank you! In most cases we only need a few transformations for data augmentation, such as flipping and multi-crop. I noticed that the code released with Non-local (https://github.com/facebookresearch/video-nonlocal-net/tree/master/process_data/kinetics) stores the transformed data in an LMDB file. Would that accelerate training compared with your current code, or have you compared the two methods? If you have run any experiments on this, could you share the results with us?
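For reference, here is a minimal sketch of what "storing transformed data in an LMDB file" could look like. The paths, the key layout, and the use of a purely deterministic transform are my own assumptions for illustration, not the Non-local repo's actual pipeline:

```python
import io

import lmdb
import torch
from torchvision import datasets, transforms

# Hypothetical paths; adjust to your dataset layout.
SRC_DIR = "train/"
LMDB_PATH = "train_transformed.lmdb"

# Deterministic preprocessing applied once, offline. Random augmentations
# (flip, multi-crop) are usually NOT baked in here, since that would freeze
# a single augmented version of every sample on disk.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder(SRC_DIR, transform=transform)

env = lmdb.open(LMDB_PATH, map_size=1 << 40)  # generous map size upper bound
with env.begin(write=True) as txn:
    for idx, (img, label) in enumerate(dataset):
        buf = io.BytesIO()
        torch.save((img, label), buf)            # serialize tensor + label
        txn.put(f"{idx}".encode(), buf.getvalue())
    txn.put(b"__len__", str(len(dataset)).encode())
env.close()
```

Storing decoded tensors like this trades disk space for CPU time at load, which is the trade-off discussed below.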

Lyken17 commented 5 years ago

Happy to see my code helps your project. Sure, you can store the "transformed data" directly in LMDB, but be aware that this will require significantly more disk space. Facebook's Non-local takes this route because randomly indexing a frame inside a video is slow. If you are doing a traditional vision task like classification / segmentation / detection, I don't think the CPU will be the bottleneck.
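To contrast with the pre-transformed approach above, here is a rough sketch of the alternative being described: keep the raw encoded image bytes in LMDB and apply the (random) transforms on the fly inside the DataLoader workers. The key layout (`"{i}"` for image bytes, `"{i}_label"` for the label, `"__len__"` for the count) is a hypothetical convention for this example, not necessarily the one used in this repo:

```python
import io

import lmdb
from PIL import Image
from torch.utils.data import Dataset


class RawBytesLMDB(Dataset):
    """Reads raw encoded image bytes from LMDB and augments them on the fly."""

    def __init__(self, lmdb_path, transform=None):
        # readonly/lock/readahead/meminit settings are common for read-only use
        self.env = lmdb.open(lmdb_path, readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin() as txn:
            self.length = int(txn.get(b"__len__").decode())
        self.transform = transform

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        with self.env.begin() as txn:
            img_bytes = txn.get(f"{idx}".encode())
            label = int(txn.get(f"{idx}_label".encode()).decode())
        img = Image.open(io.BytesIO(img_bytes)).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)   # flip / crop happen here, every epoch
        return img, label
```

Because only the encoded bytes are stored, disk usage stays close to the original dataset, and the per-sample decode plus augmentation is cheap enough on CPU for typical classification, segmentation, or detection workloads.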

Solacex commented 5 years ago


Thank you! Have a nice day~