Pipe-Runner-Lab / CVPR2020-FGVC7

CVPR2020-FGVC7 Kaggle submission
MIT License

Add efficient data loader for images in pipeline #5

Open Pipe-Runner opened 4 years ago

Pipe-Runner commented 4 years ago

Load image paths and labels into a DataFrame

The dataset images were loaded into NumPy arrays and saved as .npy files. They can then be loaded very quickly, which keeps data loading from bottlenecking GPU training.

When loading the images from JPEGs, the CPU is at max usage and the GPU usage dips occasionally between batches.

When loading the images from .npy files, the CPU usage is less than max and the GPU usage is more consistent. So it helps :)
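A rough sketch of what this could look like in our pipeline, not the notebook's actual code. It assumes all images are resized to one fixed shape so they can be stacked. The names `build_npy_cache`, `NpyImageDataset`, and the `decode` callable are hypothetical. `decode` stands in for whatever JPEG reader we choose (PIL, OpenCV, etc.). The dataset class only implements `__len__` and `__getitem__`, which should be enough for it to be wrapped by `torch.utils.data.DataLoader` as a map-style dataset.

```python
import numpy as np


def build_npy_cache(image_paths, decode, out_path):
    """Decode every image once and save the stacked result as a single .npy file.

    `decode` is a hypothetical callable mapping a path to a fixed-size
    uint8 array of shape (H, W, C); all images must share that shape.
    Note: np.save appends ".npy" if `out_path` does not already end with it.
    """
    images = np.stack([decode(p) for p in image_paths]).astype(np.uint8)
    np.save(out_path, images)
    return images.shape


class NpyImageDataset:
    """Map-style dataset that serves images from a pre-built .npy cache."""

    def __init__(self, npy_path, labels, transform=None):
        # mmap_mode="r" memory-maps the file instead of reading it all into RAM.
        self.images = np.load(npy_path, mmap_mode="r")
        self.labels = np.asarray(labels)
        if len(self.images) != len(self.labels):
            raise ValueError("number of images and labels must match")
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Copy the sample out of the memory map so downstream code gets
        # a regular, writable array.
        image = np.array(self.images[idx])
        if self.transform is not None:
            image = self.transform(image)
        return image, self.labels[idx]
```

The cache only needs to be built once. After that, each epoch skips JPEG decoding on the CPU, which is where the usage spike above comes from. The trade-off is disk space: uncompressed uint8 arrays are much larger than the original JPEGs.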

Pipe-Runner commented 4 years ago

https://www.kaggle.com/akasharidas/plant-pathology-2020-in-pytorch-0-971-score#kln-17