Increase dataset pipeline performance

DAT-05-02 / P5

Increase dataset pipeline performance #91

Open khmikkelsen opened 11 months ago

khmikkelsen commented 11 months ago

inspiration: https://www.tensorflow.org/guide/data_performance

When using batches, the input shape gains an additional leading dimension of size N, where N is the batch size. Perhaps we can save the batches on disk, drop the ones already used, and fetch the next batch on the next epoch.

Example GPU memory use in the current model on #85: 3.8 GB with batches(1), 7.5 GB with batches(32).
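A rough sketch of the "save batches on disk, fetch one at a time" idea, in plain Python so it has no dependencies. The names `write_batches` and `iter_batches`, the pickle file format, and the toy data are all assumptions for illustration and not what the project currently does:

```python
import pickle
import tempfile
from pathlib import Path


def write_batches(samples, batch_size, directory):
    """Split samples into batches and pickle each batch to its own file."""
    directory = Path(directory)
    paths = []
    for start in range(0, len(samples), batch_size):
        path = directory / f"batch_{start // batch_size:05d}.pkl"
        with open(path, "wb") as f:
            pickle.dump(samples[start:start + batch_size], f)
        paths.append(path)
    return paths


def iter_batches(paths):
    """Yield one batch at a time; earlier batches can be garbage collected."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)


if __name__ == "__main__":
    # Hypothetical toy data standing in for the real samples.
    samples = [[float(i), float(i) * 2] for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_batches(samples, batch_size=4, directory=tmp)
        for batch in iter_batches(paths):
            print(len(batch), "samples in batch")
```

This only bounds host memory. GPU memory would still grow with the batch size, since each batch has to fit on the device together with the activations computed from it.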

lucas2000k commented 10 months ago

The guide mostly focuses on time efficiency and not much on memory efficiency, so it is not very useful for the current model. It also recommends using map functions for the data transformations, but we do the transformations before loading the model.

khmikkelsen commented 10 months ago

Yes, the point is that we need to switch to doing the transformations 'lazily', instead of before loading the model.
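To make 'lazily' concrete, here is a minimal sketch using plain Python generators. In `tf.data` the equivalent would be `Dataset.map` followed by `Dataset.batch`. The helpers `lazy_map` and `batched` and the `normalize` transform are hypothetical names for illustration, not code from this repo:

```python
def lazy_map(transform, samples):
    """Apply transform to each sample only when the consumer asks for it."""
    for sample in samples:
        yield transform(sample)


def batched(samples, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


calls = []  # records which samples have actually been transformed


def normalize(x):
    """Hypothetical transform: scale a pixel value to [0, 1]."""
    calls.append(x)
    return x / 255.0


pipeline = batched(lazy_map(normalize, range(8)), batch_size=4)
first = next(pipeline)  # only the first four samples are transformed here
```

After `next(pipeline)`, `calls` holds only the first four inputs. The remaining samples are not transformed until the training loop requests the next batch, so the fully transformed dataset never has to sit in memory at once.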