geoffwoollard / ece1512_project


train on more data #2

Closed geoffwoollard closed 5 years ago

geoffwoollard commented 5 years ago

I see two ways to avoid swamping memory:

(1) Write an explicit loop that fits on mini-batches, reading each batch off disk as it is needed (a minimal sketch is below).

(2) Create some sort of file-handle object for Keras that does not read the data up front, but only lets it be read in chunks. Search the Keras API for this.

Note that it is possible to convert the .mrc files to numpy arrays and write them to disk in one big chunk to speed things up.
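A minimal sketch of option (1), assuming the `mrcfile` package for reading the .mrc files, an already-compiled Keras `model`, and placeholder paths (`data/*.mrc`, `labels.npy`) and batch size; only one batch is ever held in memory at a time.

```python
import glob

import mrcfile
import numpy as np

# Hypothetical setup: a compiled Keras model, a directory of .mrc files
# (one particle image per file), and labels stored in a parallel array.
mrc_paths = sorted(glob.glob("data/*.mrc"))  # placeholder path
labels = np.load("labels.npy")               # placeholder label file
batch_size = 32
n_epochs = 5

def load_mrc_batch(paths):
    """Read a list of .mrc files off disk and stack them into one numpy array."""
    imgs = []
    for p in paths:
        with mrcfile.open(p, permissive=True) as mrc:
            imgs.append(np.asarray(mrc.data, dtype=np.float32))
    x = np.stack(imgs)
    return x[..., np.newaxis]  # add a channel axis for a 2D CNN

for epoch in range(n_epochs):
    # shuffle the file order each epoch so batches differ between epochs
    order = np.random.permutation(len(mrc_paths))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        x_batch = load_mrc_batch([mrc_paths[i] for i in idx])
        y_batch = labels[idx]
        # train on this single batch, then let it be garbage-collected
        loss = model.train_on_batch(x_batch, y_batch)
```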

geoffwoollard commented 5 years ago

See https://github.com/keras-team/keras/issues/107

geoffwoollard commented 5 years ago

Promising links:

- https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
- https://medium.com/@ensembledme/writing-custom-keras-generators-fe815d992c5a
- https://keras.io/models/sequential/
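A minimal sketch along the lines of the Stanford blog post: a `keras.utils.Sequence` subclass that reads files off disk per batch. The class name, the assumption that each image is saved individually with `np.save`, and the `paths`/`labels` variables are placeholders, not part of this repo.

```python
import numpy as np
from keras.utils import Sequence

class MrcSequence(Sequence):
    """Serves mini-batches by reading files off disk on demand,
    so the full dataset never has to fit in memory."""

    def __init__(self, paths, labels, batch_size=32, shuffle=True):
        self.paths = np.array(paths)
        self.labels = np.array(labels)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(len(self.paths))
        self.on_epoch_end()

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.paths) / self.batch_size))

    def __getitem__(self, i):
        idx = self.indices[i * self.batch_size:(i + 1) * self.batch_size]
        # assumes each file is a single image saved with np.save
        x = np.stack([np.load(p) for p in self.paths[idx]])
        y = self.labels[idx]
        return x[..., np.newaxis], y

    def on_epoch_end(self):
        # reshuffle between epochs so batch composition varies
        if self.shuffle:
            np.random.shuffle(self.indices)

# usage (placeholders): model.fit_generator(MrcSequence(paths, labels), epochs=5)
```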

Davjes15 commented 5 years ago

Data Generator

https://medium.com/datadriveninvestor/keras-training-on-large-datasets-3e9d9dbc09d4
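The article above uses a plain Python generator with `fit_generator`; a hedged sketch of that pattern, with `paths`, `labels`, and the compiled `model` as placeholder assumptions:

```python
import numpy as np

def batch_generator(paths, labels, batch_size=32):
    """Yield (x, y) batches forever; steps_per_epoch tells Keras
    how many batches make up one epoch."""
    n = len(paths)
    while True:
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # assumes each path points to one image saved with np.save
            x = np.stack([np.load(paths[i]) for i in idx])
            y = labels[idx]
            yield x[..., np.newaxis], y

# usage (placeholders):
# steps = int(np.ceil(len(paths) / 32))
# model.fit_generator(batch_generator(paths, labels), steps_per_epoch=steps, epochs=5)
```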

Davjes15 commented 5 years ago

Keras Data Generator

https://stackoverflow.com/questions/47200146/keras-load-images-batch-wise-for-large-dataset

geoffwoollard commented 5 years ago

See https://github.com/geoffwoollard/ece1512_project/commit/a9144d81639fb64b79913aedfed7b678c9e01c9d