Generally the dataset API is already highly optimized.
Yeah, you're right. I am currently using my custom Loader for loading and performing data augmentation, but... I think it's about time to move to the data.Dataset API haha. I am going to open another issue for that.
Nevertheless... I just saw it and... Do you know if you can use it when the data does not fit into RAM? I mean, if you cannot load the entire dataset at once, can you still use the Dataset API?
Yes, of course.
If you want to continue in Eager mode, check also https://www.tensorflow.org/tutorials/eager/eager_basics#datasets
Do you know how? I mean, can you provide me a link or a function name or something? I'll try to search on the internet and in the guide you linked in previous messages too.
Almost all real datasets don't fit in memory, so you will find many Dataset API examples. Just to mention one: https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/generative_examples/image_captioning_with_attention.ipynb
I mean, referring to the size of the dataset
That dataset doesn't fit in memory
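For what it's worth, here is a minimal sketch of the idea: the Dataset API only reads files from disk as batches are requested, so the full dataset never has to fit in RAM. The file pattern, image size, and batch size below are placeholder assumptions, and the calls are the TF 2-style `tf.io` / `tf.image` API:

```python
import tensorflow as tf

# Hypothetical file pattern; adjust to your dataset location.
files = tf.data.Dataset.list_files("data/train/*.png")

def load_image(path):
    # Each image is read from disk only when its batch is needed,
    # so the whole dataset is never held in memory at once.
    image = tf.io.read_file(path)
    image = tf.image.decode_png(image, channels=3)
    image = tf.image.resize(image, [224, 224])  # assumed target size
    return image

dataset = (files
           .map(load_image, num_parallel_calls=4)
           .batch(32)
           .prefetch(1))  # load the next batch while the current one is used

for batch in dataset:  # iterates directly in Eager mode
    pass  # your train step would go here
```

The `prefetch(1)` at the end already gives you the "load the next batch in the background" behavior discussed below.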
Thanks a lot!
Add to the Loader's _getbatch function the possibility of loading batches in another process (see the sketch below):
If it is the first time it is called, load a batch as normal and create a new process to load the next batch.
If it is not the first time, create a new process to load the next batch and wait for the previously loaded batch in the queue.
https://stackoverflow.com/questions/2046603/is-it-possible-to-run-function-in-a-subprocess-without-threading-or-writing-a-se
Compare the performance of both approaches.
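A minimal sketch of that plan, assuming a hypothetical loader whose `_getbatch(index)` returns the batch at `index` without relying on shared mutable state, and a fork-based multiprocessing start method (Linux default) so the loader does not need to be pickled:

```python
import multiprocessing as mp

def _load_batch(loader, index, queue):
    # Runs in a child process: load one batch and send it back.
    queue.put(loader._getbatch(index))

class PrefetchingLoader:
    """Loads the next batch in a separate process while the
    current one is being consumed, following the subprocess
    idea from the StackOverflow link above."""

    def __init__(self, loader):
        self.loader = loader
        self.index = 0
        self.queue = None
        self.process = None

    def _start_prefetch(self):
        self.queue = mp.Queue(maxsize=1)
        self.process = mp.Process(
            target=_load_batch, args=(self.loader, self.index, self.queue))
        self.process.start()
        self.index += 1

    def get_batch(self):
        if self.process is None:
            # First call: load a batch as normal...
            batch = self.loader._getbatch(self.index)
            self.index += 1
        else:
            # ...otherwise wait for the batch the child process loaded.
            batch = self.queue.get()
            self.process.join()
        # Either way, kick off loading of the next batch.
        self._start_prefetch()
        return batch
```

For the comparison step, timing a fixed number of `get_batch` calls with `time.perf_counter()` against the original synchronous loader should be enough to see whether the prefetching pays off.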