Neuroglycerin / neukrill-net-tools

Tools coded as part of the NDSB competition.
MIT License

CPU usage in epochs #132

Closed gngdb closed 9 years ago

gngdb commented 9 years ago

The big problem with using online augmentation is that we have a lot of CPU usage during the epoch, as it generates the next minibatch every time one is requested.

Improved this by using the multiprocessing library to generate the next minibatch asynchronously and then getting it when required. However, this only affects ListDataset.

Notes made when developing this change can be found in the notebook Iterators with Multiprocessing.
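The idea sketched in that notebook is roughly the following (a minimal sketch, not the repo's actual code; `augment_batch` and `iterate_epoch` are hypothetical names): while the GPU consumes the current minibatch, a worker process builds the next one with `apply_async`, so the augmentation cost overlaps with training instead of blocking it.

```python
# Sketch: overlap minibatch generation with training using multiprocessing.
import multiprocessing

import numpy as np

def augment_batch(seed):
    # Stand-in for the real augmentation: just produce random "images".
    rng = np.random.RandomState(seed)
    return rng.rand(128, 48, 48)  # minibatch of 128 48x48 arrays

def iterate_epoch(n_batches):
    pool = multiprocessing.Pool(processes=1)
    # Kick off the first batch asynchronously.
    result = pool.apply_async(augment_batch, (0,))
    for i in range(n_batches):
        batch = result.get(timeout=60)  # blocks only if the worker is behind
        if i + 1 < n_batches:
            # Start building the next batch before yielding the current one.
            result = pool.apply_async(augment_batch, (i + 1,))
        yield batch
    pool.close()
    pool.join()

if __name__ == "__main__":
    batches = list(iterate_epoch(4))
    print(len(batches), batches[0].shape)  # -> 4 (128, 48, 48)
```

With one worker per iterator this hides most of the augmentation cost as long as generating a batch is faster than consuming one.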

fmaguire commented 9 years ago

Could run everything from a ramdisk too. Loading all the images into one would speed things up a lot.


gngdb commented 9 years ago

The images are loaded into RAM as a list of numpy arrays at initialisation; isn't that as good?


gngdb commented 9 years ago

Pre-computing entire epochs is probably the next step in performance improvements. But we're seeing good numbers for GPU usage and epoch speeds now, so it may not be necessary.


gngdb commented 9 years ago

Hit problems with this running test models last night: TimeOut errors, and another error that NaNs were occurring in the network (could have been dependent on the network). Fixed by increasing the TimeOut time. Also moved the pool into the dataset, so it doesn't have to make a new one every epoch. Should make the code a bit cleaner too.
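A hedged sketch of that fix (class and function names here are hypothetical, not the repo's actual code): the Pool is created once when the dataset is constructed and reused across epochs, and the timeout passed to `AsyncResult.get` is raised so a slow augmentation job raises no spurious TimeoutError.

```python
# Sketch: persistent worker pool owned by the dataset, with a generous timeout.
import multiprocessing

import numpy as np

def _make_batch(items):
    # Module-level so it can be pickled for the worker process.
    return np.stack(items)

class AsyncDataset:
    def __init__(self, data, batch_size=2, timeout=300):
        self.data = data
        self.batch_size = batch_size
        self.timeout = timeout  # generous timeout avoids spurious TimeoutError
        # Persistent pool: created once, not rebuilt every epoch.
        self.pool = multiprocessing.Pool(processes=1)

    def iterate(self):
        n = len(self.data) // self.batch_size
        result = self.pool.apply_async(_make_batch,
                                       (self.data[:self.batch_size],))
        for i in range(n):
            batch = result.get(timeout=self.timeout)
            if i + 1 < n:
                start = (i + 1) * self.batch_size
                result = self.pool.apply_async(
                    _make_batch, (self.data[start:start + self.batch_size],))
            yield batch

if __name__ == "__main__":
    data = [np.ones((4, 4)) * i for i in range(6)]
    ds = AsyncDataset(data)
    print(sum(1 for _ in ds.iterate()))  # -> 3
```

Keeping the pool on the dataset also means worker start-up cost is paid once rather than at every epoch boundary.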

gngdb commented 9 years ago

Fixed this for all datasets using online augmentation now.