Add first keras callback.

riga commented 3 years ago

Based on the discussion in #3.

Things to do:

[x] Test of util.is_lazy_iterable
[x] Test of util.make_list
[x] Test of util.verbose_import
[x] Test of the actual callback in Python 3 only.
[x] Test the callbacks on a machine with GPUs once (we need a custom builder for GH actions with a GPU soon)

Closes #3.

riga commented 3 years ago

Can you run the tests on a GPU as well, @pfackeldey ?

pfackeldey commented 3 years ago

Hi @riga ,

I just tested it and it works like a charm :) I have three small comments:

I think the measured memory of the GPU device is in MiB units instead of MB (same as nvidia-smi), the difference is not significant though...
Maybe one wants to make this clear in the docstring: this callback logs the full usage of the GPUs, so if you share the machine with other people who also use the GPUs, the logged values do not represent what your process/training alone is consuming.
Logging the usage of all GPUs might lead to "spam" or "unnecessary information". It is not so common to have cross-GPU-device trainings. It is more common to train on a machine with multiple GPUs, but just using a single GPU.
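
To put a number on the first comment, here is a minimal sketch of the MiB to MB conversion, assuming the logged value is in MiB as suggested. The helper name `mib_to_mb` is made up for this illustration and is not part of the callback.

```python
# Hypothetical helper for illustration only; not part of the callback.
BYTES_PER_MIB = 1024 ** 2  # 1 MiB = 1,048,576 bytes (binary prefix)
BYTES_PER_MB = 1000 ** 2   # 1 MB  = 1,000,000 bytes (decimal prefix)


def mib_to_mb(mib):
    """Convert a memory size from MiB to MB."""
    return mib * BYTES_PER_MIB / BYTES_PER_MB


# Value taken from the "GPU 0 vRAM [MB]" entry in the training logs,
# interpreted here as MiB.
logged = 7782.9375
print("{:.1f} MiB = {:.1f} MB".format(logged, mib_to_mb(logged)))
print("relative difference: {:.2f} %".format((mib_to_mb(1.0) - 1.0) * 100))
```

The two units differ by a bit less than 5 %, so relabeling the log entry would be enough and no numerical conversion is strictly needed.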

FYI, the training loop logs now look like this 🎉:

Epoch 1/60
1336/1336 - 5s - loss: 2749.4241 - resolution: 50.5707 - GPU 0 usage [%]: 13.8466 - GPU 0 vRAM [%]: 95.8586 - GPU 0 vRAM [MB]: 7782.9375 - val_loss: 2297.2878 - val_resolution: 47.0054 - lr: 0.0030
Epoch 2/60
1336/1336 - 4s - loss: 2315.0986 - resolution: 47.1101 - GPU 0 usage [%]: 14.6774 - GPU 0 vRAM [%]: 95.8586 - GPU 0 vRAM [MB]: 7782.9375 - val_loss: 2255.6926 - val_resolution: 46.6172 - lr: 0.0030
Epoch 3/60
1336/1336 - 4s - loss: 2280.3914 - resolution: 46.7554 - GPU 0 usage [%]: 14.9843 - GPU 0 vRAM [%]: 95.8586 - GPU 0 vRAM [MB]: 7782.9375 - val_loss: 2254.2732 - val_resolution: 46.5894 - lr: 0.0030

What do you think? Best, Peter

riga commented 3 years ago

👍 I made the stats configurable and I added the required docs.
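
As a rough illustration of what "configurable" could mean here, the sketch below filters per-device readings down to the requested devices and stats before they are added to the logs. All names (`STAT_LABELS`, `select_gpu_logs`, `device_numbers`, `stats`) are assumptions made for this example and are not necessarily the names used in the PR. The label strings follow the log output shown earlier in the thread.

```python
# Hypothetical sketch; names are illustrative and may differ from the PR.

# Maps a short stat name to the label suffix seen in the training logs.
STAT_LABELS = {
    "util": "usage [%]",
    "mem": "vRAM [%]",
    "mem_used": "vRAM [MB]",
}


def select_gpu_logs(all_stats, device_numbers=(0,), stats=None):
    """Return log entries for the requested devices and stats only."""
    # Default to all known stats when none are requested explicitly.
    stats = list(STAT_LABELS) if stats is None else list(stats)
    logs = {}
    for n in device_numbers:
        for name in stats:
            key = "GPU {} {}".format(n, STAT_LABELS[name])
            logs[key] = all_stats[n][name]
    return logs


# Device 0 uses the epoch 1 values from the logs in this thread;
# device 1 holds made-up placeholder values.
all_stats = {
    0: {"util": 13.8466, "mem": 95.8586, "mem_used": 7782.9375},
    1: {"util": 0.0, "mem": 0.0, "mem_used": 0.0},
}

# Log only the utilization of GPU 0 instead of everything on all GPUs.
print(select_gpu_logs(all_stats, device_numbers=[0], stats=["util"]))
```

Restricting both the devices and the stats addresses the "spam" concern for the common case of training on a single GPU of a multi-GPU machine.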

cms-ml / cmsml

Add first keras callback. #4