NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

Unable to save model every n epochs #432

Closed pantherso48 closed 6 years ago

pantherso48 commented 6 years ago

I want to train my model and save it every n epochs. I plan to start with n = 10 and then see what my plots look like.

If I run this call:

mlp.fit(train_set, optimizer=optimizer, num_epochs=epochs, cost=cost, callbacks=callbacks)

how can I save the model state every n epochs?

Thanks!

baojun-nervana commented 6 years ago

@pantherso48 Did you try "--serialize N"?

armando-fandango commented 6 years ago

@baojun-nervana We are trying to use neon on a local machine. Does this option work without ncloud? Also, does it save accuracy and loss after every epoch? If not, what is the best way to get an accuracy vs. epoch graph after running for, say, 500 iterations?

pantherso48 commented 6 years ago

Update:

I have tried to get the accuracy for each epoch in two ways. The first was adding the metric=Accuracy() argument when instantiating Callbacks. In the source code this argument causes a callback to be added, but I am not seeing any output in my logs. I am using Jupyter notebooks; would running in a terminal help?

The second was calling add_callback() on the Callbacks instance, which did not show any output either. The code for both attempts is below:

1) callbacks = Callbacks(mlp, eval_set=train_set, metric=Accuracy())

2) callbacks.add_callback(MetricCallback(eval_set=train_set, metric=Accuracy(), epoch_freq=1))

My next option is to write a custom callback and print the accuracy that way. Any direction would be greatly appreciated, thank you!

baojun-nervana commented 6 years ago

@armando-fandango @pantherso48 I believe the "--serialize N" argument works the same way on local and cloud systems.

The following example logs accuracy. I hope it helps: https://github.com/NervanaSystems/neon/blob/master/examples/babi/train.py

armando-fandango commented 6 years ago

@baojun-nervana This example logs accuracy at the end. We want to log the accuracy and cost after every epoch of training, within the fit function.

baojun-nervana commented 6 years ago

@armando-fandango I don't think there is a convenient way to save accuracy and loss info into the model parameters. The model usually outputs loss and accuracy. Can you extract that info from the output?

hanlint commented 6 years ago

Hello @armando-fandango, we have several options. If you are running in a terminal, there are command line options both to serialize the model parameters and to generate the cost/loss/accuracy telemetry data.

python mnist_mlp.py --output_file data.hdf5 --save_path mymodel.prm --history 10

When you run the above, the accuracy/cost after every epoch of training will be logged to an HDF5 file (data.hdf5) for your subsequent use. The model will also be saved as mymodel.prm, keeping the last 10 epochs of trained models.

Mechanistically, those callbacks are triggered in this line:

callbacks = Callbacks(mlp, eval_set=valid_set, **args.callback_args)

This is where the relevant command line arguments are passed in. If you are running an IPython notebook, you can achieve the same effect by passing in the arguments directly. For example:

callbacks = Callbacks(mlp, eval_set=valid_set, output_file='data.hdf5', save_path='mymodel.prm', history=10, metric='Accuracy')

For a full list of accepted callback arguments, please see here.
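To make the equivalence between the command line flags and the direct keyword arguments concrete, here is a minimal sketch using only the standard library. It does not use neon: `parse_callback_args` and `make_callbacks` are hypothetical stand-ins for neon's argument parser and its `Callbacks` constructor, and the default for `--history` is an assumption chosen for illustration.

```python
import argparse


def parse_callback_args(argv):
    """Hypothetical stand-in: collect callback-related flags into a dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--output_file', default=None)
    parser.add_argument('--save_path', default=None)
    parser.add_argument('--history', type=int, default=1)  # default is an assumption
    args = parser.parse_args(argv)
    # Drop unset options so that only explicit settings are forwarded.
    return {k: v for k, v in vars(args).items() if v is not None}


def make_callbacks(model, eval_set=None, **kwargs):
    """Hypothetical stand-in for Callbacks(...): just records what it was given."""
    return {'model': model, 'eval_set': eval_set, 'options': kwargs}


# Terminal style: flags are parsed into a dict and unpacked with **.
callback_args = parse_callback_args(
    ['--output_file', 'data.hdf5', '--save_path', 'mymodel.prm', '--history', '10'])
from_cli = make_callbacks('mlp', eval_set='valid_set', **callback_args)

# Notebook style: the same options passed directly as keyword arguments.
direct = make_callbacks('mlp', eval_set='valid_set',
                        output_file='data.hdf5', save_path='mymodel.prm', history=10)

print(from_cli == direct)
```

Both calls receive the same options, which is why passing the keyword arguments directly in a notebook should behave like the command line invocation.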

hanlint commented 6 years ago

Additionally, if you are running in IPython and want to see the log output, you will have to raise the logger's verbosity (level 10 is DEBUG) via:

import logging
main_logger = logging.getLogger('neon')
main_logger.setLevel(10)

For an example of this, see this notebook:

We also have some experimental routines for directly plotting the loss during training. See this notebook.
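The logger setting above can be written slightly more explicitly with the named constant. The sketch below uses only the standard `logging` module. Attaching a stdout handler is an assumption about notebook setups where no handler is configured; it may be unnecessary in yours, and could duplicate output if the root logger already has a handler.

```python
import logging
import sys

# 'neon' is the logger name used in the snippet above.
neon_logger = logging.getLogger('neon')
neon_logger.setLevel(logging.DEBUG)  # logging.DEBUG is the named constant for 10

# Assumption: no handler is attached yet, so add one that writes to stdout.
if not neon_logger.handlers:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s'))
    neon_logger.addHandler(handler)

neon_logger.info('neon logger configured at level %s', neon_logger.level)
```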

pantherso48 commented 6 years ago

I created a custom callback and retrieved the accuracy and cost from the callback dictionary. Thanks for the help!
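For readers landing here later, the general shape of the custom-callback approach might look like the sketch below. It is not the code used in this thread and does not import neon: `EveryNEpochs`, `save_fn`, and `fake_training_loop` are hypothetical names, and the metric values are placeholders. In neon the class would instead subclass the library's callback base class and read its values from the callback data.

```python
class EveryNEpochs:
    """Hypothetical callback: record metrics each epoch, checkpoint every n epochs."""

    def __init__(self, save_fn, epoch_freq=10):
        if epoch_freq < 1:
            raise ValueError('epoch_freq must be >= 1')
        self.save_fn = save_fn        # called with the epoch index when saving
        self.epoch_freq = epoch_freq
        self.history = []             # list of (epoch, metrics) pairs

    def on_epoch_end(self, epoch, metrics):
        # Epochs are zero-based here, so epoch 9 is the 10th completed epoch.
        self.history.append((epoch, dict(metrics)))
        if (epoch + 1) % self.epoch_freq == 0:
            self.save_fn(epoch)


def fake_training_loop(num_epochs, callback):
    """Stand-in for model.fit(): produces placeholder metrics, not real training."""
    for epoch in range(num_epochs):
        metrics = {'cost': 1.0 / (epoch + 1), 'accuracy': epoch / num_epochs}
        callback.on_epoch_end(epoch, metrics)


saved_epochs = []
callback = EveryNEpochs(save_fn=saved_epochs.append, epoch_freq=10)
fake_training_loop(30, callback)

print(saved_epochs)
print(len(callback.history))
```

The per-epoch history can then be plotted as accuracy or cost against epoch, and `save_fn` is where a model serialization call would go.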