[circle+] Implement circle+(training) parameter

zetwhite commented 11 months ago

Let's implement a training param importer that gets hyper-parameters from the circle+ file.

(notes) For now, onert_train gets hyper-parameters from command line options.

zetwhite commented 10 months ago

(Progress Updates)

Now, the draft (https://github.com/Samsung/ONE/pull/11740) works well with this file.

model_file : mnist_with_meta_circle.zip
- epoch : 2, batch : 64, optimizer : sgd, learning_rate : 0.01, loss_fn : MSE
- has only FC and ReLU operations

➜ ~/Workspace/ONE/Product/out/bin/onert_train --modelfile mnist_with_meta.circle --load_input:raw pool_mnist_model/mnist_x_train.1000.bin --load_expected:raw pool_mnist_model/mnist_y_train.1000.bin --data_length 1000 -v 100                                

Model Expected Filename pool_mnist_model/mnist_y_train.1000.bin
Model Input Filename pool_mnist_model/mnist_x_train.1000.bin
Model Filename mnist_with_meta.circle
Epoch 1/2 - 50.340ms/step - loss: [0] 0.2309
Epoch 2/2 - 50.130ms/step - loss: [0] 0.2185
===================================
MODEL_LOAD   takes 1.486 ms
PREPARE      takes 8.617 ms
EXECUTE      takes 1508.700 ms
- MEAN     :  1508.700 ms
- MAX      :  1508.700 ms
- MIN      :  1508.700 ms
- GEOMEAN  :  1508.700 ms

For the next step,

I will clean up some code from the draft
- resolve some parts I was uncertain about while writing the draft
I'm also thinking about compatibility with the command line options.
- Should we remove command line options like (--epochs, --learning_rate, --loss, --optimizer)?
- If both command line options and circle file metadata are given, which one should be applied?
  - Personally, I think we should first load the parameters from the model file, and then override them with any command line options that are given.
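
The precedence proposed above could be sketched roughly as follows. This is a minimal Python illustration, not onert_train's actual code: `TrainParams`, `apply_overrides`, and the field names are hypothetical, and `None` stands for an option the user did not pass on the command line.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TrainParams:
    """Hypothetical container for the hyper-parameters discussed above."""
    epochs: int
    batch_size: int
    learning_rate: float
    loss: str
    optimizer: str


def apply_overrides(from_model: TrainParams, **cli_opts: Optional[object]) -> TrainParams:
    """Start from the model-file values; replace only the options that were given."""
    given = {key: value for key, value in cli_opts.items() if value is not None}
    return replace(from_model, **given)


# Values taken from the model metadata in the example above.
model_params = TrainParams(epochs=2, batch_size=64, learning_rate=0.01,
                           loss="MSE", optimizer="sgd")

# The user passes only a learning rate; epochs was not given, so it is untouched.
final = apply_overrides(model_params, learning_rate=0.001, epochs=None)
print(final)
```

With this ordering, a model file that carries full metadata runs without any options, while a single option still wins over the stored value.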

zetwhite commented 10 months ago

(additionally) Offline, I asked @jyoungyun for advice on the current implementation details (in draft https://github.com/Samsung/ONE/pull/11740). Since she gave me a much better idea, I plan to rework the current implementation.

zetwhite commented 9 months ago

(Progress Updated)

I cleaned up the code from the previous draft and uploaded it as version 2 ( https://github.com/Samsung/ONE/pull/12045 ). You can check that it works well with the file: mnist.zip

➜ ~/Workspace/ONE/Product/out/bin/onert_train --modelfile mnist.circle+ --load_input:raw pool_mnist_model/mnist_x_train.1000.bin --load_expected:raw pool_mnist_model/mnist_y_train.1000.bin --data_length 1000 -v 100
Model Expected Filename pool_mnist_model/mnist_y_train.1000.bin
Model Input Filename pool_mnist_model/mnist_x_train.1000.bin
Model Filename mnist.circle+
== Training Paramter (from model file) ============
batch size    : 64
learning rate : 0.001
loss func     : 0(mean_squared_error)
optimizer     : 0(sgd)
================================================
Epoch 1/5 - time: 38.330ms/step - loss: [0] 0.2106
Epoch 2/5 - time: 38.244ms/step - loss: [0] 0.1995
Epoch 3/5 - time: 38.249ms/step - loss: [0] 0.1900
Epoch 4/5 - time: 38.191ms/step - loss: [0] 0.1818
Epoch 5/5 - time: 38.311ms/step - loss: [0] 0.1747
===================================
MODEL_LOAD   takes 1.4870 ms
PREPARE      takes 10.8400 ms
EXECUTE      takes 2873.9660 ms
- Epoch 1      takes 574.9520 ms
- Epoch 2      takes 573.6640 ms
- Epoch 3      takes 573.7380 ms
- Epoch 4      takes 572.8630 ms
- Epoch 5      takes 574.6670 ms
===================================
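
As a sanity check on the log above, the per-epoch times are consistent with 15 steps per epoch, which is what `--data_length 1000` and batch size 64 give if the last partial batch is dropped. That drop is an assumption inferred from the numbers, not something the log states. A small Python check:

```python
data_length = 1000
batch_size = 64

# Assumption: the trailing partial batch (1000 % 64 = 40 samples) is skipped.
steps_per_epoch = data_length // batch_size

# (ms per step, reported ms per epoch) copied from the log above.
epochs = [
    (38.330, 574.952),
    (38.244, 573.664),
    (38.249, 573.738),
    (38.191, 572.863),
    (38.311, 574.667),
]

for ms_per_step, reported_ms in epochs:
    estimated_ms = ms_per_step * steps_per_epoch
    print(f"{estimated_ms:.3f} ms estimated vs {reported_ms:.3f} ms reported")
```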

zetwhite commented 8 months ago

(Progress Updated)

Draft v3 is ready - https://github.com/Samsung/ONE/pull/12152. I wrote up an API usage example here - https://github.com/Samsung/ONE/pull/12152#issuecomment-1833214304.

For now, I'm quite satisfied with the current implementation. Because it touches many parts, it is a bit hard to get a detailed review. I'd better split it into small PRs and get a detailed review on each one.

PR plan

[x] (core/ir)
- update ir/Model and base_loader to be aware of metadata
  - https://github.com/Samsung/ONE/pull/12294
- introduce ir/TrainingInfo
  - https://github.com/Samsung/ONE/pull/12340
  - https://github.com/Samsung/ONE/pull/12369
- rename invalid to undef
  - undef is more appropriate for the default value of ir
  - https://github.com/Samsung/ONE/pull/12393
[x] (frontend)
- add TrainingInfo loader
  - https://github.com/Samsung/ONE/pull/12400
  - https://github.com/Samsung/ONE/pull/12410
[x] (api)
- [session] add TrainingInfo in the session
  - https://github.com/Samsung/ONE/pull/12342
- [load] update nnfw_loadmodel... to load train_info
  - https://github.com/Samsung/ONE/pull/12408
- [setter] add nnfw_train_set_traininfo
  - https://github.com/Samsung/ONE/pull/12384
  - https://github.com/Samsung/ONE/pull/12383
- [getter] add nnfw_train_get_traininfo, nnfw_train_get_batch_size
  - https://github.com/Samsung/ONE/pull/12391
[ ] (tests)
- sub-issue : https://github.com/Samsung/ONE/issues/12520
- update onert_train to use newly added API
  - https://github.com/Samsung/ONE/pull/12411
- more : https://github.com/Samsung/ONE/pull/12296

zetwhite commented 8 months ago

Discussion Point

This comment is for logging :smile:

Background

Model parameters (we usually call them TrainInfo) can be given in 2 ways.

given by the circle model file
given by the argument of nnfw_train_prepare(nnfw_train_info* tinfo)

Have to decide

If a model parameter is not given in either way, how should we handle it?

[1] throw an error when nnfw_train_prepare is called, with a message - 'train_parameter is not set'
- pros : easy to manage and implement
- cons : ...
[2] make it run with basic defaults (e.g. batch_size=1, optimizer=sgd, learning_rate=0.01, etc.)
- pros
  - new users can use the training feature without any setting or knowledge.
- cons
  - setting the default option is tricky - Is there a default that can be applied to any kind of model?
  - confusing situations can occur (e.g. some parameters are set by default, some by the model, and some by the user API.)

Conclusion

I asked the others (@jyoungyun , @chunseoklee , @hseok-oh , @ragmani ) for their opinions offline. In the offline discussion, we preferred "[1] throw an error" over "[2] provide default parameter".
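
Option [1] could look roughly like the sketch below. It is written in Python for brevity and does not reflect the real nnfw API: `TrainInfo`, `prepare`, and the field names are hypothetical, and `None` marks a parameter set neither by the model file nor by the user.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TrainInfo:
    """Hypothetical training parameters; None means 'not set by anyone'."""
    batch_size: Optional[int] = None
    learning_rate: Optional[float] = None
    loss: Optional[str] = None
    optimizer: Optional[str] = None


def prepare(info: TrainInfo) -> None:
    """Refuse to proceed, rather than guess defaults, when anything is unset."""
    missing = [f.name for f in fields(info) if getattr(info, f.name) is None]
    if missing:
        raise ValueError("train_parameter is not set: " + ", ".join(missing))


# The model file supplied everything except the optimizer.
partial = TrainInfo(batch_size=64, learning_rate=0.001, loss="mean_squared_error")
try:
    prepare(partial)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Naming the missing fields in the message keeps the failure easy to act on, which is part of why this option is simpler to manage than mixing defaults with model and user values.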

zetwhite commented 7 months ago

Additional things to change

A list of issues I was not aware of before and learned about through code review. Let's handle them one by one :)

[ ] Remove redundant copy of TrainingInfo https://github.com/Samsung/ONE/pull/12383#issuecomment-1871730802
[ ] Make sure that mmapped data is released https://github.com/Samsung/ONE/pull/12433#issuecomment-1884249714
