Open szha opened 6 years ago

There has been feedback that the way the scripts are currently written can be improved, so I'd like to start a thread to collect suggestions on how to improve them, and potentially come up with a template/requirement for future scripts. When commenting, please provide motivation and explanatory details, and quote the part of the code to improve, or the code style to follow if available.
Goal: have a generic interface for all seq2seq generation tasks.
Two main problems as I see them in the current scripts:
Ideal user experience: users only need to specify the data, model, and loss; the rest should be "invisible" to users, e.g. batch sampler, distributed training, optimization, etc. (a sketch of such an interface follows the list below).
Refactor suggestion:

- Task interface
- Model I/O
- Evaluate / generate
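A minimal sketch of what such a task interface could look like; `Seq2SeqTask` and its `fit()` signature are hypothetical names introduced here, not an existing gluon or gluonnlp API:

```python
import mxnet as mx
from mxnet import autograd, gluon

class Seq2SeqTask:
    """Hypothetical task wrapper: the user supplies data, model, and
    loss; batching, device placement, and optimization stay internal."""

    def __init__(self, model, loss, ctx=None):
        self.model = model
        self.loss = loss
        self.ctx = ctx if ctx else [mx.cpu()]

    def fit(self, train_data, epochs=1, optimizer='adam'):
        trainer = gluon.Trainer(self.model.collect_params(), optimizer)
        for _ in range(epochs):
            for src, tgt in train_data:
                # the forward/backward pass is invisible to the user
                with autograd.record():
                    l = self.loss(self.model(src), tgt)
                l.backward()
                trainer.step(src.shape[0])
```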
@sxjscience
Strongly agree with @hhexiy that "users only need to specify the data, model, loss; the rest should be 'invisible' to users, e.g. batch sampler, distributed training, optimization, etc." There are common pieces of code that end up rewritten in every gluon notebook: splitting data for data-parallel training, computing metrics, running the forward-backward pass, and so on. The symbolic interface did a better job of capturing these common patterns, letting users avoid writing (or rather, copying and pasting) the same code again and again. I hope gluon also establishes a "single best way" of performing these common tasks by providing APIs for them. We should treat repetition of the same code block as a "bad smell" and support it as part of the gluon or gluon-nlp API, especially when the repetitions are inconsistent. For example, these three notebooks each do gradient clipping in a different way:

http://gluon-nlp.mxnet.io/examples/machine_translation/gnmt.html
http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html
http://gluon-nlp.mxnet.io/examples/sentence_embedding/self_attentive_sentence_embedding.html
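One way to standardize this would be a single shared helper; the helper name below is ours, while `gluon.utils.clip_global_norm` is the existing Gluon API:

```python
from mxnet import gluon

def clip_gradients(net, ctx, max_norm):
    """Clip the global gradient norm the same way in every script,
    instead of each notebook rolling its own variant."""
    grads = [p.grad(ctx) for p in net.collect_params().values()
             if p.grad_req != 'null']
    return gluon.utils.clip_global_norm(grads, max_norm)
```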
There should be minimal overlap of code between the training and evaluation loops. For example, in http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html, train() and evaluate() contain redundant pieces of code for the forward pass and metric computation. http://gluon-nlp.mxnet.io/examples/sentence_embedding/self_attentive_sentence_embedding.html does a better job of sharing code between the two loops, although the line

batch_pred, l = calculate_loss(batch_x, batch_y, model, loss, class_weight, penal_coeff)

could be shared as well, by entering autograd.record() only during training:
```python
from contextlib import ExitStack
from mxnet import autograd

with ExitStack() as stack:
    if is_train:
        stack.enter_context(autograd.record())
```
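Both loops could then share one helper (a sketch; `forward_pass` is a name introduced here, while `calculate_loss` and its arguments come from the notebook):

```python
from contextlib import ExitStack
from mxnet import autograd

def forward_pass(batch_x, batch_y, is_train):
    # identical code path for train() and evaluate(); only the
    # gradient recording differs
    with ExitStack() as stack:
        if is_train:
            stack.enter_context(autograd.record())
        batch_pred, l = calculate_loss(batch_x, batch_y, model, loss,
                                       class_weight, penal_coeff)
    return batch_pred, l
```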
I love and echo the suggestions from @hhexiy and @bikestra, especially around separating the data-loading frills from the model building. Especially now, with the various data-loading utilities that gluonnlp adds, like the bucketing samplers, this should be easy to separate out.

Some additional suggestions:
Multi-GPU usage (probably a gluon concern more than a gluonnlp one?): In symbolic MXNet we had a very transparent way of dealing with multi-GPU training. The gluonnlp examples keep changing the recommended way of doing this, and they look more involved too. Adding a recommendation would be nice, and adding an abstraction would be even better!
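For reference, the data-parallel pattern the current examples use looks roughly like this (`net`, `loss_fn`, `trainer`, and `train_data` are assumed to be defined as in the notebooks):

```python
import mxnet as mx
from mxnet import autograd, gluon

ctx = [mx.gpu(0), mx.gpu(1)]
for data, label in train_data:
    # slice the batch across devices
    data_parts = gluon.utils.split_and_load(data, ctx)
    label_parts = gluon.utils.split_and_load(label, ctx)
    with autograd.record():
        losses = [loss_fn(net(x), y)
                  for x, y in zip(data_parts, label_parts)]
    for l in losses:
        l.backward()
    trainer.step(data.shape[0])
```

An abstraction could fold all of this into a single trainer call.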
Visualization support: A recommended way/example to quickly plug in TensorBoard-style training visualizations (also t-SNE for word embeddings), as well as visualizations for attention, would be good to have. While this is trivial to reinvent, a recommended way or utility would make it easier for newcomers, since most toolkits support it natively.
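For instance, the mxboard package already wraps TensorBoard logging; a recommended snippet could be as small as this (tags, paths, and the surrounding variables are illustrative):

```python
from mxboard import SummaryWriter

sw = SummaryWriter(logdir='./logs')
for step, loss_val in enumerate(losses_per_step):
    sw.add_scalar(tag='train_loss', value=loss_val, global_step=step)
# TensorBoard's projector provides the t-SNE view of word embeddings
sw.add_embedding(tag='word_embedding', embedding=embedding_matrix,
                 labels=vocab_words)
sw.close()
```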
Interactive model support: @hhexiy mentioned it above already, but to elaborate on my thoughts: once the inference-time paths are replicated, interacting with the model is extremely useful for quickly building a qualitative understanding of it, so any utilities around that would help. It would also encourage us to write the train and test pieces in synergy, so that supporting interaction doesn't require much new code.
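A minimal interactive wrapper could look like this (a sketch; `translate` stands in for whatever inference entry point a script exposes):

```python
def interact(translate):
    """Read sentences from stdin and print model output until an
    empty line is entered."""
    while True:
        src = input('source> ').strip()
        if not src:
            break
        print(translate(src))
```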
One more:

```python
def forward(self, input):
    # assumption: embeddings come from an embedding layer on the input
    embeddings = self.embedding(input)
    # bilstm_output: (max_length, batch_size, 2 * hidden_size);
    # bidirectionality is set when constructing the LSTM, not per call
    bilstm_output, hdn = self.bilstm(embeddings.swapaxes(0, 1))
    # bilstm_mean: (batch_size, 2 * hidden_size)
    bilstm_mean = bilstm_output.mean(axis=0)
```
Instead of just replicating the results in the paper, all scripts could also provide an option to run the algorithms on user-provided data, in line with the same concern from @hhexiy, @bikestra, and @rishita above! This enables very easy onboarding.
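Concretely, every script could expose this through a flag along these lines (the flag name is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, default=None,
                    help='run inference on user-provided text '
                         'instead of only replicating the paper results')
args = parser.parse_args()
```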
Add tensorboard support to each of the scripts as a plug-in, in line with other features like data loading, data transformers, etc.