szha commented 6 years ago

There have been feedbacks about the way the scripts are currently written that can be improved, so I'd like to start a thread to collect feedback and suggestions on how to improve, and potentially come up with a template/requirement for future scripts. When commenting, please provide motivation and explanation details, and quote the part of the code to improve, or the code style to follow if available.

hhexiy commented 6 years ago

Goal: have a generic interface for all seq2seq generation tasks.

Two main problems as I see in the current scripts:

training and testing / generation are coupled
has to copy / paste scripts for different seq2seq models

Ideal user experience: users only need specify the data, model, loss; the rest should be "invisible" to users, e.g. batch sampler, distributed training, optimization etc.

Refactor suggestion:

Task interface

specify task specific things required for training
build model, load data, construct loss function

Model I/O

save / load params, vocab, config in a single call to ensure training / test match

Evaluate / generate

output and print n-best result from beam search, including source, target, score
support interactive model
report perplexity and user defined metric

hhexiy commented 6 years ago

@sxjscience

bikestra commented 6 years ago

Strongly agree with @hhexiy that: "users only need specify the data, model, loss; the rest should be "invisible" to users, e.g. batch sampler, distributed training, optimization etc." There are common pieces of code which end up repeatedly written in every gluon notebook: Splitting data for data-parallel training, computation of metrics, forward-backward pass, etc. In the symbolic interface, we did a better job of capturing these common patterns, letting users avoid writing (or rather, copying & pasting) the same code again and again. I hope gluon also sets up a "single best way" of performing these common tasks by providing APIs for them. We should consider the repetitions of the same code block as "bad smell", and try to support them as a part of gluon or gluon-nlp API. Especially when they are done in an inconsistent way. For example, take how three notebooks are doing the gradient clipping in a different way: http://gluon-nlp.mxnet.io/examples/machine_translation/gnmt.html http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html http://gluon-nlp.mxnet.io/examples/sentence_embedding/self_attentive_sentence_embedding.html

There should be a minimal overlap of code between training and evaluation loop. For example, in http://gluon-nlp.mxnet.io/examples/sentiment_analysis/sentiment_analysis.html , train() and evaluate() contain redundant pieces of code for the forward pass and metric computation. Now http://gluon-nlp.mxnet.io/examples/sentence_embedding/self_attentive_sentence_embedding.html does a better job of code sharing between the training and evaluation loop, although this line batch_pred, l = calculate_loss(batch_x, batch_y, model, loss, class_weight, penal_coeff) could be shared, by only scoping into autograd.record() with:

            with ExitStack() as stack:
                if is_train:
                    stack.enter_context(autograd.record())

rishita commented 6 years ago

I love and echo the suggestions from @hhexiy and @bikestra - specially around separating the data loading frills from the model building. Specially, now with the various data loading utilities that gluonnlp adds like the bucketing samplers, etc. this should be easy to separate out.

Some other additional suggestions:

Multi-gpu Usage (Probably more generally about gluon than gluonnlp?) In symbolic mxnet we had a very transparent way of dealing with multi-gpu training. Now the examples in gluonnlp have been changing on recommended ways of this and they look more involved too. Adding a recommendation would be nice and adding an abstraction would be even better!
Visualization Support A recommended way/example to quickly plug in tensorboard style training visualizations (also t-sne for word embeddings) as well as visualizations for attention might be a good to have. While this is trivial to re-invent some recommended way or utility might make it easier for newcomers since most toolkits support it natively too.
Interactive Model Support @hhexiy mentioned it above already, but elaborating on my thoughts for it. This is extremely useful to be able to build a quick qualitative understanding of the model once ready to replicate inference time paths. So any utilities around that would be useful. Also, it would in turn encourage us to ensure we don't have to write a lot of new code to support it if we have done a good job of writing the train and test pieces in synergy.

rishita commented 6 years ago

One more:

Make hybridization more friendly with bucketing For people migrating from mxnet symbolic, having a standard way of migrating bucketed graphs to hybridized layers will be good.

sravanbabuiitm commented 5 years ago

For all the scripts that are going to be added to gluonnlp, we can mention shape of tensors on top of the hybrid block as a standard practice.

def forward(self,input):

Input ---> (T,N,C ) : (max_length, batch_size, embedding_dim)

    # bilstm_outputs ----> :( max_length, bs, 2*lstm_....)
    bilstm_output, hdn = self.bilstm(embeddings.swapaxes(0,1), bidirectional=True)

    # bilstm_mean ----> :(batch_size, max_length)
    bilstm_mean = bilstm_output.mean(axis=0)

All scripts instead of just replicating the results in the paper, can also give a provision to run the algorithms for an customer inputted data, inline with same concern from @hhexiy @bikestra and @rishita above! This enables very easy onboarding.
Add tensorboard support for each of the scripts as a plug-in inline with other features like data loading, data transformers etc.

dmlc / gluon-nlp

Scripts improvements #382

Input ---> (T,N,C ) : (max_length, batch_size, embedding_dim)