**Open** · guoquan opened this issue 4 years ago
What is the issue with adding batching?
Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams. @hfaghihi15 can you try train/test on spam and see what the result is?
> Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams. @hfaghihi15 can you try train/test on spam and see what the result is?
I can do that, but as we tested yesterday, neither the metric nor the loss computation was working. Is that fixed now?!
> Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams.
I took a small portion of the data, and the model can overfit it now. On the full training set, the current example does not use batching, so the training loss oscillates heavily and does not converge.
That is also why I raised this issue: I wanted to add batching to this example and see if it converges better.
Didn't we have batches for the CoNLL data training?
What is the answer to the above question: is the metric or loss computation working now, so that @hfaghihi15 can try it?
> What is the issue with adding batching?
Actually, the first issue is allowing non-batch input, since the current example is non-batch while `torch.nn.CrossEntropyLoss` needs a batch dimension.
I created a non-batch version of it to let the current example run. But there are many other modules in torch that people may want to use which need batching.
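For reference, a minimal sketch of that kind of workaround (illustrative tensors, not our actual sensor code): wrapping a single example with `unsqueeze(0)` makes it look like a batch of size 1, which satisfies `torch.nn.CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# One non-batch example: logits over 2 classes and a scalar label.
logits = torch.tensor([0.2, 1.3])  # shape (C,)
label = torch.tensor(1)            # shape ()

# CrossEntropyLoss expects (N, C) logits and (N,) targets,
# so fake a batch of size N=1.
loss = loss_fn(logits.unsqueeze(0), label.unsqueeze(0))
```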
The issue with adding batching is on the reader side. If the reader provides a batch, everything after that is batched. (But the DataNode is per-example again, which may still be problematic.)
Can we parametrize the reader sensor to take a batch size, @hfaghihi15?
> Didn't we have batches for the CoNLL data training?
That was an old interface that we don't want to use. And I did optimize that reader to provide batched samples (and handle many tricky things like masks).
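For context, the "tricky things like masks" mostly mean padding variable-length examples into one tensor and tracking which positions are real. A minimal sketch (made-up data, not the actual CoNLL reader code):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length token-id sequences (illustrative data).
seqs = [torch.tensor([4, 9, 2]), torch.tensor([7, 1]), torch.tensor([3])]

# Pad into one (batch, max_len) tensor; assumes 0 is not a real token id.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
mask = padded != 0  # True at real tokens, False at padding
```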
> Can we parametrize the reader sensor to take a batch size, @hfaghihi15?
We can do it for the reader, but reading the data into the DataNode and everything downstream would have to change, since we would no longer have one root example per execution.
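Just to make the reader side concrete, a rough sketch of a batch-size-parametrized reader (a hypothetical helper, not the actual ReaderSensor API):

```python
from typing import Iterable, Iterator, List

def batched(examples: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `batch_size` examples from a stream."""
    batch: List[dict] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Everything downstream would then receive a list per step, which is exactly why the DataNode side would have to change: there is no longer a single root example per execution.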
Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
Looking at the design of `DataLoader` in PyTorch, it has a `collate_fn()` that handles how single examples are organized into batches. The user may need to customize it to work with non-tensor data.
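For example, a minimal custom `collate_fn` (a sketch; the field names are made up) could stack tensor fields and keep non-tensor fields, such as raw text, as plain lists:

```python
import torch

def collate(samples):
    """Combine a list of per-example dicts into one batch dict."""
    return {
        # Stack tensors along a new batch dimension.
        "label": torch.stack([s["label"] for s in samples]),
        # Non-tensor data (e.g. raw text) stays a plain Python list.
        "text": [s["text"] for s in samples],
    }

# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=collate)
```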
Not sure what this implies here for us?
> Not sure what this implies here for us?
I mean, that is their way to handle batching. Just for our reference.
I had this question above: is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue? Also, how much time is needed to add this to the reader and training? The reference is good, but I am not sure what the plan is now. Is adding batch support the next thing you will do?
Another related question: is the latest CoNLL implementation using the DataNode and also the tricks for using batches?
> Another related question: is the latest CoNLL implementation using the DataNode and also the tricks for using batches?
No, it is not using the DataNode. That's what I meant by the "old" interface.
> Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
Not sure. Batching usually smooths out the loss curve and helps convergence. Performance (training speed) is also an issue. But the bottleneck will be inference, if we add inference.
So, can we get reasonable performance on the Twitter and spam examples before adding batching? Why should we solve batching in the first place, that is my question. If it is quick, go ahead and address it, but if it takes long, is it a priority now? Again, can @hfaghihi15 try training and testing spam now (given your changes)?
I get good results with 10 samples, and the accuracy goes to 1.
However, when I try the whole dataset, the accuracy is about 0.6.
It could also be that the model is too simplified.
> Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
As I mentioned in the meeting, I had a larger model that achieves about a 0.8 F1 score without batching. Should I push the code?
Most of the examples we are working on have no batching. However, most PyTorch APIs assume a batch dimension. It might be critical to think about batching for both efficiency and compatibility.