**Open** · guoquan opened this issue 4 years ago
What is the issue with adding batching?
Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams. @hfaghihi15 can you try train/test on spam and see what the result is?
> Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams. @hfaghihi15 can you try train/test on spam and see what the result is?
I can do that, but as we tested yesterday, neither the metric nor the loss computation was working. Is that fixed now?!
> Do we get reasonable accuracy for the sentiment analysis and spam examples? Those should usually be highly accurate using 1-grams and 2-grams.
I took a small portion of the data, and the model can overfit it now. On the full training set, the current example does not use batching, so the training loss oscillates heavily and does not converge.
That is also why I raised this issue: I wanted to add batching to this example and see if it converges better.
Didn't we have batches for the CoNLL data training?
What is the answer to the above question: is the metric or loss computation working now, so that @hfaghihi15 can try it?
> What is the issue with adding batching?
Actually, the first issue is allowing non-batch input, since the current example is non-batch while `torch.nn.CrossEntropyLoss` needs a batch dimension.
I created a non-batch version of it to let the current example run. But there are many other modules in torch that people may want to use which need batching.
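For reference, a minimal sketch of that kind of workaround (illustrative tensors, not our actual sensor code): wrapping a single example with `unsqueeze(0)` makes it look like a batch of size 1, which satisfies `torch.nn.CrossEntropyLoss`:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# One non-batch example: logits over 2 classes and a scalar label.
logits = torch.tensor([0.2, 1.3])  # shape (C,)
label = torch.tensor(1)            # shape ()

# CrossEntropyLoss expects (N, C) logits and (N,) targets,
# so fake a batch of size N=1.
loss = loss_fn(logits.unsqueeze(0), label.unsqueeze(0))
```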
The issue with adding batching is on the reader side. If the reader provides a batch, everything after that is batched. (But the DataNode is per-example again, which may still be problematic.)
Can we parametrize the reader sensor to take a batch size, @hfaghihi15?
> Didn't we have batches for the CoNLL data training?
That was an old interface that we don't want to use. And I did optimize that reader to provide batched samples (and handle many tricky things like masks).
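For context, the "tricky things like masks" mostly mean padding variable-length examples into one tensor and tracking which positions are real. A minimal sketch (made-up data, not the actual CoNLL reader code):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length token-id sequences (illustrative data).
seqs = [torch.tensor([4, 9, 2]), torch.tensor([7, 1]), torch.tensor([3])]

# Pad into one (batch, max_len) tensor; assumes 0 is not a real token id.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
mask = padded != 0  # True at real tokens, False at padding
```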
> Can we parametrize the reader sensor to take a batch size, @hfaghihi15?
We can do it for the reader, but reading the data into the DataNode and everything downstream would have to change, since we would no longer have one root example per execution.
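Just to make the reader side concrete, a rough sketch of a batch-size-parametrized reader (a hypothetical helper, not the actual ReaderSensor API):

```python
from typing import Iterable, Iterator, List

def batched(examples: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of up to `batch_size` examples from a stream."""
    batch: List[dict] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Everything downstream would then receive a list per step, which is exactly why the DataNode side would have to change: there is no longer a single root example per execution.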
Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
Looking at the design of `DataLoader` in PyTorch, it has a `collate_fn()` that handles how single examples are organized into batches. The user may need to customize it to work with non-tensor data.
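For example, a minimal custom `collate_fn` (a sketch; the field names are made up) could stack tensor fields and keep non-tensor fields, such as raw text, as plain lists:

```python
import torch

def collate(samples):
    """Combine a list of per-example dicts into one batch dict."""
    return {
        # Stack tensors along a new batch dimension.
        "label": torch.stack([s["label"] for s in samples]),
        # Non-tensor data (e.g. raw text) stays a plain Python list.
        "text": [s["text"] for s in samples],
    }

# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=collate)
```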
Not sure what this implies here for us?
> Not sure what this implies here for us?
I mean, that is their way to handle batching. Just for our reference.
I had this question above: is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue? Also, how much time is needed to add this to the reader and training? The reference is good, but I am not sure what the plan is now. Is adding batch support the next thing you will do?
Another related question: is the latest CoNLL implementation using the DataNode and also the tricks for using batches?
> Another related question: is the latest CoNLL implementation using the DataNode and also the tricks for using batches?
No, it is not using the DataNode. That's what I meant by the "old" interface.
> Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
Not sure. Batching usually smooths out the loss curve and helps convergence. Performance (training speed) is also an issue. But the bottleneck will be inference, if we add inference.
So, can we get reasonable performance on the Twitter and spam examples before adding batching? Why should we solve batching in the first place, that is my question. If it is quick, go ahead and address it, but if it takes long, is it a priority now? Again, can @hfaghihi15 try training and testing spam now (given your changes)?
I get good results with 10 samples, and the accuracy goes to 1.
However, when I try the whole dataset, the accuracy is about 0.6.
It could also be that the model is too simplified.
> Is the batch size so critical for convergence on the spam and Twitter examples, or is it a performance issue?
As I mentioned in the meeting, I had a larger model that achieves about a 0.8 F1 score without batching. Should I push the code?
Most of the examples we are working on have no batching. However, most PyTorch APIs assume a batch dimension. It might be critical to think about batching for both efficiency and compatibility.