vr25 opened this issue 4 years ago
Hi,
When the document chunks are fed to the data parallel model, how is the loss backpropagated? Is it computed for every chunk separately?
Also, do you unfreeze BERT and fine-tune it for the classification task?
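To make sure I'm asking the right thing, here is what I mean by "unfreeze and fine-tune", as a generic PyTorch/transformers sketch (the model name and loop are just illustrative, not taken from this repo):

```python
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")

# Frozen: BERT acts as a fixed feature extractor and only the
# classifier head on top of it is trained.
for p in bert.parameters():
    p.requires_grad = False

# Unfrozen: the BERT weights are also updated while training
# the classification task.
for p in bert.parameters():
    p.requires_grad = True
```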
Thank you!
Could you explain in more detail how the loss is calculated for every chunk separately? The entire document has a single target label, so AFAIU the loss would be calculated against that target, right? Please let me know if I am missing something.
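For reference, this is a minimal sketch of how I currently understand the flow (not the repo's actual code; I've used mean pooling over the chunk [CLS] vectors as a stand-in for whatever aggregation the repo uses, and the two-class head is just an assumption):

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 2)  # assumed 2 document classes
criterion = nn.CrossEntropyLoss()

document = "some very long clinical note ..."  # placeholder text
label = torch.tensor([1])                      # ONE label for the whole document

# Split the document into non-overlapping 512-token chunks.
enc = tokenizer(document, return_overflowing_tokens=True, truncation=True,
                max_length=512, padding="max_length", return_tensors="pt")
input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]  # (n_chunks, 512)

out = bert(input_ids=input_ids, attention_mask=attention_mask)
chunk_vecs = out.last_hidden_state[:, 0, :]    # one [CLS] vector per chunk
doc_vec = chunk_vecs.mean(dim=0, keepdim=True) # pool chunks into one document vector
logits = classifier(doc_vec)                   # (1, n_classes)

loss = criterion(logits, label)                # a single document-level loss
loss.backward()                                # gradients flow back through every chunk
```

If that is roughly right, then there is no per-chunk loss at all, only a per-document loss whose gradient reaches every chunk through the pooling step.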
Also, what is the maximum number of chunks per document across the entire dataset?
The default config has bert_batch_size=7, but some of my documents have 125 chunks. In such cases, if I set bert_batch_size to 125, I run into a CUDA OOM error.
Any suggestions for handling this?
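In case it helps frame the question, here is the kind of workaround I have been experimenting with (a hypothetical helper, not part of this repo): run the chunks through BERT in mini-batches of bert_batch_size so peak GPU memory stays bounded, and only aggregate the chunk vectors afterwards.

```python
import torch

def encode_chunks_in_minibatches(bert, input_ids, attention_mask, bert_batch_size=7):
    """Encode (n_chunks, seq_len) chunk tensors in small mini-batches so that
    all 125 chunks never sit on the GPU at once."""
    chunk_vecs = []
    for start in range(0, input_ids.size(0), bert_batch_size):
        ids = input_ids[start:start + bert_batch_size]
        mask = attention_mask[start:start + bert_batch_size]
        with torch.no_grad():  # BERT kept frozen, so no activations are stored
            out = bert(input_ids=ids, attention_mask=mask)
        chunk_vecs.append(out.last_hidden_state[:, 0, :].cpu())
    return torch.cat(chunk_vecs, dim=0)  # (n_chunks, hidden_size)
```

The torch.no_grad() makes this memory-cheap but also keeps BERT frozen; if BERT itself needs to be fine-tuned, something like gradient checkpointing would be required instead.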
Thanks!