Hi,
What is the significance of the reproject_words argument in DocumentRNNEmbeddings and is there some recommended usage?
What effect will this reprojection have in the case of contextual embeddings (e.g. BERT) versus fixed embeddings (e.g. word2vec)?
I haven't seen much information in the docs/code or other issues about this.
Thanks for your help.
Hello @bhavikm thanks for asking this question. If reproject_words is set to True, a fully connected layer is added after the words are embedded and before the representations are passed into the document RNN. That is, without reprojection the sequence is: embed words -> document RNN; with reprojection it is: embed words -> linear map -> document RNN. We do this for the reason illustrated in #690.
As for recommended usage, I would probably reproject by default, but try out both options to make sure. We are still experimenting a lot with different parameters - I'll share our experience once we have good recommendations. We'd also appreciate it if you and others share your experience on whether reprojection makes sense for your use cases!
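For concreteness, here is a minimal sketch of constructing the embeddings both ways in Flair; the hidden_size and reproject_words_dimension values below are arbitrary placeholders, not recommendations:

```python
from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

glove = WordEmbeddings('glove')

# with reprojection: embed words -> linear map -> document RNN
with_reprojection = DocumentRNNEmbeddings(
    [glove],
    hidden_size=256,
    reproject_words=True,
    reproject_words_dimension=256,  # output size of the linear map
)

# without reprojection: embed words -> document RNN
without_reprojection = DocumentRNNEmbeddings(
    [glove],
    hidden_size=256,
    reproject_words=False,
)

# either variant produces one document-level embedding per sentence
sentence = Sentence('the quick brown fox jumps over the lazy dog')
with_reprojection.embed(sentence)
print(sentence.embedding.shape)
```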
Thanks @alanakbik , I'll try it out and report back my results.
@bhavikm did you find a significant difference between the two options?
@alanakbik I found that using reprojection performs worse than simply fine-tuning BERT for text classification. I think this makes sense because the extra layer introduces a lot of newly initialized parameters that have to be learned from scratch. If we could include fine-tuning of BERT, it would probably improve the performance of some models further.
@emoryjianghang yes I agree, definitely something we should add!
@alanakbik Thank you guys for the great library. We look forward to the updates!
Fine-tuning transformers was added in #1492 and will be part of the next release.
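For anyone finding this thread later, a rough sketch of what the fine-tuning path looks like with the TransformerDocumentEmbeddings class; the model name is just an example, so check the release notes for the exact API:

```python
from flair.embeddings import TransformerDocumentEmbeddings

# with fine_tune=True, BERT's own weights are updated during training,
# so no reprojection layer or document RNN on top is needed
document_embeddings = TransformerDocumentEmbeddings(
    'bert-base-uncased',
    fine_tune=True,
)
```

This would typically be passed to a TextClassifier and trained with a small learning rate, as is usual for transformer fine-tuning.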