Conceptual mistake in Custom_Named_Entity_Recognition_with_BERT.ipynb ?

Hello:-)

First of all I wanted to thank you for putting all this work into tutorials. It was a lot of help for me in my job:-) I'm trying to finetune a BERT token classification model for Estonian language BERT model from HuggingFace hub on my own custom labels. I used your notebook to spin up an implementation fast. However, I noticed that you finetune the model by propagating the gradient through ALL of the parameters. I thought that unlike models like GPT, BERT only requires finetuning the final layer? See for example: https://luv-bansal.medium.com/fine-tuning-bert-for-text-classification-in-pytorch-503d97342db2 He's cutting off gradients for all of the parameters here: for param in model.bert_model.parameters(): param.requires_grad = False

I've seen the same remark in other places, the original BERT paper I think, that the point is that you only train the Head of the model when adapting this language model for your task.

I'm pretty new to Transfomers and I'm swimming in complexity trying to understand all these things, so I might be wrong.

Any comments would be appreciated! I wouldn't mind submitting a pull request if you like:-)

Best, Maria

NielsRogge / Transformers-Tutorials

Conceptual mistake in Custom_Named_Entity_Recognition_with_BERT.ipynb ? #281