Closed · f-wole closed this 4 years ago

Hi, thanks again for these models! I was trying to use the bert-base-italian-xxl models, but I noticed a discrepancy between the vocabulary size in the config.json file (32102) and the actual size of the vocabulary (31102). Is it possible that the wrong vocabulary was uploaded?
Hi @f-wole
thanks for that hint! The vocab file is correct, but the config file has the wrong vocab size. I'll fix that now :)
Update on that: unfortunately, I used a vocab size of 32102 in the configuration when training the model. Fixing it properly would require re-training the model, which is currently out of my resources.
However, the model works as expected, and all evaluations were done with the exact configuration that is deployed on the model hub.
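To make the mismatch concrete, here is a minimal sketch that compares the two values (the model id assumes the cased xxl variant on the model hub):

```python
from transformers import AutoConfig, AutoTokenizer

# Assumed model id: the cased xxl variant discussed in this issue.
model_name = "dbmdz/bert-base-italian-xxl-cased"

config = AutoConfig.from_pretrained(model_name)        # reads config.json
tokenizer = AutoTokenizer.from_pretrained(model_name)  # reads the vocab file

print(config.vocab_size)  # 32102 -- value baked in at training time
print(len(tokenizer))     # 31102 -- entries actually in the vocab file
```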
Yes, I saw that the model expects a vocabulary size of 32102 from the dimension of the word embeddings matrix: `embeddings.word_embeddings.weight` has shape `torch.Size([32102, 768])`.
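Something like the following reproduces that observation (a sketch; again assuming the cased variant):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# The input embedding matrix has one row per token id the model can accept.
emb = model.get_input_embeddings().weight
print(emb.shape)  # torch.Size([32102, 768])
```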
So are you suggesting it would be possible to use bert-base-italian-xxl with a vocabulary of size 31102?
Yes, it is possible: I ran evaluations with the NER example script in the Hugging Face Transformers library, for both NER and PoS tagging.
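In practice the mismatch is harmless for inference and fine-tuning: the vocab file has 31102 entries, so the tokenizer only produces ids 0 to 31101, and the last 1000 embedding rows are simply never looked up. A minimal usage sketch (assuming the cased variant):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "dbmdz/bert-base-italian-xxl-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# All token ids come from the 31102-entry vocab, so they stay well
# inside the 32102-row embedding matrix.
inputs = tokenizer("Buongiorno, come stai?", return_tensors="pt")
print(inputs["input_ids"].max().item() < 31102)  # True

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```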
I just updated the README to mention the vocab size mismatch between the vocab file and the config :)
Thanks again for finding this!