Hyperparticle / udify

A single model that parses Universal Dependencies across 75 languages. Given a sentence, it jointly predicts part-of-speech tags, morphological features, lemmas, and dependency trees.
https://arxiv.org/abs/1904.02099
MIT License
219 stars · 56 forks

Training with XLM-RoBERTa #11

Open ssdorsey opened 4 years ago

ssdorsey commented 4 years ago

Hi, has anybody looked into training a version of udify with XLM-RoBERTa? It seems like it could help with the low-resource languages that multilingual BERT covers poorly, so I'm planning on giving it a go if nobody else has already.

Hyperparticle commented 4 years ago

That's a good idea. Now that I see Hugging Face has added support for it, it should be straightforward to add here. I might get around to it, but feel free to try it yourself.

Training on my single GPU might take a while. 🙂

ssdorsey commented 4 years ago

I've got a couple of spare 2080 Tis that should do the trick. I've never used AllenNLP before, so I'm a little unfamiliar with how all these config files work. If you could give me some general guidance on what I would have to update in the code, I'm happy to take a crack at it and share my results.

Hyperparticle commented 4 years ago

The first thing to do would be to add the latest transformers release, which includes XLM-RoBERTa, to requirements.txt (see here and here). Then import it in udify/modules/bert_pretrained.py, replacing BertTokenizer/BertModel/BertConfig wherever necessary. Finally, copy config/ud/multilingual/udify_bert_finetune_multilingual.json and modify it to point to xlm-roberta-base instead of bert-base-multilingual-cased (along with a new vocab.txt file, which can be extracted from the pretrained model archive).
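For concreteness, a minimal sketch of that substitution, assuming xlm-roberta-base and a transformers release that ships the XLM-R classes; the transformers class names are real, but exactly where they get wired into udify/modules/bert_pretrained.py depends on the local code, so this is not a drop-in patch:

```python
# Minimal sketch, assuming xlm-roberta-base and a transformers release with the
# XLM-R classes; not a drop-in patch for udify/modules/bert_pretrained.py.
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

model_name = "xlm-roberta-base"  # replaces bert-base-multilingual-cased in the config

# Replaces BertTokenizer.from_pretrained(...) in the wordpiece indexer.
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)

# Replaces BertConfig/BertModel in the embedder. output_hidden_states=True keeps
# the per-layer activations that UDify's layer attention consumes.
config = XLMRobertaConfig.from_pretrained(model_name, output_hidden_states=True)
model = XLMRobertaModel.from_pretrained(model_name, config=config)

# Note: XLM-R uses <s>/</s> rather than [CLS]/[SEP], so any hard-coded BERT
# special tokens in the indexer need to change as well.
```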

There might be a few details I missed, but I think that's most of it. I also highly recommend using a debugger inside udify/modules/bert_pretrained.py to see what the data looks like.

Thanks for offering your help!

ssdorsey commented 4 years ago

Thanks, I'll take a crack at it

ssdorsey commented 4 years ago

I followed the steps you outlined and modified a few other things as well (e.g. the special tokens and the tokenizer), but I keep running into AllenNLP errors that I can't quite sort out. I have plenty of compute available if anybody else manages to get this running, but I don't think I'll be able to.

ssdorsey commented 4 years ago

Update: I came back to this and figured it out. I just had to deal with the differences in how pytorch_pretrained_bert and transformers handle model outputs. Training now.
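For anyone hitting the same wall: the gist of the difference is that pytorch_pretrained_bert returned a list of per-layer activations directly (with output_all_encoded_layers=True), while transformers returns a tuple (a ModelOutput in newer releases) and only includes the per-layer states when output_hidden_states=True is set on the config. A hedged sketch written against recent transformers releases, not ssdorsey's actual patch:

```python
import torch
from transformers import XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer

name = "xlm-roberta-base"
tokenizer = XLMRobertaTokenizer.from_pretrained(name)
config = XLMRobertaConfig.from_pretrained(name, output_hidden_states=True)
model = XLMRobertaModel.from_pretrained(name, config=config)
model.eval()

enc = tokenizer("A short test sentence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)

sequence_output = outputs[0]  # final layer, shape (batch, seq_len, hidden)
pooled_output = outputs[1]    # pooler output
hidden_states = outputs[2]    # tuple: embedding layer + one tensor per transformer layer

# pytorch_pretrained_bert returned the per-layer list directly, so code that
# expects that shape (e.g. a scalar mix over layers) needs something like:
encoded_layers = list(hidden_states[1:])  # drop the embedding layer
print(len(encoded_layers), encoded_layers[0].shape)
```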

Hyperparticle commented 4 years ago

How's the training going? Any problems?

ssdorsey commented 4 years ago

I had a few issues with exploding gradients. I had to take a couple of days off, but I'm getting back to it now to see if I can get it going again.
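For anyone following along: gradient clipping and a lower learning rate are the usual first things to try against exploding gradients. A generic PyTorch illustration (not taken from the udify training code):

```python
import torch

# Toy model and a single training step, just to show where clipping goes.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # a lower LR also helps

x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()

# Rescale gradients so their global norm never exceeds max_norm before stepping.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
optimizer.step()
```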

ArijRB commented 3 years ago

Hey, I am trying to train a version of udify with a BERT-like model. I was wondering if you have any updates on the changes needed? Thank you in advance @ssdorsey @Hyperparticle

prashantkodali commented 2 years ago

@ssdorsey or anybody else,

were you able to train the model? I'm looking to do the same with XLM-R, and any experience you can share would be really helpful. TIA.

shaked571 commented 2 years ago

Do any of you have updates regarding the training?

guptabhinav49 commented 2 years ago

@ssdorsey could you tell us exactly how you handled the outputs in the end? Even after changing the config files, I'm getting this error: `RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)`

I'm trying to use "ai4bharat/indic-bert" as the pre-trained model. The procedure should be very similar to what you did for XLM-RoBERTa.
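Not an answer, but a hedged debugging sketch: this CUBLAS error is often either an out-of-memory condition or an out-of-range token id, e.g. a tokenizer vocabulary larger than the model's embedding matrix. Running on CPU usually surfaces a clearer message, and the two sizes can be compared directly:

```python
from transformers import AutoModel, AutoTokenizer

name = "ai4bharat/indic-bert"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

print("tokenizer vocab size:", tokenizer.vocab_size)
print("embedding rows:      ", model.get_input_embeddings().num_embeddings)

# If the indexer emits ids >= the number of embedding rows (e.g. special tokens
# added without resizing, or a stale vocab file referenced in the config), the
# GPU kernel can fail with exactly this kind of CUBLAS error;
# model.resize_token_embeddings(len(tokenizer)) is the usual remedy after
# adding tokens.
```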