allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

ELMo Models in Different Languages #1712

Closed fsonntag closed 5 years ago

fsonntag commented 6 years ago

Is your feature request related to a problem? Please describe.
I would like to use an ELMo language model in languages other than English, concretely German.

Describe the solution you'd like
Additional ELMo models available in other languages, e.g. German, without the need to train them myself.

Describe alternatives you've considered
Using the training code to train them myself. Unfortunately, this is very expensive.

Also thanks for the great product :) it's a joy using AllenNLP!

KeremZaman commented 5 years ago

There is a repo for this: https://github.com/HIT-SCIR/ELMoForManyLangs, but some of the embeddings are not accessible.

fsonntag commented 5 years ago

Thanks for the answer, @KeremZaman! Unfortunately, they don't provide the embeddings in a format that is compatible with AllenNLP. But if you manage to get them incorporated into AllenNLP, I would be happy to know!
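For reference, ELMoForManyLangs ships its own embedder API rather than AllenNLP-style options/weights files, which is why it doesn't plug in directly. A minimal sketch of using it standalone (the model directory below is a placeholder for an unpacked pre-trained model from that repo):

```python
# Sketch of using ELMoForManyLangs directly (not AllenNLP-compatible).
from elmoformanylangs import Embedder

# Placeholder path: point this at an unpacked pre-trained model for your language.
embedder = Embedder('/path/to/elmoformanylangs/german_model')

# Input is a batch of pre-tokenized sentences.
sentences = [['Das', 'ist', 'ein', 'Test', '.']]

# Returns a list of numpy arrays, one (sequence_length, 1024) array per sentence.
embeddings = embedder.sents2elmo(sentences)
print(embeddings[0].shape)
```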

schmmd commented 5 years ago

@fsonntag we're presently discussing whether to invest heavily in extending ELMo to multiple languages next year. Stay tuned!

fsonntag commented 5 years ago

Thanks a lot for the answer, @schmmd.

matt-gardner commented 5 years ago

We're pretty settled that we're not going to be investing in doing this ourselves, at this point. As we've said in other issues, if people want to contribute back pre-trained models in other languages, we are happy to host them and say very nice things about the people who contribute them.

adelra commented 5 years ago

I am also interested in training ELMo for other languages, such as Persian, for which I have datasets.

What are the steps? Can you provide us with a reference or a guide? The reason I'm asking is that we probably want all the pre-trained models to be in a common format, so that switching between languages would not require much work in terms of coding.

schmmd commented 5 years ago

@adelra we should have a training module for ELMo in AllenNLP soon, which should make training ELMo for other languages easier. For now, you would need to follow the instructions in https://github.com/allenai/bilm-tf.

ahmedcs2019 commented 5 years ago

Are ELMo models for non-English languages currently available?

schmmd commented 5 years ago

We have a contributed model for Portuguese. See https://allennlp.org/elmo.
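For anyone wondering how to use a contributed model like that one: contributed ELMo models come as an options JSON plus a weights HDF5 file, and they load through AllenNLP's standard Elmo module. A rough sketch, with placeholder paths for the files linked from https://allennlp.org/elmo:

```python
# Sketch of loading a contributed (non-English) ELMo model in AllenNLP.
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths: download the options/weights files for the language you want.
options_file = '/path/to/contributed/elmo_options.json'
weight_file = '/path/to/contributed/elmo_weights.hdf5'

# Two output representations, no dropout.
elmo = Elmo(options_file, weight_file, num_output_representations=2, dropout=0.0)

# batch_to_ids converts tokenized sentences to character ids.
sentences = [['Isto', 'é', 'um', 'teste', '.']]
character_ids = batch_to_ids(sentences)

output = elmo(character_ids)
# output['elmo_representations'] is a list of tensors of shape
# (batch_size, sequence_length, 1024).
print(output['elmo_representations'][0].shape)
```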

ahmedcs2019 commented 5 years ago

I need to apply it to Arabic text.

ahmedcs2019 commented 5 years ago

Is there any way to build it myself?

matt-peters commented 5 years ago

You can train the original LSTM architecture on your corpus using https://github.com/allenai/bilm-tf

You can train a transformer version using allennlp; see https://github.com/allenai/allennlp/blob/master/tutorials/how_to/training_transformer_elmo.md
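If it helps, here is a rough sketch of embedding a sentence with a transformer ELMo trained via that tutorial; the archive path is a placeholder, and the exact imports may vary across AllenNLP versions, so check the tutorial for your version:

```python
# Rough sketch, based on the transformer ELMo tutorial linked above.
import torch
from allennlp.data.tokenizers import Token
from allennlp.data.vocabulary import Vocabulary
from allennlp.data.token_indexers.elmo_indexer import ELMoTokenCharactersIndexer
from allennlp.modules.token_embedders.bidirectional_language_model_token_embedder import (
    BidirectionalLanguageModelTokenEmbedder,
)

# Placeholder: the model.tar.gz produced by `allennlp train` with the
# bidirectional language model config from the tutorial.
lm_embedder = BidirectionalLanguageModelTokenEmbedder(
    archive_file='/path/to/output/model.tar.gz',
    requires_grad=False,
)

tokens = [Token(word) for word in 'Das ist ein Test .'.split()]
indexer = ELMoTokenCharactersIndexer()
character_indices = indexer.tokens_to_indices(tokens, Vocabulary(), 'elmo')['elmo']

# Shape: (batch_size, sequence_length, embedding_dim)
embeddings = lm_embedder(torch.LongTensor([character_indices]))
print(embeddings.shape)
```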

TalSchuster commented 5 years ago

People looking for more languages can find some here: https://github.com/TalSchuster/CrossLingualELMo

matt-peters commented 5 years ago

Thank you so much Tal, and your Cross Lingual ELMo paper (and code) is awesome. Congrats. Love it 💯

TalSchuster commented 5 years ago

Thank you, Matthew! Your code is written very well, so it was very convenient to extend.