HIT-SCIR / ELMoForManyLangs

Pre-trained ELMo Representations for Many Languages
MIT License

ELMo weights.hdf5 and options.hdf5 files? #1

Open veronica320 opened 6 years ago

veronica320 commented 6 years ago

Thanks for this work! Could you please make the weights and options files available (in .hdf5 format), similar to how the AllenNLP pre-trained models work?

tnlin commented 5 years ago

+1, that would be perfect for many developers...

Oneplus commented 5 years ago

Sorry for the late reply.

To my understanding, our release is not directly portable to AllenNLP because we support unicode characters. This leads to a difference in model architecture: we have a char_emb layer of variable size that converts unicode characters to embeddings, while they use a fixed-size char embedding layer.

Unfortunately, we don't have a good solution for making our release work with AllenNLP right now. I will leave this issue open in case a solution turns up. Any solution or suggestion is welcome.
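The difference described above can be sketched in plain Python. This is a hypothetical illustration, not code from either library: `fixed_byte_ids` mimics the AllenNLP-style scheme (tokens encoded as UTF-8 bytes, so the embedding table has a fixed size regardless of which unicode characters appear), while `DynamicCharVocab` mimics a char_emb table whose size depends on the characters seen in training data.

```python
def fixed_byte_ids(token):
    # AllenNLP-style: encode the token as UTF-8 bytes, so the char
    # embedding table stays fixed-size (256 byte values, plus a few
    # special ids in the real implementation) for any language.
    return list(token.encode("utf-8"))

class DynamicCharVocab:
    # ELMoForManyLangs-style (sketch): the char vocabulary, and hence
    # the char_emb layer, grows with the unicode characters observed.
    def __init__(self):
        self.char2id = {"<unk>": 0}

    def add(self, token):
        for ch in token:
            self.char2id.setdefault(ch, len(self.char2id))

    def ids(self, token):
        return [self.char2id.get(ch, 0) for ch in token]

vocab = DynamicCharVocab()
vocab.add("語")                 # vocab grows with each new character
print(fixed_byte_ids("語"))     # -> [232, 170, 158], three UTF-8 bytes
print(vocab.ids("語"))          # -> [1]
```

This is why the two checkpoints are not interchangeable: the shape of the character embedding matrix differs per language in one scheme and is constant in the other.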

frankier commented 5 years ago

For people looking for a quick and dirty way to embed one sentence at a time (which is slower than using batches), feel free to reuse my partly copy-pasted, hacked-up code. See embed_sentence in https://github.com/frankier/finntk/blob/2f0ba49cd86002528431903c090d28852356eff7/finntk/vendor/elmo.py

TalSchuster commented 5 years ago

For people looking to use the AllenNLP framework:

A few people asked me, so I thought it would be good to also post it here for anyone who reaches this thread: I've merged code into the AllenNLP repo to support cross-lingual ELMo (with alignment to a shared space, as described in our paper on cross-lingual alignment of contextual embeddings).

However, that code still requires AllenNLP-trained ELMos. I trained them for a few languages (unfortunately, not as many as in this great repo); you can find more details here: https://github.com/TalSchuster/CrossLingualELMo