Open alanakbik opened 1 year ago
This is the code for the NER corpus I've used: https://github.com/lang-uk/flair-ner/blob/main/train_base.py#L32
and the code for the POS corpus: https://github.com/lang-uk/flair-pos/blob/main/train_grid.py#L21
I'll check whether I have a fixed split for NER hosted somewhere else.
Really cool idea!
I had to do a lot of manual preprocessing steps to get NER working when evaluating the ELECTRA model:
https://github.com/stefan-it/ukrainian-electra/blob/main/download_prepare_data_ner.sh
Oh, @stefan-it, thanks for reminding me. I totally forgot about the fixed split.
On a separate topic: would you like to try training ELECTRA on better-quality Ukrainian texts?
Hey @dchaplinsky, I currently have access to TPUs, so if you have texts available I would love to pretrain another model :hugs:
Yes I do! Could you contact me at chaplinsky[dot]dmitry on gmail?
Hi @alanakbik and @stefan-it
I've just uploaded two bigger models for the Ukrainian language: https://huggingface.co/lang-uk/flair-uk-forward-large https://huggingface.co/lang-uk/flair-uk-backward-large
These have hidden_size=2048 (in contrast to 1024 for the original ones) and were trained on my data plus data from Stefan (54 GB in total).
I've also trained a downstream NER model on them and got a nice 1.5% improvement over the previous one. I'll publish it shortly.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue tracks the progress of adding support for the Ukrainian language from lang-uk to Flair. We would like to add:
embeddings = FlairEmbeddings('uk-forward') and embeddings = FlairEmbeddings('uk-backward')
tagger = SequenceTagger.load('ner-ukrainian')
tagger = SequenceTagger.load('pos-ukrainian')
corpus = NER_UKRAINIAN(). Should be integrated only once version 2.0 is complete.
corpus = UD_UKRAINIAN()
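For reference, here is a rough sketch of how the proposed identifiers might be used together once they are integrated. It assumes the names listed in this issue ('uk-forward', 'uk-backward', 'ner-ukrainian', 'pos-ukrainian') end up registered in Flair exactly as written, which is not confirmed here. The helper names `build_stacked_embeddings` and `tag_text` are hypothetical and only for illustration. Flair is imported inside the functions so that the module loads even where Flair is not installed.

```python
# Identifiers proposed in this issue. Assumption: they are registered in
# Flair under exactly these names; this issue does not confirm that yet.
PROPOSED_EMBEDDINGS = ("uk-forward", "uk-backward")
PROPOSED_TAGGERS = ("ner-ukrainian", "pos-ukrainian")


def build_stacked_embeddings(names=PROPOSED_EMBEDDINGS):
    """Stack the forward and backward Flair embeddings (hypothetical helper)."""
    # Imported lazily: Flair is a heavy dependency and loading downloads models.
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings

    return StackedEmbeddings([FlairEmbeddings(name) for name in names])


def tag_text(text, model_name=PROPOSED_TAGGERS[0]):
    """Tag one sentence with a proposed tagger (hypothetical helper)."""
    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load(model_name)
    sentence = Sentence(text)
    tagger.predict(sentence)  # annotates the sentence in place
    return sentence
```

Calling `tag_text("...", model_name="pos-ukrainian")` would switch to the POS tagger under the same assumption about the model name.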