Open EmilStenstrom opened 4 years ago
Hi @EmilStenstrom, thanks for your interest. Supporting more languages is WIP and we plan to include that in future versions.
Let me know if there’s something I can do to help! (Native Swedish speaker)
Hi @EmilStenstrom we meet again!
We are looking into training a Swedish BLINK, but we have noticed there is not much documentation on the data preprocessing and training pipelines. Would it be possible for someone to add a step-by-step guide for training a model for another language? Especially how you go from the Wikipedia dumps to training data. @ledw
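To make the "Wikipedia dumps to training data" question concrete, here is a rough sketch of what that step could look like. It is not the BLINK authors' pipeline, which is not documented in this thread. It assumes raw wikitext where entity mentions appear as `[[Target]]` or `[[Target|anchor]]` links, and it emits one record per link. The field names (`context_left`, `mention`, `context_right`, `label`, `label_title`, `label_id`) are modeled on what the BLINK bi-encoder data loader appears to read, so check them against the repository before relying on them. The helper names, the `window` parameter, and the toy entity catalogue are all hypothetical.

```python
import json
import re

# Matches [[Target]], [[Target#Section]] and [[Target|anchor text]].
# Simplification: nested links, templates and namespaces are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def strip_links(text):
    """Replace every wiki link with its visible anchor text."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), text)


def mentions_from_wikitext(wikitext, title_to_id, descriptions, window=200):
    """Turn each resolvable wiki link into one training record.

    title_to_id and descriptions are a hypothetical entity catalogue,
    e.g. built from page titles and first paragraphs of the same dump.
    """
    examples = []
    for m in LINK_RE.finditer(wikitext):
        target = m.group(1).strip()
        if target not in title_to_id:
            continue  # skip links to pages outside the catalogue
        mention = (m.group(2) or m.group(1)).strip()
        # Keep up to `window` characters of plain-text context on each side.
        left = strip_links(wikitext[:m.start()])[-window:]
        right = strip_links(wikitext[m.end():])[:window]
        examples.append({
            "context_left": left,
            "mention": mention,
            "context_right": right,
            "label": descriptions[target],   # entity description text
            "label_title": target,
            "label_id": title_to_id[target],
        })
    return examples


# Toy Swedish example with a two-entity catalogue (made up for illustration).
sample_text = "[[Stockholm]] är huvudstad i [[Sverige|Sveriges]] rike."
sample_ids = {"Stockholm": 0, "Sverige": 1}
sample_desc = {
    "Stockholm": "Stockholm är Sveriges huvudstad.",
    "Sverige": "Sverige är ett land i Norden.",
}
for record in mentions_from_wikitext(sample_text, sample_ids, sample_desc):
    print(json.dumps(record))  # one JSON object per line (JSONL)
```

A real dump would also need XML parsing, template removal and redirect resolution, none of which this sketch attempts.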
I've created a new repository for training bi-encoder models. By following this tutorial, you can train the model in another language with a suitable transformer model, either using the BLINK code or following this tutorial. Flair support has not been implemented yet.
The link is not available now. Can you update it? Thanks.
Hi buddy, could you update this tutorial link? It's not available. Thanks.
There's a tutorial on how to train a smaller bi-encoder model here: https://github.com/facebookresearch/BLINK/issues/116
It looks like this architecture would work for non-English languages too. Wikipedia is available in more languages, flair has embeddings in other languages, and BERT is available for other languages as well.
Is there something stopping this from being applied to e.g. Swedish?