-
Hi,
There are some monolingual corpora available for Indian languages at this [site](http://wortschatz.uni-leipzig.de/en/download/) under **All Languages**; the corresponding paper is [here](http://www.lrec…
-
Hi,
In this repo, https://github.com/bedapudi6788/LOIT, I added large Twitter datasets for Telugu (7.9 million) and Hindi (17.6 million), along with fastText skip-gram and CBOW word vectors trained on them.
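
For anyone who wants to try vectors like these, here is a minimal sketch of reading them. It assumes the vectors are stored in fastText's plain-text `.vec` layout (a `count dim` header line, then one word and its values per line); the file format used in the repo is not stated above, so treat that as an assumption. The helper names `load_vec` and `cosine` and the tiny sample data are made up for illustration.

```python
import io
import math


def load_vec(stream, limit=None):
    """Parse the text .vec layout: a 'count dim' header, then 'word v1 v2 ...' rows."""
    header = stream.readline().split()
    dim = int(header[1])  # header[0] is the declared word count
    vectors = {}
    for i, line in enumerate(stream):
        if limit is not None and i >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            continue  # skip rows that do not match the declared dimension
        vectors[word] = values
    return vectors, dim


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up sample standing in for a real .vec file (assumption, not real data).
SAMPLE = "3 2\nhello 1.0 0.0\nnamaste 0.8 0.6\nworld 0.0 1.0\n"
vectors, dim = load_vec(io.StringIO(SAMPLE))
similarity = cosine(vectors["hello"], vectors["namaste"])
```

With a real file, the same function could be called as `load_vec(open(path, encoding="utf-8"), limit=100000)`, where `path` is wherever you saved the vectors; `limit` keeps memory use down when only the most frequent words are needed.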
-
fastText is a new library for creating vector models of words. It was developed and released by the Facebook AI team.
https://github.com/facebookresearch/fastText
https://fasttext.cc/docs/en/aligned-v…
-
This issue is about corplist.
Even though the groups "LINDAT monolingual corpora", "LINDAT speech corpora" and "LINDAT parallel corpora" are not marked in any special way in corplist, they are listed…
-
What was the format for the translation task?
Do you provide a sequence of pairs delimited by newlines, e.g. "sentence1 = translation_of_sentence1 \n sentence2 = translation_of_sentence2 \n ... \n testing…
-
Hey, thanks for sharing the code. I am working on multilingual text. Can I specify more than one language when segmenting words/sentences?
-
Hi, I wonder whether BERT multilingual representations can perform as well as other multilingual embeddings obtained by aligning monolingual embeddings (like [fastText multilingual](https://github.com/Babylonpar…
-
- [ ] Loading corplist takes ages.
- [ ] click Universal Dependency 2.3 -> unfolding the list of corpora takes ages
- [ ] click on the name of your favourite language next to a corpus in that langua…
-
I downloaded the en-de model, and am now trying to replicate training.
I had to make some guesses (e.g., how to specify the training data, what
the switches -m and -c need), so I am running this …
-
Hi Yunsu,
Thanks for your excellent work. I am trying to reproduce the results in your paper, and I have a question about the language model.
For training with KenLM, what corpus is used? Same with the …