-
I want to adapt Jpara to be more casual/conversational by using a dataset like JESC.
However, besides that and Open Sub, parallel datasets for conversational text are seriously lacking. I'm thinking of mini…
-
I am trying to pretrain BART further from the huggingface checkpoint with the command below, and there seems to be an issue with a mismatched number of arguments for `_tokenize`.
The command is b…
-
Here you will find a long list of the articles that need to be coded. They are divided into sections, one for each coder (TR = Timo, MR = Melanie, JC = Joseph, AB = Agata, LK = Liam). Each item in th…
-
6. If you like, you can use some other data, for instance a Wikipedia dump for some other language, such as Finnish: https://linguatools.org/tools/corpora/wikipedia-monolingual-corpora/. First you nee…
-
## Environment info
- `transformers` version: `4.0.0.dev0`
- Platform: `Ubuntu 20.04.1 LTS`
- Python version: `3.8.5`
- PyTorch version (GPU?): `1.7.0` (GPU - yes)
- Tensorflow version (…
-
Is there an untokenized version? I get the feeling that the corpus is tokenized with the indicnlp library, with punctuation space-separated?
```
$ head ml.txt
മകനു വേണ്ടി അച്ഛനും പ്രമുഖ നിർമ്മാതാവുമായ എ …
-
Hi there,
I am using OpusFilter on the `nlingual-rebase` branch to train a monolingual BERT model. In some of my corpora, there are empty lines which denote a document boundary, e.g. an empty line …
jbrry updated
3 years ago
-
Dear Hao and Sudha,
Can you help us out here? I appreciate you must be really busy but there isn’t long until the workshop and a number of the papers (extended abstracts) are missing from the REPL4…
-
Hi there, thanks for the excellent tool!
I am trying to filter corpora to train a monolingual language model. As such, I am using the `nlingual-rebase` branch as it seemed to be the most up-to-date…
jbrry updated
4 years ago
-
[https://arxiv.org/abs/1710.04087](https://arxiv.org/abs/1710.04087)
Short Description:
> State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionarie…