-
Hi,
I'm trying to create dense representations from my corpus and search paragraphs/phrases by keywords or a question. I don't have labeled questions and answers, and for now I don't need to get a…
-
1. Download the Europarl corpus for the EN-FR language pair.
2. Run the w2v training algorithm on both sides:
- -size 200 -window 10 for FR
- -size 800 -window 10 for EN
3. stop word removal fro…
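The two training configurations in step 2 can be written out as full command lines. This is only a sketch: it assumes the original `word2vec` command-line tool is what "w2v" refers to, and the corpus and output file names are hypothetical placeholders, not paths from the post.

```python
import shlex


def word2vec_command(corpus_path, vectors_path, size, window):
    """Build the argument list for one word2vec training run."""
    return [
        "word2vec",               # assumed binary name
        "-train", corpus_path,    # tokenized monolingual side of the corpus
        "-output", vectors_path,  # where the learned vectors are written
        "-size", str(size),       # embedding dimensionality
        "-window", str(window),   # context window size
    ]


# Hypothetical file names; the sizes and windows are the ones listed above.
fr_cmd = word2vec_command("europarl.fr.txt", "vectors.fr.txt", size=200, window=10)
en_cmd = word2vec_command("europarl.en.txt", "vectors.en.txt", size=800, window=10)

print(shlex.join(fr_cmd))
print(shlex.join(en_cmd))
```

The argument lists can be passed to `subprocess.run` once the binary and the corpus files actually exist.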
-
First I would like to thank the authors of this tutorial/repo for this great resource. It has helped me tremendously in understanding what semantic parsing really is.
For my project, I need t…
-
Hi there,
I've just come to realize that there are a lot of different types of quotation marks depending on where you're from.
Do the language models being used by translateLocally at all take t…
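For illustration, one way to sidestep the question is to normalize quotation marks before text reaches a model. The sketch below is a hypothetical preprocessing step, not something translateLocally is known to do, and the mapping covers only a few common European quote styles.

```python
# Map several typographic quotation marks to their plain ASCII equivalents.
# The selection of characters here is an assumption, not an exhaustive list.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark (German opening quote)
    "\u00ab": '"',  # left-pointing double angle quotation mark
    "\u00bb": '"',  # right-pointing double angle quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201a": "'",  # single low-9 quotation mark
    "\u2039": "'",  # single left-pointing angle quotation mark
    "\u203a": "'",  # single right-pointing angle quotation mark
})


def normalize_quotes(text):
    """Replace typographic quotes with ASCII quotes, leaving other text as-is."""
    return text.translate(QUOTE_MAP)


print(normalize_quotes("\u201eHallo\u201c, \u00abBonjour\u00bb, \u201cHello\u201d"))
```

Normalizing loses the locale-specific style, so the target-language quotes would have to be restored after translation if they matter.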
-
Hey there!
I've trained a truecase model for the German language on a dataset of 1 million sentences. The resulting model is quite big (80 MB) and I am having memory issues including it into my anno…
-
Do you have a plan to reproduce the BERT NER model? I tried, but with BERT_base, the best micro-avg test F1 on CoNLL-2003 is 91.37, while the score reported in the paper is 92.4.
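When comparing such numbers it helps to pin down how the metric is computed. Below is a minimal sketch of entity-level micro-averaged F1, assuming each sentence's entities are given as a set of `(start, end, type)` spans and that only exact matches count. The data is a toy example, and this is not the evaluation script used in the paper or in this repo.

```python
def micro_f1(gold, pred):
    """Entity-level micro F1 over parallel lists of span sets.

    gold, pred: one set of (start, end, type) tuples per sentence.
    A predicted entity counts as correct only if span and type both match.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # exact matches
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one entity in the first sentence gets the wrong type.
gold = [{(0, 1, "PER"), (3, 4, "LOC")}, {(2, 3, "ORG")}]
pred = [{(0, 1, "PER"), (3, 4, "ORG")}, {(2, 3, "ORG")}]
print(round(micro_f1(gold, pred), 4))
```

Counts are pooled over all sentences and entity types before precision and recall are computed, which is what distinguishes micro from macro averaging.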
-
Hello, thanks for your great work! Can you provide a detailed script to illustrate the way to prepare the translation dataset?
-
Currently, we use SentencePiece in Tokenizer for our models that contain ZH/JA, in which no space serves as a natural word boundary.
The SentencePiece model is applied after Tokenizer's `none` mode.
`node…
-
Hi! First of all, thank you for this amazing repo :)
While spending time with the medium model, I noticed this:
```
>>> import spacy
>>> nlp = spacy.load("tr_core_news_md")
>>> nlp("Erzurum'da …
-
Since gluon-nlp already has very good tools for BERT and also has basic data processing for named entity recognition ready from https://github.com/dmlc/gluon-nlp/pull/466, I wanted to bu…