-
UD guidelines currently do not specify how to mark document and paragraph boundaries, and for many treebanks such information is not available (original text gone, sentences shuffled, etc.). But where it…
-
Hello,
I want to fine-tune the language model on domain-specific tasks.
Could anyone tell me what kind of custom text file is required for fine-tuning the model?
Will it be okay if I put all sentences…
-
## My Environment
* **spaCy version:** 2.2.2
* **Platform:** Linux-5.0.0-25-generic-x86_64-with-Ubuntu-18.04-bionic
* **Python version:** 3.6.8
* **Machine:** AMD Ryzen Threadripper 2950X 16-Core…
-
Just as is the case with well-established usages of attributes native to att.lexicographic within the dictionary module, there are identical use cases for these attributes that arise in the developmen…
-
So far we have been using the fragments from the input `.vtt` files, that is, the text of the 1-2 lines that appear on screen at a given moment. We did it this way because it was the most…
dcabo updated
4 years ago
-
[CourtListener](http://www.courtlistener.com/) is starting to add state Supreme Court decisions to its offerings, and intends to add all fifty states. Consequently, it is sensible to bake support fo…
-
I am about to fine-tune a multilingual BERT model using English and Chinese text from the legal domain.
My corpus is around 27GB; how long should I expect it to take to train 3 epochs (default parameters) us…
-
#### Description
Hi, I tried training a model, with
```
from gensim.models import Doc2Vec
model = Doc2Vec(min_count=1, window=10, size=100, sample=1e-4, negative=5, workers=7)
model.…
```
-
I would like to start contributing, together with our local students, to the [Karakalpak](https://en.wikipedia.org/wiki/Karakalpak_language) corpus.
Thanks in advance!
-
```
Jun 19 19:26:53 ip-172-31-58-70 docker-compose[7160]: app_1 | wr.io: 2019-06-19 19:26:53: [ERROR]: 500 (Internal Server Error) raised by https://wr.perma-archives.org/public/n3ry-mj6…