-
Most, if not all, of the small open-source corpora should be available on https://teanga.io/.
This includes at least:
* Gutenberg
* Brown
* CESS
* Chat-80
* CoNLL 2000, 2002, 2007
* All of UD?
…
-
As highlighted in stryker-mutator/stryker#1061, we need some clarity around indirect harms that result from using the services of a company that itself uses the services of a company that causes harm.
-
Information (using HXL hashtags) | What this information means | What does this mean? (machine translation)
-- | -- | --
#meta +id +v_zz_hxl | #description+label | #description +label +i_en
…
-
Originally posted in the forum:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/8r8YOQgTBT4/xHpCTp9DAwAJ
```
From: Christopher Imantaka Halim
> Hi,
>
…
-
Moving the cursor by word does not produce correct results when the text contains Chinese characters. Currently the cursor seems to move character by character; it should move word by word.
|…
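
To illustrate the expected behavior, here is a minimal sketch of word-wise cursor movement over Chinese text. It is not the editor's implementation: the `LEXICON` set, `segment`, and `next_word_boundary` are hypothetical names made up for this example, and greedy forward maximum matching stands in for whatever segmenter (for example a full dictionary or ICU) a real fix would use.

```python
# Hypothetical toy lexicon, for illustration only. A real editor would
# need a full dictionary or a library-provided word segmenter.
LEXICON = {"我们", "喜欢", "中文", "文本", "编辑器"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Split text with greedy forward maximum matching.

    At each position the longest lexicon entry wins; characters that
    start no known word become single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def next_word_boundary(text, cursor):
    """Return the offset of the first word end strictly after cursor."""
    end = 0
    for word in segment(text):
        end += len(word)
        if end > cursor:
            return end
    return len(text)
```

With this sketch, repeated calls to `next_word_boundary` starting from offset 0 step over whole words instead of single characters, which is the behavior the report asks for.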
-
1. More reasons **why** fonts should be free!
- Too much focus on digitality ("because type designs are just data, and data should be free in a free society"), not enough on functionality
- "should …
-
Khinalug [ISO 639-3: [kjj](https://iso639-3.sil.org/code/kjj)], Udi [[udi](https://iso639-3.sil.org/code/udi)] and Talysh [[tly](https://iso639-3.sil.org/code/tly)] in Azerbaijan use alphabets similar…
-
https://github.com/tesseract-ocr/tesseract/issues/648#issuecomment-271987456
>Indic may be troubled by the length of the compressed codes used.
@theraysmith Can you explain a little more about t…
-
Greetings,
Where do I go to contribute to the NLTK corpus to add Amharic support?
```
from nltk.corpus import stopwords
```
Thanks
-
The language in the principle section titles is inconsistent.
Many are positioned as statements of fact ("The Web is...", "The Web does not..."), but one is stated as a requirement ("The Web must.…