-
### Procedure
- export of publication `uuid:add07e20-f3e5-11e4-88cd-005056827e52`
- in the `Výběr formátu` (format selection) dialog, `TEI` was selected; nothing was selected in the `Parametry stránek` (page parameters) section
### Result
- a window is displayed about pr…
-
The heuristic in `split_tokenised_text_into_sentences.py` is too simplistic:
- Full stops inside quoted text, such as in `' Is cuid den searmanas é . ' ar sise . `, should not count as split points.
- 3570…
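
The quoted-text case above can be illustrated with a minimal sketch. This is not the code in `split_tokenised_text_into_sentences.py`; the function name `split_sentences` and its parameters are hypothetical, and it assumes that quotation marks appear as standalone `'` tokens and are balanced.

```python
def split_sentences(tokens, quote="'", terminators=(".", "?", "!")):
    """Split a token list into sentences, ignoring terminators inside quotes.

    Assumption: each quotation mark is its own token and quotes are balanced.
    """
    sentences, current, in_quote = [], [], False
    for tok in tokens:
        current.append(tok)
        if tok == quote:
            # Toggle quote state on every standalone quote token.
            in_quote = not in_quote
        elif tok in terminators and not in_quote:
            # Only split on a terminator that is outside quoted text.
            sentences.append(current)
            current = []
    if current:
        # Keep any trailing tokens that were not closed by a terminator.
        sentences.append(current)
    return sentences


tokens = "' Is cuid den searmanas é . ' ar sise .".split()
print(len(split_sentences(tokens)))  # 1: the full stop inside the quote is not a split point
```

A real fix would probably also need to handle unbalanced or nested quotes and apostrophes that are not quotation marks, which this sketch does not attempt.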
-
I'm getting an error when I try to run the built-in English demo. I've downloaded CoreNLP, UDPipe, and the models, but I'm hitting an error in the Python code that runs right after CoreNLP.
Does th…
-
Hello!
I ran into this error while training a tagger on part of the Taiga corpus of Russian (~1 GB of text): ```"An error occurred during model training: Should encode value 65536 in o…
-
-
- textract: https://textract.readthedocs.io/en/stable/
- bs4 `get_text`: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ (but beware of sentence splitting)
- ftfy NFC https://ftfy.readthedocs.io…
-
# UD release (step by step)
To release DHBB1.0.0 on UniversalDependencies, we will follow a well-defined sequence of steps, in order to document and organize the tasks (these steps p…
-
We discussed how to deal with CJK languages last year, but Hebrew and Arabic are also difficult to deal with because of text direction. Words should be right-aligned and run from right-to-lef…
-
Hi! I'm using ruimtehol on a Mac and am happy with the result, thank you! But then I tried to run the same script and the same training-label datasets on Windows. Something very strange happened: the result is comple…
-
Imagine the parser is trying to decide between `rela:subtype1`, `rela:subtype2` and `relb`. Let them have probabilities 0.25, 0.2 and 0.3 respectively. Will UDPipe simply select `relb` or will it sele…
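
To make the two candidate behaviours concrete, here is a small sketch using the probabilities from the question. It does not show what UDPipe actually does; the function names `pick_full_label` and `pick_by_main_relation` are hypothetical and only illustrate the two decision rules being asked about.

```python
from collections import defaultdict


def pick_full_label(probs):
    """Rule 1: take the single most probable full label, subtypes included."""
    return max(probs, key=probs.get)


def pick_by_main_relation(probs):
    """Rule 2: sum probabilities per main relation (the part before ':'),
    pick the best main relation, then the best full label within it."""
    totals = defaultdict(float)
    for label, p in probs.items():
        totals[label.split(":", 1)[0]] += p
    main = max(totals, key=totals.get)
    candidates = {
        label: p for label, p in probs.items()
        if label.split(":", 1)[0] == main
    }
    return max(candidates, key=candidates.get)


probs = {"rela:subtype1": 0.25, "rela:subtype2": 0.2, "relb": 0.3}
print(pick_full_label(probs))        # relb
print(pick_by_main_relation(probs))  # rela:subtype1
```

Under the first rule `relb` wins with 0.3; under the second, `rela` accumulates 0.45 in total, so `rela:subtype1` is chosen.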