-
Thank you for creating the tool for public use!
I found that the tokenizer does not work well on some occasions. Is there any way to give delimited input to your POS and dependency parser directly …
-
It would be useful to create a comprehensive practical guide for topic modeling. We now have all the components in place:
- POS tags and lemmatization - thanks to `udpipe` package
- `coherence` measure…
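As a hedged sketch of how those pieces could fit together in such a guide, assuming the UDPipe annotation has already been reduced to `(lemma, upos)` pairs per document (the function name `lemma_counts` is hypothetical, not part of the udpipe package): keep only content-word lemmas and count them per document, the usual input for a topic model whose coherence is then measured.

```python
from collections import Counter

def lemma_counts(docs, keep_upos=("NOUN", "ADJ", "VERB")):
    """docs: {doc_id: [(lemma, upos), ...]} -> {doc_id: Counter of lemmas}.
    Keeps only content-word POS tags, the usual pre-processing step
    before fitting a topic model."""
    return {doc_id: Counter(l for l, u in rows if u in keep_upos)
            for doc_id, rows in docs.items()}

# Tiny made-up example: the determiner is filtered out.
docs = {"d1": [("cat", "NOUN"), ("the", "DET"), ("sleep", "VERB")]}
print(lemma_counts(docs)["d1"])
```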
-
Hey Jan, thanks for the awesome work. I've been using the R package to handle lemmatisation of media corpora for multiple Central and Eastern European languages; however, I am wondering if there is a way t…
-
Once we have the output of UDPipe in a Spark DataFrame, how can two articles be compared for similarity? At what level (sentence or article)?
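One possible answer sketch, at the article level, assuming the UDPipe output has been collected into `(form, lemma, upos)` triples per article (all names here are illustrative, not a Spark or udpipe API): cosine similarity over bag-of-lemma count vectors.

```python
from collections import Counter
import math

def lemma_vector(rows):
    """Count lemmas from annotated rows shaped like (form, lemma, upos)."""
    return Counter(lemma for _form, lemma, _upos in rows)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Made-up two-article example; lemmatisation makes "cats"/"cat" match.
art1 = [("The", "the", "DET"), ("cats", "cat", "NOUN"), ("sleep", "sleep", "VERB")]
art2 = [("A", "a", "DET"), ("cat", "cat", "NOUN"), ("sleeps", "sleep", "VERB")]
print(round(cosine(lemma_vector(art1), lemma_vector(art2)), 3))  # -> 0.667
```

The same idea works per sentence by grouping rows on a sentence id instead of per article.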
-
Hi,
Once in a while I happen to get a dataset that is already pre-tokenized (a dataframe with columns for tokens and a doc_id). Every time that happens I need to search forever to figure out how …
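For what it's worth, the underlying UDPipe engine has a "vertical" input reader (one token per line, a blank line ending each sentence), which is the usual way to bypass the tokenizer for pre-tokenized data. A hypothetical sketch of serializing such a dataframe into that shape (plain Python, illustrative column names):

```python
def to_vertical(rows):
    """rows: (doc_id, sentence_id, token) tuples, already tokenized.
    Returns {doc_id: text} with one token per line and a blank line
    between sentences -- the shape UDPipe's 'vertical' reader expects."""
    docs = {}
    for doc_id, sent_id, token in rows:
        docs.setdefault(doc_id, []).append((sent_id, token))
    out = {}
    for doc_id, toks in docs.items():
        lines, prev = [], None
        for sent_id, token in toks:
            if prev is not None and sent_id != prev:
                lines.append("")  # blank line = sentence boundary
            lines.append(token)
            prev = sent_id
        out[doc_id] = "\n".join(lines) + "\n"
    return out

rows = [("d1", 1, "Hello"), ("d1", 1, "world"), ("d1", 2, "Bye")]
print(to_vertical(rows)["d1"])
```

How the R wrapper exposes that reader is exactly the discoverability gap this issue describes, so check the package docs for the tokenizer argument.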
-
spaCy benchmarks model inference speed in [words per minute](https://spacy.io/api/cli/#benchmark-speed).
This could be useful info for model comparison.
Stanza is painfully slow; udpipe is very fa…
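A minimal timing harness in that spirit, assuming nothing about any particular library: `annotate` below is a stand-in for whatever pipeline call is being compared (a udpipe, spaCy, or Stanza invocation), and word counting is naive whitespace splitting.

```python
import time

def words_per_second(annotate, texts, repeats=3):
    """Rough throughput benchmark: run `annotate` over `texts`, take the
    best wall-clock time of `repeats` runs, and report whitespace-split
    words per second."""
    n_words = sum(len(t.split()) for t in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for t in texts:
            annotate(t)
        best = min(best, time.perf_counter() - start)
    return n_words / best
```

Usage would be something like `words_per_second(nlp, corpus)` with each candidate pipeline plugged in as `nlp`; multiply by 60 to compare against per-minute figures.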
-
Add to LimpiaR to form part of pre-processing - plus it plays nicer with the dependencies, as LimpiaR is still very lightweight
-
- [ ] Missing: the ability to search the full text across publications
- [ ] Missing: the ability to search within UDPipe items
- [ ] Expression logic does not work at all, because everything is automatically wrapped in quotation marks (neodpo…
-
Hello! Maybe there is something not working correctly with token.idx in Portuguese.
I think the cause is multiword tokens. In Portuguese, "da" (of the) is a contraction ("de + a").
I saw https:…
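For context on why contractions trip up offsets: in CoNLL-U output, a multiword token appears as a range line (e.g. `2-3  da`) followed by its syntactic words `de` and `a`, and character offsets in the original text exist only for the surface form `da`, not for the two syntactic words. A small illustrative parser (the sentence and two-column layout are made up; real CoNLL-U has ten columns):

```python
# Minimal CoNLL-U-style fragment (only ID and FORM columns shown) for a
# Portuguese sentence containing the contraction "da" = "de" + "a".
CONLLU = ("1\tGosto\n"
          "2-3\tda\n"
          "2\tde\n"
          "3\ta\n"
          "4\tfruta\n")

def surface_tokens(conllu):
    """Return surface forms only: keep each multiword-token range line
    (like '2-3  da') and skip the syntactic words it covers."""
    covered, out = set(), []
    for line in conllu.strip().splitlines():
        tok_id, form = line.split("\t")[:2]
        if "-" in tok_id:
            a, b = map(int, tok_id.split("-"))
            covered.update(range(a, b + 1))
            out.append(form)
        elif int(tok_id) not in covered:
            out.append(form)
    return out

print(surface_tokens(CONLLU))  # -> ['Gosto', 'da', 'fruta']
```

So any code computing token.idx has to advance offsets over the surface tokens, not over every syntactic word row.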
-
`PythonInR` is no longer available on CRAN and is not compatible with the newest version of R. Is it possible to make the package work with `reticulate` instead? The package worked before (using steps t…