-
Hi, we would like to know more about how the sentence segmentation decisions for Old Church Slavonic were made.
Unfortunately, this link is broken. http://folk.uio.no/daghaug/syntactic_guidelines.pdf
…
-
The UDPipe sentence splitter seems to be a bit too split-happy and creates many fragments. Is this dragging down the performance of our BERT models? Furthermore, we put a lot of effort into splitting large …
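
One common workaround for an over-eager splitter is to post-process its output and glue very short fragments back onto the preceding sentence. The sketch below is only that heuristic, not a UDPipe option: `merge_short_fragments` and the `min_tokens` threshold are hypothetical, and tokens are counted naively by whitespace. It does not by itself answer whether fragmentation hurts the BERT models; that would still need a comparison run.

```python
def merge_short_fragments(sentences, min_tokens=3):
    """Merge fragments with fewer than min_tokens whitespace tokens into the previous sentence.

    Hypothetical post-processing step; min_tokens is an arbitrary example threshold.
    """
    merged = []
    for sent in sentences:
        # Naive whitespace token count; a real pipeline could reuse UDPipe's own tokens.
        if merged and len(sent.split()) < min_tokens:
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
    return merged


if __name__ == "__main__":
    fragments = ["He arrived at 5 p.m.", "on Monday.", "Nobody expected him."]
    print(merge_short_fragments(fragments))
```

Merging backwards keeps the first sentence of a document intact; a short fragment at the very start is simply kept as-is.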
-
Make it possible to do sentence segmentation and tokenization using MASC, e.g. along the lines of: https://github.com/scalanlp/chalk/wiki/Chalk-command-line-tutorial
-
Thank you for making the WebNLG dataset with the alignment available!
We would like to align sentences in the `original text` and the triples in `sortedtripleset`.
**Is there a function/procedur…
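
In case no such procedure exists, a rough baseline is lexical overlap: assign each triple to the sentence that contains the most words from its subject and object. This is a hypothetical sketch, not part of the WebNLG tooling; it assumes triples are strings of the form `subject | predicate | object` with underscores inside entity names (as in WebNLG's triple elements), and it ignores predicates, paraphrases and pronouns.

```python
import re


def entity_tokens(entity):
    """Turn an entity such as 'Aarhus_Airport' into a set of lowercase word tokens."""
    return set(re.findall(r"\w+", entity.replace("_", " ").lower()))


def align_triples(sentences, triples):
    """Map each triple string to the index of the sentence with the largest token overlap.

    Hypothetical baseline; ties go to the earliest sentence. Requires at least one sentence.
    """
    sentence_tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    alignment = {}
    for triple in triples:
        subj, _pred, obj = (part.strip() for part in triple.split("|"))
        wanted = entity_tokens(subj) | entity_tokens(obj)
        # Score = number of subject/object tokens that also occur in the sentence.
        scores = [len(wanted & toks) for toks in sentence_tokens]
        alignment[triple] = max(range(len(sentences)), key=scores.__getitem__)
    return alignment


if __name__ == "__main__":
    sents = ["Aarhus Airport serves the city of Aarhus.", "The airport's runway is 2776.0 metres long."]
    trips = ["Aarhus_Airport | cityServed | Aarhus", "Aarhus_Airport | runwayLength | 2776.0"]
    print(align_triples(sents, trips))
```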
-
Hello Stefan,
I'm going to train another BERT model from scratch with a different pre-training objective, and then compare it with BERTurk and other Turkish pre-trained language models. In ord…
-
If I want to use my own text data to pre-train an ELECTRA model from scratch, what format should the text be in?
Is sentence segmentation enough, or does it need more preprocessing?
Please help.
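
A frequently used layout for BERT-style pre-training corpora is plain text with one sentence per line and an empty line between documents; the ELECTRA repository's `build_pretraining_dataset.py` is described as reading raw text files in which empty lines separate documents, but please check the README of the version you use. The sketch below, with a hypothetical `to_pretraining_text` helper and a deliberately naive regex sentence splitter, shows how documents could be written into that layout.

```python
import re


def to_pretraining_text(documents):
    """Render documents as one sentence per line, with an empty line between documents.

    Hypothetical helper; the regex splitter is naive and will break on abbreviations.
    """
    blocks = []
    for doc in documents:
        # Split after '.', '!' or '?' when followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s.strip()]
        blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    docs = ["First sentence. Second one!", "Another doc? Yes."]
    print(to_pretraining_text(docs), end="")
```

For real data, a proper sentence segmenter for your language would replace the regex.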
-
Hello, thank you for your work. I would like to ask why you think the task of synchronized subtitles is important. How can it help in action generation and action understanding?
-
In [AG_Editable](http://libagar.org/man3/AG_Editable), we should implement [Unicode text segmentation](http://www.unicode.org/reports/tr29/) when performing word wrapping or selections.
Test cursor…
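
The core problem is that cursor movement and selection by code point can split what a user sees as a single character, e.g. `e` followed by U+0301 COMBINING ACUTE ACCENT. The sketch below (Python, for illustration only, not libagar code) groups each base character with the combining marks that follow it. It is a simplification, not a full UAX #29 implementation: emoji ZWJ sequences, regional-indicator pairs, Hangul syllable rules and CR LF are ignored, and `approximate_clusters` is a hypothetical name.

```python
import unicodedata


def approximate_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for UAX #29 extended grapheme clusters.
    """
    clusters = []
    for ch in text:
        # Mn, Mc and Me are the Unicode mark categories (nonspacing, spacing, enclosing).
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    s = "e\u0301a"
    print(len(s), approximate_clusters(s))
```

A cursor that steps over these clusters instead of individual code points never lands between a letter and its accent.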
-
It seems our current word tokenizer does not recognize non-breaking spaces (nbsp). That is not good. I'm not sure whether to replace them even before sentence segmentation or directly in the tokenizer.
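
Normalizing before sentence segmentation means every later step sees the same whitespace; the trade-off is losing the hint that the nbsp marked a place not to break (e.g. between a number and its unit). A minimal sketch, assuming the tokenizer only splits on the ordinary ASCII space; `normalize_nbsp` is a hypothetical helper and the character set is an example to extend.

```python
import re

# U+00A0 NO-BREAK SPACE and U+202F NARROW NO-BREAK SPACE; extend as needed.
NBSP_PATTERN = re.compile("[\u00a0\u202f]")


def normalize_nbsp(text):
    """Replace non-breaking spaces with ordinary spaces before segmentation/tokenization."""
    return NBSP_PATTERN.sub(" ", text)


if __name__ == "__main__":
    print(normalize_nbsp("10\u00a0km").split(" "))
```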
-
Hi @tenkus47,
Could you add more details to the README so that the repo can be tried locally (e.g. database setup, an example of the environment variables, and anything else that is needed)? Thank you, and congrats!
Lionel