-
Tokenize this text:
"+Noster+poēta+,+nisi+cīvis+Rōmānus+esset+,+ā+populō+nunc+cīvitāte+dōnārētur+.+"
(e.g. http://services.perseids.org/llt/segtok?xml=false&shifting=false&newline_boundary=1&inline=tr…
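The `+` signs in the string above look like URL-encoded spaces, which is the form a query string such as the segtok URL uses. If so, the text is already space-separated with punctuation split off. Here is a minimal sketch under that assumption. It decodes the string with the standard library and then applies a naive regex tokenizer. This is not the Perseids segtok algorithm, and it does not handle Latin enclitics such as -que. The function name `tokenize_query_text` is hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Naive tokenizer: a token is either a run of word characters
# or a single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_query_text(encoded: str) -> list[str]:
    """Decode '+'-encoded spaces, then split into word and punctuation tokens."""
    decoded = unquote_plus(encoded)  # "+Noster+poēta+..." -> " Noster poēta ..."
    return TOKEN_RE.findall(decoded)


text = "+Noster+poēta+,+nisi+cīvis+Rōmānus+esset+,+ā+populō+nunc+cīvitāte+dōnārētur+.+"
print(tokenize_query_text(text))
```

Because the regex isolates punctuation itself, the same function also works on input that was not pre-split, for example `"Noster poēta, nisi"`.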
-
Hi @tomerm @semion1956,
As things stand, I currently need to run the tokenization step on the raw data and then load its output into the models.
The problem, as I see it, is that we are going to run many tes…
-
Hi,
When I run this command on this particular file, I get the error shown below. Can you please tell me why this happens only for this file?
File name: dat…
-
Hi,
Is there a sample of tokenization with encryption?
Java/Kotlin don't provide a Cipher of type 'RsaOaep256' out of the box, so I was hoping to look at some sample code to see how the PAN i.…
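As far as we can tell, 'RsaOaep256' refers to RSAES-OAEP with SHA-256, which JOSE calls RSA-OAEP-256. In this scheme, SHA-256 hashes the label and also drives the MGF1 mask function. Java does offer the transformation `RSA/ECB/OAEPWithSHA-256AndMGF1Padding`. A commonly reported pitfall is that the SunJCE provider then defaults MGF1 to SHA-1 unless an explicit `OAEPParameterSpec` is passed, so check which digests your provider actually uses.

To show what the padding itself does, here is a standard-library Python sketch of the EME-OAEP encode and decode steps from RFC 8017, with SHA-256 used throughout. The RSA exponentiation step is deliberately omitted. In practice it should come from a vetted crypto library. The function names are hypothetical.

```python
import hashlib
import os
from typing import Optional

H_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 mask generation using SHA-256 (RFC 8017, appendix B.2.1)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_encode(message: bytes, k: int, label: bytes = b"",
                seed: Optional[bytes] = None) -> bytes:
    """EME-OAEP encoding; k is the RSA modulus length in bytes."""
    if len(message) > k - 2 * H_LEN - 2:
        raise ValueError("message too long")
    l_hash = hashlib.sha256(label).digest()
    padding = b"\x00" * (k - len(message) - 2 * H_LEN - 2)
    db = l_hash + padding + b"\x01" + message
    seed = os.urandom(H_LEN) if seed is None else seed  # fixed seed only for tests
    masked_db = xor_bytes(db, mgf1(seed, k - H_LEN - 1))
    masked_seed = xor_bytes(seed, mgf1(masked_db, H_LEN))
    return b"\x00" + masked_seed + masked_db


def oaep_decode(em: bytes, k: int, label: bytes = b"") -> bytes:
    """Reverse of oaep_encode; raises ValueError on malformed input."""
    if len(em) != k or em[0] != 0:
        raise ValueError("decoding error")
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor_bytes(masked_seed, mgf1(masked_db, H_LEN))
    db = xor_bytes(masked_db, mgf1(seed, k - H_LEN - 1))
    if db[:H_LEN] != hashlib.sha256(label).digest():
        raise ValueError("decoding error")
    rest = db[H_LEN:].lstrip(b"\x00")  # skip the zero padding string
    if not rest or rest[0] != 1:
        raise ValueError("decoding error")
    return rest[1:]
```

With a 2048-bit key (k = 256), `oaep_decode(oaep_encode(pan, 256), 256)` returns the original `pan` bytes. The RSA library would then raise the encoded block to the public exponent. The checks in `oaep_decode` are not constant-time, which is one more reason to rely on a library in production.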
-
Many tasks seem to suffer from tokenization and spacing issues: it looks as though the data was tokenized and later reconstructed by joining the tokens with spaces.
Are there original versions of these input texts …
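If the original texts are unavailable, a rough heuristic can undo the most common space-joining artifacts. The sketch below is an approximation, not a recovery of the originals. It assumes English-style punctuation and contractions only. Information lost when the tokens were joined cannot be restored in general, for example whether a quote mark opened or closed a span. The function name `detokenize` is hypothetical.

```python
import re

# Closing punctuation that should hug the preceding word: "word ," -> "word,"
SPACE_BEFORE_CLOSING = re.compile(r"\s+([.,!?;:%)\]}])")
# Opening brackets that should hug the following word: "( word" -> "(word"
SPACE_AFTER_OPENING = re.compile(r"([(\[{])\s+")
# English clitics a tokenizer typically splits off: "do n't" -> "don't"
SPLIT_CLITIC = re.compile(r"\s+(n't|'s|'re|'ve|'ll|'d|'m)\b")


def detokenize(text: str) -> str:
    """Heuristically remove spaces that a token-joining step introduced."""
    text = SPACE_BEFORE_CLOSING.sub(r"\1", text)
    text = SPACE_AFTER_OPENING.sub(r"\1", text)
    text = SPLIT_CLITIC.sub(r"\1", text)
    return text


print(detokenize("It 's ( really ) fine , do n't worry ."))
```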
-
Some datasets have been pre-processed with the Moses tokenizer (or some other tokenizer), which mishandles the halant, treating it as punctuation and adding spaces around it. Add functionality …
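Here is a minimal sketch of such a fix-up. It assumes the only damage is whitespace inserted around the virama (halant) sign.

- **Space before the halant:** this can always be removed, because the sign attaches to the preceding consonant.
- **Space after the halant:** this is ambiguous. The tokenizer output looks the same whether or not the original word genuinely ended in a halant, so joining is an optional switch.

The function name and the `join_following` parameter are hypothetical. The code-point list covers the virama of several Indic blocks, not only Devanagari.

```python
import re

# Virama (halant) code points for several Indic scripts:
# Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam.
VIRAMAS = "\u094D\u09CD\u0A4D\u0ACD\u0B4D\u0BCD\u0C4D\u0CCD\u0D4D"
SPACE_BEFORE = re.compile(r"\s+([" + VIRAMAS + r"])")
SPACE_AFTER = re.compile(r"([" + VIRAMAS + r"])\s+")


def fix_halant_spacing(text: str, join_following: bool = True) -> str:
    """Undo spaces a tokenizer inserted around a virama/halant."""
    # A virama combines with the preceding consonant, so a space
    # before it is always a tokenization artifact.
    text = SPACE_BEFORE.sub(r"\1", text)
    if join_following:
        # Ambiguous step: this also merges a genuine word-final
        # halant with the word that follows it.
        text = SPACE_AFTER.sub(r"\1", text)
    return text
```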
-
We probably want to split off more copulas, for cross-lingual consistency:
Adjectives such as s'messey, s'odjey, sloo, etc., and also forms like saillym, shegin, shione and shynney (PM p140ff)
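Here is a minimal sketch of the apostrophe case. It assumes that the contracted copula is always written as a leading s' (or s’) attached to the adjective, and that any such token should be split into copula plus adjective. Fused forms such as sloo, saillym, shegin, shione and shynney have no apostrophe to split on and would need an explicit lexicon. The `fused` mapping is a hypothetical hook for that lexicon. It ships empty rather than guessing the segmentations, which should come from the grammar reference.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed rule: a leading "s'" (ASCII or typographic apostrophe) is the copula.
COPULA_PREFIX = re.compile(r"^(s['\u2019])(\w+)$", re.IGNORECASE)


def split_copula(tokens: List[str],
                 fused: Optional[Dict[str, Tuple[str, ...]]] = None) -> List[str]:
    """Split copula + adjective tokens into separate copula and adjective tokens."""
    fused = fused or {}  # hand-curated splits for forms without an apostrophe
    out: List[str] = []
    for tok in tokens:
        if tok.lower() in fused:
            out.extend(fused[tok.lower()])
            continue
        m = COPULA_PREFIX.match(tok)
        if m:
            out.extend(m.groups())  # "s'messey" -> "s'", "messey"
        else:
            out.append(tok)
    return out
```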
-
Replace the user/password requests with tokens obtained from OpenMRS. This will need to be coordinated with Albert on the client side as well.
https://wiki.openmrs.org/display/docs/REST+Web+Services+API+For+C…
-
This post relates to the effort to harmonize the Ancient Greek treebanks, as per [Issue 7](https://github.com/unipv-larl/UD4HL/issues/7).
One of the first issues to solve is tokenization itself. Th…
-
**Describe the bug**
When tokenizing text, for example:
`[token for token in nlp("A kutya evett egy csontot!.")]`
The sequence `!.` is treated as a single token, and it is also merged with the prece…
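Until the Hungarian rules are fixed, one workaround is to post-process the tokens. The minimal sketch below works on plain strings rather than spaCy `Token` objects. It peels trailing punctuation characters (Unicode category P*) off each token, one character at a time, so `csontot!.` becomes `csontot`, `!`, `.`.

It is deliberately naive and will also split abbreviations such as `stb.`. Inside spaCy itself, the cleaner fix would likely be adjusting the suffix rules, for example via `spacy.util.compile_suffix_regex` and `nlp.tokenizer.suffix_search`. The helper name is hypothetical.

```python
import unicodedata
from typing import List


def split_trailing_punct(tokens: List[str]) -> List[str]:
    """Split trailing punctuation off each token, one character per new token."""
    out: List[str] = []
    for tok in tokens:
        tail: List[str] = []
        # Stop at one character so a lone "!" or "." is kept as-is.
        while len(tok) > 1 and unicodedata.category(tok[-1]).startswith("P"):
            tail.append(tok[-1])
            tok = tok[:-1]
        out.append(tok)
        out.extend(reversed(tail))  # restore left-to-right order
    return out


print(split_trailing_punct(["A", "kutya", "evett", "egy", "csontot!."]))
# ['A', 'kutya', 'evett', 'egy', 'csontot', '!', '.']
```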