-
### Resource Type
_No response_
### Describe the problem or limitation you are having
4.x just added binary tokenization back:
https://github.com/godotengine/godot/pull/87634
### Describe the fea…
-
If you would like ``SLTev`` to support custom tokenizers (e.g. via ``--tokenizer=...``), please discuss here.
Let's add only the features people actually need.
Pull requests are also welcome.
-
I've been thinking about ways to remove Oniguruma from my bundles. It is needed for handling TextMate grammars, which are commonly used for syntax highlighting, and Glow seems a very intere…
-
1. Navigate to the "Tokenize Verse" feature in the GUI.
2. Select a verse and initiate the tokenization process.
3. Review the results.
Expected Behavior: The verse should be successfully tokenized…
-
While doing some testing, I noticed that the tokenizer treats the guillemet punctuation marks `«`, `»` differently from the more common `"`, `'`. Look at this string: «a sentence between guillemet». Your…
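As an illustration only (this is not the reporter's actual tokenizer), here is a minimal sketch of a regex-based rule that pads ASCII quotes with spaces but lets guillemets fall through, reproducing the asymmetry described above, and a variant that treats both the same way. The function names `naive_tokenize` and `guillemet_aware_tokenize` are hypothetical:

```python
import re

def naive_tokenize(text):
    # Separate ASCII quotes only; « and » fall through untouched and
    # stay glued to the neighboring words.
    text = re.sub(r'(["\'])', r' \1 ', text)
    return text.split()

def guillemet_aware_tokenize(text):
    # Treat guillemets exactly like ASCII quotes.
    text = re.sub(r'(["\'«»])', r' \1 ', text)
    return text.split()

print(naive_tokenize('«a sentence between guillemets»'))
# → ['«a', 'sentence', 'between', 'guillemets»']
print(guillemet_aware_tokenize('«a sentence between guillemets»'))
# → ['«', 'a', 'sentence', 'between', 'guillemets', '»']
```

The fix amounts to adding `«»` to the quote character class wherever the tokenizer splits off `"` and `'`.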
-
The tokenization never terminates for some specific inputs, for example:
`"it('should remove the elements domProps'), () => {"`
It may be caused by catastrophic backtracking in `Func_Name_Recursiv…
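The pattern name is truncated above, so as an illustration only, here is a classic nested-quantifier regex (not the actual pattern) that exhibits the same catastrophic backtracking: on an input that can never match, each extra character roughly doubles the matching time.

```python
import re
import time

def match_time(pattern, text):
    """Time a single regex match attempt and return (result, seconds)."""
    t0 = time.perf_counter()
    result = re.match(pattern, text)
    return result, time.perf_counter() - t0

# '(a+)+$' forces the engine to try exponentially many ways of splitting
# the run of 'a's between the inner and outer quantifier before failing.
for n in (10, 14, 18):
    _, elapsed = match_time(r'(a+)+$', 'a' * n + '!')
    print(f'n={n}: {elapsed:.4f}s')
```

Rewriting such a pattern without nested quantifiers (e.g. `a+$`), or bounding the repetition, removes the exponential blow-up.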
-
```
When writing a tokenization unit test for the ClearTK wrappers for ClearNLP, I
found an inconsistency between OpenNLP's tokenization and ClearNLP's.
Consider the string:
String s = "\"John & Mar…
```
-
It would be great to add `org.apache.lucene.analysis` for smarter tokenization across all languages. That way, processing languages such as Chinese would be more sensible with your library.