-
To request a new code snippet, please fill out the following:
Project name: Remove All Whitespace
Project link: https://sampleprograms.io/projects/remove-all-whitespace
Language: F#
> The pro…
-
**Problem Description:**
In the context of search results, names that begin with lowercase letters are sorted after or before names that begin with uppercase letters. Case should not be factored int…
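Below is a minimal sketch in Python of one common way to get case-insensitive ordering. The issue does not name the project's language, and `sort_names` is a hypothetical helper. It sorts on a case-folded key so that `apple` and `Banana` interleave alphabetically, and it uses the original string as a tiebreaker so the order stays deterministic.

```python
def sort_names(names: list[str]) -> list[str]:
    # casefold() gives a case-insensitive key ("Banana" -> "banana");
    # the original string breaks ties so "B" and "b" order deterministically.
    return sorted(names, key=lambda name: (name.casefold(), name))


names = ["apple", "Banana", "cherry", "Date"]
print(sorted(names))      # ['Banana', 'Date', 'apple', 'cherry']  (uppercase first)
print(sort_names(names))  # ['apple', 'Banana', 'cherry', 'Date']
```

Case folding is not full locale-aware collation. Accents and language-specific ordering rules would need a collation library.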
-
Hello,
First, thank you for another great package!
I meant this issue to be less about the package itself and more about opening a discussion. Apologies if this is the wrong place to start and, if that's…
-
https://en.wikipedia.org/wiki/Whitespace_(programming_language)
-
When I create a project, the default tokenizer in the pipeline is WhitespaceTokenizer.
If my project is based on a language that does not separate words with whitespace (e.g. Chinese, Japanese),
how should I set up the pipeline?
Or I did not …
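A whitespace tokenizer returns an unsegmented Chinese or Japanese sentence as a single token. One classic alternative is dictionary-based forward maximum matching. The sketch below uses a hypothetical toy vocabulary and hypothetical function names. It is only an illustration of the technique: it is not how any particular pipeline tokenizer works internally, and real segmenters use large dictionaries and statistical models.

```python
def whitespace_tokenize(text: str) -> list[str]:
    # Splits only on whitespace, as a whitespace tokenizer would.
    return text.split()


def forward_max_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    vocabulary word (up to max_len characters); fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # skip any whitespace that is present
            i += 1
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


vocab = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}  # hypothetical toy dictionary
sentence = "我们喜欢自然语言处理"
print(whitespace_tokenize(sentence))       # ['我们喜欢自然语言处理']
print(forward_max_match(sentence, vocab))  # ['我们', '喜欢', '自然语言', '处理']
```

The greedy "longest match first" choice is simple but can mis-segment ambiguous strings. That is one reason production segmenters add statistical scoring on top of a dictionary.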
-
:)
-
Currently stutter uses `/[\n\r\s]+/` as a delimiter, so languages that do not separate words with whitespace, such as Japanese, are unusable.
It seems [google/budoux](https://github.com/google/budoux) will do for at le…
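To illustrate the problem, the delimiter quoted above is transcribed here into Python's `re` module. JavaScript and Python disagree slightly on what `\s` matches in some Unicode edge cases, and `split_words` is a hypothetical helper. Because `\s` already covers `\n` and `\r`, the character class is effectively "any run of whitespace", so a Japanese sentence with no spaces comes back as one "word".

```python
import re

# The delimiter from the issue; \s already includes \n and \r.
DELIMITER = re.compile(r"[\n\r\s]+")


def split_words(text: str) -> list[str]:
    # Drop the empty strings that leading/trailing whitespace produces.
    return [word for word in DELIMITER.split(text) if word]


print(split_words("Hello there,\nworld"))  # ['Hello', 'there,', 'world']
print(split_words("今日は良い天気です。"))    # ['今日は良い天気です。'] (one chunk)
```

A phrase segmenter such as BudouX, as suggested above, would be needed to break the second example into readable units.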
-
We need to implement the "simple data types" defined by NIST Metaschema in lutaml-model:
The simple data types are provided here:
* https://github.com/usnistgov/metaschema/blob/develop/schema/xml/…
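One possible shape for this is a registry that maps each simple type name to a check on its lexical (string) form. The sketch is in Python purely for illustration; lutaml-model itself is a Ruby library. The semantics are an assumption: they mirror the XML Schema built-ins `xs:boolean`, `xs:integer`, `xs:nonNegativeInteger` and `xs:positiveInteger` and still need to be checked against the Metaschema definitions linked above. `SIMPLE_TYPES` and `validate` are hypothetical names.

```python
import re
from typing import Callable, Dict

# ASSUMPTION: integer lexical forms follow XML Schema (optional sign, digits).
_INTEGER = re.compile(r"[+-]?[0-9]+")


def _is_integer(value: str) -> bool:
    return _INTEGER.fullmatch(value) is not None


# Hypothetical registry: type name -> predicate on the string value.
SIMPLE_TYPES: Dict[str, Callable[[str], bool]] = {
    "boolean": lambda v: v in {"true", "false", "1", "0"},
    "integer": _is_integer,
    "non-negative-integer": lambda v: _is_integer(v) and int(v) >= 0,
    "positive-integer": lambda v: _is_integer(v) and int(v) > 0,
}


def validate(type_name: str, value: str) -> bool:
    try:
        check = SIMPLE_TYPES[type_name]
    except KeyError:
        raise ValueError(f"unsupported simple type: {type_name}") from None
    return check(value)


print(validate("positive-integer", "5"))  # True
print(validate("positive-integer", "0"))  # False
```

Keeping the checks in a table makes it easy to add the remaining types one at a time as their definitions are confirmed.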
-
We do things slightly differently from Cucumber; removing this line will cause a failing test:
https://github.com/Behat/Gherkin/blob/master/tests/Behat/Gherkin/Cucumber/CompatibilityTest.php#L36
-
We currently use the Moses tokenizer for alignments because it seems to be a standard in the MT world and it is what OpusTrainer supports for detokenization (we will likely feed tokenized text to it to…
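A related detail when aligning tokenized text is getting the links back onto the original, untokenized sentences. If the tokenizer records each token's character offsets, token-level links can be projected directly. The minimal sketch below assumes alignments arrive in the common Pharaoh `i-j` format over 0-based token indices. The regex tokenizer is a hypothetical stand-in, not the Moses tokenizer. This is one possible approach, not necessarily the one chosen here.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]

# Hypothetical stand-in tokenizer (NOT Moses): word runs or single punctuation marks.
_TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_spans(text: str) -> Tuple[List[str], List[Span]]:
    tokens, spans = [], []
    for match in _TOKEN.finditer(text):
        tokens.append(match.group())
        spans.append(match.span())  # (start, end) character offsets in text
    return tokens, spans


def project_alignment(pharaoh: str, src_spans: List[Span],
                      trg_spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Map 'i-j' token links to (source span, target span) character offsets."""
    pairs = []
    for link in pharaoh.split():
        i, j = (int(part) for part in link.split("-"))
        pairs.append((src_spans[i], trg_spans[j]))
    return pairs


src, trg = "Hello, world!", "Hallo, Welt!"
_, src_spans = tokenize_with_spans(src)
_, trg_spans = tokenize_with_spans(trg)
for (s0, s1), (t0, t1) in project_alignment("0-0 1-1 2-2 3-3", src_spans, trg_spans):
    print(src[s0:s1], "->", trg[t0:t1])  # e.g. "world -> Welt"
```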