-
We should merge some words because we have to identify them as one entity. A classic example is distinguishing between _integer_, _positive integer_, and _negative integer_.
After discussions with …
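
A minimal sketch of the merging idea, assuming the input is already tokenized and the multi-word entities are known in advance. The helper name `merge_phrases` and the phrase list are hypothetical and are not taken from the project:

```python
def merge_phrases(tokens, phrases):
    """Greedily merge the longest known multi-word phrase at each position."""
    # Store each phrase as a tuple of lowercased words for fast lookup.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, so "positive integer" wins over "integer".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multi-word phrase starts here; keep the single token.
            merged.append(tokens[i])
            i += 1
    return merged


tokens = "every positive integer exceeds every negative integer".split()
result = merge_phrases(tokens, ["positive integer", "negative integer"])
print(result)
```

Each merged phrase becomes a single token, so a downstream component can treat _positive integer_ as an entity distinct from plain _integer_.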
-
We use `JapaneseTokenizer` in production and are seeing some inconsistent behavior. With this text:
`"マギアリス【単版話】 4話 (Unlimited Comics)"` I get different results if I insert a space before the `【` character. Here is the s…
-
To make it easier for new contributors, we should probably have our own "linter" for commit messages, i.e. validate that at least some of the conventions are met:
- the first line should not exceed…
dscho updated
4 years ago
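
A rough sketch of what such a commit-message check could look like. The function name `lint_commit_message` is hypothetical, the length limit is left as a required parameter because the issue text above is cut off before stating it, and the blank-second-line rule is an assumption based on common Git practice rather than something the issue confirms:

```python
def lint_commit_message(message, max_subject_len):
    """Return a list of problems; an empty list means the message passes."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["subject line is empty"]

    problems = []
    subject = lines[0]
    # The limit is supplied by the caller; no project-specific value is assumed.
    if len(subject) > max_subject_len:
        problems.append(
            f"subject line has {len(subject)} characters (limit {max_subject_len})"
        )
    # Assumed convention: the subject is separated from the body by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


print(lint_commit_message("Fix typo in README\n\nExplain the change here.", max_subject_len=50))
```

Returning a list of problems instead of raising on the first one lets a contributor see every violation in a single run.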
-
# Steps to Recreate
## In CLAMP_WIN 1.6.6
### Build following pipeline:
```
```
### Export to jar (__test_project.pipeline.jar__)
…
-
Using the SOC Python notebook, spaCy did a good job annotating the following phrase with entities:
`a letter from the King of Jerusalem, i.e. John de Brienne `
However, when we have a choice eleme…
-
```
It would be very nice to have a ModalVerbsFeatureExtractor for German as well.
The actual modal verbs that are looked up in this FE extractor could be passed
as a wordlist parameter. So the Moda…
-
What is a minimal working example of code that takes a string and returns its named entities?
```
from ccg_nlpy import local_pipeline
pipeline = local_pipeline.LocalPipeline()
d = "…
-
The only way to achieve lemmatization today is to use the SynonymFilterFactory. The available stemmers are also inaccurate, since they only follow simplistic rules.
A dictionary-based lemmatize…
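
To illustrate the difference from rule-based stemming, here is a minimal Python sketch of dictionary lookup lemmatization. It is not a Solr filter; the function name `lemmatize` and the tiny dictionary are hypothetical, and a real lemmatizer would load a full lexicon:

```python
def lemmatize(tokens, lemma_dict):
    """Map each token to its dictionary lemma, leaving unknown tokens unchanged."""
    return [lemma_dict.get(token.lower(), token) for token in tokens]


# Toy dictionary with irregular forms that suffix-stripping rules cannot derive.
lemma_dict = {"mice": "mouse", "ran": "run", "better": "good"}

lemmas = lemmatize("The mice ran".split(), lemma_dict)
print(lemmas)
```

Irregular forms such as "mice" are resolved by lookup, which is the case where simple suffix rules tend to fail.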
-
1. Can I train for an updated model?
2. Does it improve by adding more dictionary words?
3. How do I add new rules?