-
For some context, here is the master issue for lemmatization problems with the lookup-based lemmatizer for German: https://github.com/explosion/spaCy/issues/2486. And here was the announcement that …
-
Create a filter/function to group identical analyses into a single entry. For example, analyses `18` and `19` of `forma` (Du Cange) are identical:
```
============================ANALYSIS 18======…
-
**Describe the bug**
According to the [documentation](https://manual.manticoresearch.com/Creating_an_index/NLP_and_tokenization/Morphology#index_exact_words) and Manticoresearch team comments, opti…
-
Currently, there is no way in the UD English treebanks to differentiate between adjectives that refer to common nouns and those that refer to proper nouns -- both are annotated as `ADJ+JJ`.
This ma…
-
### context
I'm looking to get the original token positions of keyterms when performing keyterm extraction with e.g. TextRank, but this can apply to the other extractors. Example:
```python
>>> d…
-
When using a model like `qanastek/pos-french-camembert`, a verb such as `finissions` results in multiple tokens with VERB entities like `["fini" VERB, "ssions" VERB]`. This does not happen with the f…
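
One possible workaround is to merge the pieces back into words after prediction. The sketch below is only an illustration: `merge_adjacent_pieces` is a hypothetical helper, and the dict layout (`word`, `entity`, `start`, `end`) and the character offsets are assumptions about what the tagger returns, not something taken from the model's actual output. It joins consecutive pieces that touch in the text and carry the same tag.

```python
def merge_adjacent_pieces(pieces):
    """Merge consecutive pieces that touch in the text and share a tag.

    Assumed input (hypothetical layout): dicts with "word", "entity",
    "start" and "end", sorted by "start".
    """
    merged = []
    for piece in pieces:
        prev = merged[-1] if merged else None
        touches = prev is not None and piece["start"] == prev["end"]
        if touches and piece["entity"] == prev["entity"]:
            # Same word split into subword pieces: extend the previous entry.
            prev["word"] += piece["word"]
            prev["end"] = piece["end"]
        else:
            # Copy so the caller's input is left untouched.
            merged.append(dict(piece))
    return merged


# Illustrative offsets for the text "Nous finissions" (made up for this sketch).
pieces = [
    {"word": "Nous", "entity": "PRON", "start": 0, "end": 4},
    {"word": "fini", "entity": "VERB", "start": 5, "end": 9},
    {"word": "ssions", "entity": "VERB", "start": 9, "end": 15},
]
result = merge_adjacent_pieces(pieces)
print([(p["word"], p["entity"]) for p in result])
```

If the tokens come from a Hugging Face `pipeline`, its `aggregation_strategy` option may already cover this case and would be worth trying before any custom merging.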
-
The Word List part of the IIP website is generated by code that is in this repository (iip-production).
There is a `wordlist.html` template in the templates directory: https://github.com/Brown-Unive…
-
hmm i dont want to write, i write enough at work. so im gonna pull together some unified search table and attach a labeling system. labels will be able to filter search results--this is a data-agnosti…
-
1. lemmatize the ukwac+wacky corpus using Jobimify tool:
```
frink:/home/panchenko/jobimify
```
- use the concatenation of these corpora http://cental.fltr.ucl.ac.be/team/~panchenko/d…
-
I have been working with natural language processing and often needed to know which words were used in certain corpora. Many dictionaries consist of word stems, requiring the extraction of stems…