-
Hi,
We would like to use Stanza for the pre-processing stages, including stopword, punctuation, and special-character removal. We noticed that this step does not seem to be part of the pipeline. We are …
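Since this step is not part of the pipeline, one option is to filter the tagged output afterwards. The following is a minimal sketch with three assumptions: words arrive as `(text, upos)` pairs, `STOPWORDS` is a hypothetical toy list that should be replaced with a real one for your language, and "special characters" means tokens tagged `PUNCT`/`SYM` or containing no letters or digits.

```python
# Hypothetical, deliberately tiny stopword list; replace with a real list for your language.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}

# Universal POS tags treated as punctuation / special characters.
DROP_UPOS = {"PUNCT", "SYM"}


def clean_tokens(words, stopwords=STOPWORDS):
    """Return the texts of (text, upos) pairs that are not stopwords,
    punctuation/symbols, or tokens without any letter or digit."""
    kept = []
    for text, upos in words:
        if upos in DROP_UPOS:
            continue  # punctuation or symbol according to the tagger
        if text.lower() in stopwords:
            continue  # case-insensitive stopword match
        if not any(ch.isalnum() for ch in text):
            continue  # e.g. "--" or "***" that the tagger labelled otherwise
        kept.append(text)
    return kept


if __name__ == "__main__":
    tagged = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
              ("the", "DET"), ("mat", "NOUN"), (".", "PUNCT"), ("#", "SYM")]
    print(clean_tokens(tagged))
```

With Stanza, the input pairs could be built from a processed document as `[(w.text, w.upos) for s in doc.sentences for w in s.words]`, assuming the pipeline includes the `pos` processor. Filtering after tagging, rather than before, keeps the tagger's context intact.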
-
I just wanted to ask whether there are any plans to upgrade from spaCy 1 to make use of features introduced in spaCy 2, e.g. the improvements to the lemmatiser and custom token attributes.
I mainly use spaCy fo…
-
Source: https://wiki.digitalclassicist.org/Stopwords_for_Greek_and_Latin
From #3
-
Hi Chris, thanks for the big 2.0 updates!
This is regarding the following section of the README:
> Note: ERRANT does not support spaCy 2 at this time. spaCy 2 POS tags are slightly different from…
-
Can't build the newest version:
Suffix fail! Trying to rstrip pi from Alarautalahti
Unstubbable! Trying to rstrip pi from Alarautalahti
Word has been misclassified or suffix stripping is insuff…
-
As far as I can see, SoMeWeTa has no built-in lemmatizer. It would be more convenient to have a third column in the output containing the lemma.
My output looks like this:
> Geschlagene AD…
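Until a lemma column exists upstream, one workaround is to post-process the tagger output. This minimal sketch assumes each output line is `token<TAB>tag` with blank lines between sentences, and uses a hypothetical lookup table `LEMMA_LEXICON` as a stand-in for a real German lemmatizer; unknown tokens fall back to their surface form.

```python
# Hypothetical (token, tag) -> lemma table; a real setup would call an actual lemmatizer here.
LEMMA_LEXICON = {
    ("Häuser", "NN"): "Haus",
    ("ging", "VVFIN"): "gehen",
}


def lemmatize(token, tag, lexicon=LEMMA_LEXICON):
    """Look up the lemma for (token, tag); fall back to the token itself if unknown."""
    return lexicon.get((token, tag), token)


def add_lemma_column(lines):
    """Turn 'token<TAB>tag' lines into 'token<TAB>tag<TAB>lemma'.
    Blank lines (assumed sentence boundaries) are passed through as empty strings."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append("")
            continue
        token, tag = line.split("\t")[:2]
        out.append(f"{token}\t{tag}\t{lemmatize(token, tag)}")
    return out


if __name__ == "__main__":
    sample = ["Häuser\tNN\n", "\n", "ging\tVVFIN\n"]
    for row in add_lemma_column(sample):
        print(row)
```

Falling back to the surface form keeps every row at three columns, so downstream tools can rely on a fixed format even for out-of-lexicon words.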
-
Hi all,
I am having problems with the Greek model, mainly with part-of-speech tags (the error rate on verbs is quite high) and lemmas. For example:
1. 'Έχεις αδέρφια;' ('Do you have siblings?') - Here the verb is …
-
I would like a pure console version that does not require a server.
Server mode may be good for efficiently processing large texts, but I would prefer a simple command-line tool, even at the cost of slo…
-
Is there a reason why the `extract-lemmatize-nonstop-words` package was removed entirely from npm and GitHub?
It was the best package I've found, and now it's gone.
-
Hi Bart,
It looks like CST have just released the resources for the lemmatizer under a GPL license, just as you said they might when we met in April.
In light of that, I decided to try and inst…