Closed: zebrassimo closed this issue 3 years ago
Hi Duygu, in the book Mastering spaCy (Chapter 2, page 46) we have the following code:
```python
import spacy
from spacy.symbols import ORTH, LEMMA

nlp = spacy.load('en')
special_case = [{ORTH: "Angeltown", LEMMA: "Los Angeles"}]
nlp.tokenizer.add_special_case(u'Angeltown', special_case)
doc = nlp(u'I am flying to Angeltown')
for token in doc:
    print(token.text, token.lemma_)
```
Trying to download the language model en with

```
python -m spacy download en
```

doesn't work:
__As of spaCy v3.0, shortcuts like 'en' are deprecated. Please use the full pipeline package name 'en_core_web_sm' instead.__
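For reference, a minimal sketch of the v3 way, using en_core_web_sm as an example pipeline name (the spacy.cli.download helper stands in for the shell command):

```python
import spacy
from spacy.cli import download

# v3 expects the full pipeline package name instead of the 'en' shortcut.
download("en_core_web_sm")          # same as: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
```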
Not sure if it's because of that deprecation, but

```python
special_case = [{ORTH: "Angeltown", LEMMA: "Los Angeles"}]
nlp.tokenizer.add_special_case(u'Angeltown', special_case)
```

fails with:

__Unable to set attribute 'LEMMA' in tokenizer exception for 'Angeltown'. Tokenizer exceptions are only allowed to specify ORTH and NORM.__
However, this gets the job done:
```python
import spacy
from spacy.symbols import ORTH, NORM

nlp = spacy.load('en_core_web_md')
special_case = [{ORTH: "Angeltown", NORM: "Los Angeles"}]
nlp.tokenizer.add_special_case(u'Angeltown', special_case)
doc = nlp(u'I am flying to Angeltown')
for token in doc:
    print(token.text, token.lemma_)
```
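If the goal is the actual lemma rather than the norm, a minimal sketch using v3's attribute_ruler component (assuming a trained pipeline such as en_core_web_md, which includes it) would be:

```python
import spacy

nlp = spacy.load('en_core_web_md')

# Tokenizer exceptions may only set ORTH and NORM in v3; the attribute
# ruler is the component that can still assign LEMMA to matched tokens.
ruler = nlp.get_pipe("attribute_ruler")
ruler.add(patterns=[[{"ORTH": "Angeltown"}]], attrs={"LEMMA": "Los Angeles"})

doc = nlp(u'I am flying to Angeltown')
for token in doc:
    print(token.text, token.lemma_)
```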
Thanks for the good book!

I wrote some chapters before v3. I made a second pass over the code after v3 came out, so some of the code might have escaped my attention. I also changed the language model shortcuts throughout the book; some of those might have escaped as well. Thanks for reporting!
Hi, this is not working as before; it changed to NORM, not LEMMA (the whole topic changed in v3). See:
https://stackoverflow.com/questions/66360602/spacy-tokenizer-lemma-and-orth-exceptions-not-working