PacktPublishing / Mastering-spaCy

Mastering spaCy, published by Packt
MIT License
Trichotillomania #5

Closed: zebrassimo closed this issue 2 years ago

zebrassimo commented 2 years ago

Hi Duygu, on page 116:

doc = nlp("I suffered from Trichotillomania when I was in college. The doctor prescribed me psychosomatic medicine.")
pattern = [{"LENGTH": {">=":10}}]
matcher.add("longWords", [pattern])
matches = matcher(doc)
for mid, start, end in matches:
    print(start, end, doc[start:end])

I'm getting

0 1 I
3 4 Trichotillomania
5 6 I
12 13 prescribed
14 15 psychosomatic

Having "prescribed" in there is fine (although page 116 says otherwise), but I can't explain why "I" is matched. I'm on spaCy version 3.1.3. Is this a bug or a feature?

Greets, rudisoft

DuyguA commented 2 years ago

Hellos back!

So, I quickly tried the code with a clean Matcher object created from scratch, and the output looked normal to me (screenshot attached).

If you're running the chapter code sequentially, some earlier patterns may still be registered in your Matcher. For instance, there's an onlyShort pattern on page 112 that matches tokens of length 1. Could that be the case?
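For anyone hitting the same thing, here is a minimal sketch of the effect described above. It makes a few assumptions: it uses a blank English pipeline (tokenizer only) instead of a trained model, and the `short_words` pattern is a hypothetical stand-in for whatever was left over from an earlier example, not necessarily the exact pattern from the book. The helper `matched_spans` is also just an illustrative name.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank pipeline is enough, since these patterns only
# look at token-level attributes (LENGTH, IS_ALPHA).
nlp = spacy.blank("en")
doc = nlp(
    "I suffered from Trichotillomania when I was in college. "
    "The doctor prescribed me psychosomatic medicine."
)

long_words = [{"LENGTH": {">=": 10}}]
# Hypothetical leftover pattern standing in for an earlier example.
short_words = [{"LENGTH": 1, "IS_ALPHA": True}]


def matched_spans(matcher, doc):
    """Return (start, end, text) for every match, ordered by position."""
    spans = [(start, end, doc[start:end].text) for _, start, end in matcher(doc)]
    return sorted(spans)


# A Matcher that still carries an earlier pattern reports both rule sets.
reused = Matcher(nlp.vocab)
reused.add("onlyShort", [short_words])
reused.add("longWords", [long_words])
stale_result = matched_spans(reused, doc)

# A Matcher created from scratch only knows the new pattern.
fresh = Matcher(nlp.vocab)
fresh.add("longWords", [long_words])
fresh_result = matched_spans(fresh, doc)

# Alternatively, drop the leftover rule from the existing Matcher.
reused.remove("onlyShort")
after_remove = matched_spans(reused, doc)

print(stale_result)
print(fresh_result)
print(after_remove == fresh_result)
```

Printing the match name with `nlp.vocab.strings[match_id]` inside the loop is another quick way to see which rule produced an unexpected match.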

zebrassimo commented 2 years ago

Thanks Duygu, I am running each chapter in its own Jupyter notebook, and you correctly identified the issue.

DuyguA commented 2 years ago

Great, happy reading :wave: