PacktPublishing / Mastering-spaCy

Mastering spaCy, published by Packt
MIT License

A real-world example NER example Chapter 03 #10

Open: mauuuuu5 opened this issue 1 year ago

mauuuuu5 commented 1 year ago

Hi everyone, I am copying the code from the book that processes the NY Times article, but I cannot reproduce the book's output. Also, this code is not included in the repo for Chapter 03.

from collections import Counter

import requests
import spacy
from bs4 import BeautifulSoup

def url_text(url_string):
    # Download the page and parse it with the html5lib parser
    res = requests.get(url_string)
    html = res.text
    soup = BeautifulSoup(html, 'html5lib')
    # Remove script, style and aside elements so mostly article text remains
    for script in soup(["script", "style", 'aside']):
        script.extract()
    text = soup.get_text()
    # Collapse all runs of whitespace into single spaces
    return " ".join(text.split())

ny_art = url_text("https://www.nytimes.com/2021/01/12/opinion/trump-america-allies.html")
nlp = spacy.load("en_core_web_md")
doc = nlp(ny_art)
print(len(doc.ents))

# Count how often each entity label occurs
labels = [ent.label_ for ent in doc.ents]
print(Counter(labels))
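The entity counts depend entirely on the text that url_text hands to spaCy, so if the downloaded HTML differs from the page used in the book, the output can differ too. To check the cleaning step offline, here is a minimal sketch that approximates url_text using the standard library's html.parser instead of BeautifulSoup with html5lib (a swapped-in technique, so results may not match bs4 exactly on messy HTML). The names VisibleTextParser and html_to_text are hypothetical, introduced only for this illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are dropped, mirroring the script/style/aside
# removal in url_text (assumption: same tag list as the book's code).
SKIP_TAGS = {"script", "style", "aside"}


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text not nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the kept text with whitespace runs collapsed to single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Concatenate like get_text() does, then normalize whitespace
    return " ".join("".join(parser.chunks).split())


sample = """
<html><head><style>p {color: red}</style></head>
<body>
  <p>Washington and Berlin</p>
  <script>var x = 1;</script>
  <aside>Related coverage</aside>
  <p>met   in Paris.</p>
</body></html>
"""
print(html_to_text(sample))  # prints "Washington and Berlin met in Paris."
```

Printing the first few hundred characters of the real ny_art string in the same way shows whether spaCy is receiving the article body or some other page content.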

Thank you

DuyguA commented 1 year ago

Hello, thanks for writing. Yes, I put only selected code in the repo, not all of the code from the book. Could you send a screenshot or just copy and paste the output? This is a small code fragment; it should work without much trouble.

mauuuuu5 commented 1 year ago

Hi, thank you for the reply. This is an image of the output:

[Screenshot of the output; image not available]

Cheers