Closed. AndyTheFactory closed this issue 10 months ago.
Comment by iwpnd Mon May 28 11:09:56 2018
from newspaper import fulltext
then use
fulltext(html, language)
with html being the raw page source as a string (markup included) and language the two-letter language code, e.g. 'en'.
Comment by akashmondal1810 Tue May 29 06:17:15 2018
Thanks, it worked.
Comment by chsuong Tue Jul 10 19:31:30 2018
It didn't work for me. Below is an example using the sample HTML from https://github.com/codelucas/newspaper/issues/291. Any help would be sincerely appreciated!
my_html='''<!DOCTYPE html>
<html>
<body>
<p>My first paragraph.</p>
</body>
</html>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_html, "lxml")
html_text=soup.get_text()
from newspaper import fulltext
text = fulltext(html_text,'en')
Traceback (most recent call last):
  File "<ipython-input-33-7d1b9f3a7dec>", line 2, in <module>
    text = fulltext(html_text,'en')
  File "/Users/chs/anaconda/lib/python3.5/site-packages/newspaper/api.py", line 91, in fulltext
    top_node = extractor.post_cleanup(top_node)
  File "/Users/chs/anaconda/lib/python3.5/site-packages/newspaper/extractors.py", line 1040, in post_cleanup
    node = self.add_siblings(top_node)
  File "/Users/chs/anaconda/lib/python3.5/site-packages/newspaper/extractors.py", line 869, in add_siblings
    baseline_score_siblings_para = self.get_siblings_score(top_node)
  File "/Users/chs/anaconda/lib/python3.5/site-packages/newspaper/extractors.py", line 926, in get_siblings_score
    nodes_to_check = self.parser.getElementsByTag(top_node, tag='p')
  File "/Users/chs/anaconda/lib/python3.5/site-packages/newspaper/parsers.py", line 123, in getElementsByTag
    elems = node.xpath(selector, namespaces=NS)
AttributeError: 'NoneType' object has no attribute 'xpath'
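One likely reading of this traceback, assuming newspaper3k 0.2.x: fulltext() asks the extractor for the best-scoring content node and passes the result straight to post_cleanup(). Here it received html_text, the output of soup.get_text() with the tags already stripped, and even the original my_html holds only a single three-word paragraph. Either way there is probably too little text for the extractor's scoring, so top_node is None and the .xpath call fails. Passing my_html itself rather than html_text is the first fix, although a one-line page may still yield no node. For pages that may simply have no article body, a guard like the hypothetical safe_fulltext below returns an empty string instead of crashing. Its extract parameter exists only so the fallback can be exercised without newspaper; it is not part of newspaper's API.

```python
from typing import Callable, Optional

# Signature of newspaper's fulltext(html, language) -> str.
Extractor = Callable[[str, str], str]


def safe_fulltext(html: str, language: str = "en",
                  extract: Optional[Extractor] = None) -> str:
    """Return the extracted article text, or "" if no article body is found.

    Hypothetical helper. `extract` defaults to newspaper.fulltext and is only
    a parameter so the fallback can be tested in isolation.
    """
    if extract is None:
        from newspaper import fulltext  # imported lazily
        extract = fulltext
    try:
        return extract(html, language)
    except AttributeError:
        # Assumed cause (newspaper3k 0.2.x): no candidate content node was
        # found, top_node is None, and post_cleanup() calls .xpath on it.
        return ""
```

Usage: text = safe_fulltext(my_html, 'en'), passing the raw HTML rather than soup.get_text(). Catching AttributeError is broad and could hide unrelated bugs, which is why the try block wraps only the single extraction call.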
Comment by lordrisborik Wed Jan 27 03:38:15 2021
chsuong, how did you eventually solve the issue above? I'm afraid I have to search/extract keywords from locally stored text/content.
The error does not occur in 0.9.2.
Issue by akashmondal1810 Fri May 25 11:58:41 2018 Originally opened as https://github.com/codelucas/newspaper/issues/571
I want to use the newspaper library, but instead of passing it an article's URL, I want to pass the article's page source (HTML). Is there any way I can do that?