fhamborg / news-please

news-please - an integrated web crawler and information extractor for news that just works
Apache License 2.0

maintext article attribute length limitation #257

Closed zurek11 closed 10 months ago

zurek11 commented 10 months ago

I am implementing a script that needs to get data from this website: here.

My code is as simple as this:

from newsplease import NewsPlease

article = NewsPlease.from_url(my_url)
print(article.maintext)

The problem is that maintext does not contain the full text of the page's main content. Someone might tell me that news-please didn't recognize the rest of the content on the page, but it looks to me more like the text has a size limit, so I just want to ask:

  1. Is this issue really about a size limit on maintext?
  2. If so, is it possible to raise that limit?

Thank you so much in advance for your response.

fhamborg commented 10 months ago

If you're facing this issue only on that particular website, then it's likely due to that particular website. Feel free to reopen in case you find this happening on other websites as well.
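
One way to check whether the missing text is a per-site extraction miss rather than a cut-off is to list which visible chunks of the page never made it into maintext. The following is a minimal sketch using only the Python standard library; the helper name missing_paragraphs and the min_len threshold are hypothetical and not part of news-please, and the page HTML is assumed to have been downloaded separately.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text chunks that sit outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # nesting depth of script/style elements
        self.parts = []     # visible text chunks, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def missing_paragraphs(maintext, html, min_len=40):
    """Return visible chunks of at least min_len characters absent from maintext.

    Hypothetical helper: whitespace is normalized on both sides before the
    substring check, so line wrapping differences do not count as misses.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    extracted = " ".join((maintext or "").split())
    return [
        chunk for chunk in parser.parts
        if len(chunk) >= min_len and " ".join(chunk.split()) not in extracted
    ]
```

Called as missing_paragraphs(article.maintext, page_html), it returns the page chunks that the extractor skipped. If the skipped chunks sit in differently structured markup on this one site, that would be consistent with a site-specific miss; the check by itself does not establish whether any length limit exists.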