codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

It can't work with BBC #939

Closed: Qggg closed this issue 1 year ago

Qggg commented 2 years ago

I tried it with a BBC article, for example: https://www.bbc.com/news/world-europe-61204543

It can't fetch the authors, and it can't fetch the full text.

Something went wrong.

johnbumgarner commented 1 year ago

Take a look at this document that I wrote on using newspaper3k. It has details on extracting content from BBC. Let me know if my extraction guidelines work for you.

Qggg commented 1 year ago

Yes, it works. You pick the elements out of a script segment. In the same way, you can also pick the 'article content text' out of the 'script nonce' segment. It seems there is no easy, clear way to work with BBC.

Thank you.

This issue can be closed.
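
For anyone landing here later, the idea discussed above (reading fields out of a script segment instead of relying on the default parser) can be sketched roughly as follows. This is only an illustration under assumptions: the sample HTML is made up, the `application/ld+json` script type and the `headline` / `author` keys follow the common schema.org layout rather than anything confirmed in this thread about BBC's markup, and `JsonLdCollector` and `extract_metadata` are hypothetical helper names. With newspaper3k you would presumably pass `article.html` (available after `article.download()`) in place of `SAMPLE_HTML`.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_metadata(html):
    """Return headline and author names from the first parseable JSON-LD object."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        if not isinstance(data, dict):
            continue
        authors = data.get("author", [])
        if isinstance(authors, dict):
            authors = [authors]  # a single author may be given as one object
        names = [a["name"] for a in authors if isinstance(a, dict) and a.get("name")]
        return {"headline": data.get("headline"), "authors": names}
    return {"headline": None, "authors": []}


# Made-up page for illustration only; real pages will differ.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example headline",
 "author": [{"@type": "Person", "name": "Example Author"}]}
</script>
</head><body><p>Body text</p></body></html>
"""

result = extract_metadata(SAMPLE_HTML)
print(result)
```

The sketch uses only the standard library so it runs without network access. The same pattern should apply to other script segments that embed JSON, but the keys to read would have to be checked against the actual page source.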