codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

Can I get the CSS selectors? #977

Closed: izJoey closed this issue 8 months ago

izJoey commented 8 months ago

Hi, is there a way to get the CSS selectors for the elements found in an article, such as the title, text, and date?

I tried hard but had no success.

AndyTheFactory commented 8 months ago

Hi, for the article text you can use article.top_node to see where it got its text from.
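To turn that node into something close to a CSS selector, one option is to walk from the node up to the document root and record each step. The sketch below assumes top_node is an lxml element (newspaper3k parses pages with lxml); css_path is a hypothetical helper, not part of newspaper's API, and the toy document only stands in for a parsed page.

```python
import xml.etree.ElementTree as ET


def css_path(root, node):
    """Return a CSS selector that walks from root down to node.

    Hypothetical helper (not part of newspaper). Works on any
    ElementTree-style element where iterating yields the children,
    which holds for both xml.etree and lxml elements.
    """
    # Map each child element to its parent. lxml has getparent(), but
    # xml.etree does not, so build the mapping once from the root.
    parents = {child: parent for parent in root.iter() for child in parent}
    parts = []
    current = node
    while current is not None:
        parent = parents.get(current)
        step = current.tag
        if parent is not None:
            # Only disambiguate when the parent has several children with this tag.
            same_tag = [sib for sib in parent if sib.tag == current.tag]
            if len(same_tag) > 1:
                # :nth-of-type() is 1-based; compare by identity, not equality.
                position = next(i for i, sib in enumerate(same_tag, 1) if sib is current)
                step = f"{step}:nth-of-type({position})"
        parts.append(step)
        current = parent
    return " > ".join(reversed(parts))


# Toy document standing in for the tree newspaper parses.
doc = ET.fromstring(
    "<html><body><div><p>nav</p></div>"
    "<div><article><p>first</p><p>second</p></article></div></body></html>"
)
main_node = doc.find("body").findall("div")[1].find("article")
print(css_path(doc, main_node))  # html > body > div:nth-of-type(2) > article
```

With newspaper3k you would presumably call css_path(article.top_node.getroottree().getroot(), article.top_node) after download() and parse(), assuming top_node is still attached to the parsed lxml tree. Because newspaper cleans the document before choosing top_node, the selector describes the cleaned tree and may need adjusting before it matches the raw page.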

Unfortunately, for the title and date, the way they were detected is not passed on to the user.

By the way, I have forked the project (since this one is no longer maintained), and you can use it as a less buggy version of newspaper3k: https://github.com/AndyTheFactory/newspaper4k (or pip install newspaper4k).

izJoey commented 8 months ago

Thanks for the help!

Oh, you did good work there.