codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
13.89k stars 2.1k forks

fix itemprop containing articleBody #953

Open AndyTheFactory opened 1 year ago

AndyTheFactory commented 1 year ago

Previously, a node was "cleaned" unless its itemprop attribute was exactly equal to "articleBody".

For instance, a node with itemprop="description articleBody" would be cleaned. Blogspot / Blogger, for example, uses this itemprop value.
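
To illustrate the difference between an exact comparison and a token-based check, here is a minimal sketch. It is not the code from this PR: the helper name `has_itemprop_token` is hypothetical, and it only assumes that itemprop holds a space-separated list of tokens, as HTML microdata allows.

```python
def has_itemprop_token(itemprop_value, token="articleBody"):
    """Return True if `token` is one of the whitespace-separated itemprop tokens.

    Hypothetical helper for illustration; not part of newspaper's API.
    """
    if not itemprop_value:
        # Attribute missing or empty: nothing to match.
        return False
    # split() with no arguments splits on any run of whitespace.
    return token in itemprop_value.split()


# The exact comparison misses the Blogger-style value.
exact_match = "description articleBody" == "articleBody"
# The token-based check accepts it.
token_match = has_itemprop_token("description articleBody")
```

A plain substring test would also accept the Blogger value, but it would additionally match unrelated tokens that merely contain the text "articleBody", which is why the sketch splits on whitespace first.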