ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document
Apache License 2.0

add itemprop=name to extractor; also fix techcrunch1 test #104

Open collinwu opened 5 years ago

collinwu commented 5 years ago

The New York Times uses the schema.org itemprop="name" microdata attribute to mark up its publisher value.

Example: view-source:https://www.nytimes.com/2019/08/12/world/asia/hong-kong-airport-protest-cancellations.html

I just wanted to enhance this library to support that.
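For illustration only, here is a minimal, dependency-free sketch of the idea: pull the publisher name from a meta tag that carries itemprop="name". The function name `extractItempropName` and the sample markup are hypothetical and are not taken from this PR's diff or from the NYT page. The sketch uses regular expressions only to stay self-contained, which is not how the library's extractor is written.

```javascript
// Hypothetical helper for illustration; NOT part of node-unfluff's API.
// Returns the content of the first <meta> tag whose itemprop is "name",
// or null when no such tag exists.
function extractItempropName(html) {
  // Collect every <meta ...> tag in the document.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];

  // Build a matcher for a quoted attribute value (single or double quotes).
  const attr = (name) =>
    new RegExp('\\b' + name + '\\s*=\\s*(?:"([^"]*)"|\'([^\']*)\')', 'i');

  for (const tag of metaTags) {
    const itemprop = attr('itemprop').exec(tag);
    if (!itemprop) continue;

    const itempropValue = itemprop[1] !== undefined ? itemprop[1] : itemprop[2];
    if (itempropValue.toLowerCase() !== 'name') continue;

    const content = attr('content').exec(tag);
    if (!content) continue;

    const contentValue = content[1] !== undefined ? content[1] : content[2];
    return contentValue.trim();
  }
  return null;
}

// Made-up sample markup (an assumption, not copied from any real page).
const sampleHtml = `
<div itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
  <meta itemprop="name" content="The New York Times">
</div>`;

console.log(extractItempropName(sampleHtml));
```

A real extractor would more likely use a DOM-style selector than regular expressions, and would probably need to scope the lookup to the publisher item, since itemprop="name" can also appear on authors and other entities.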

Also fixed the test failure that make test reported for the techcrunch1 links.