add itemprop=name to extractor; also fix techcrunch1 test

ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document

Apache License 2.0


Status: Open. Opened by collinwu 5 years ago.

collinwu commented 5 years ago

The New York Times uses the schema.org itemprop="name" microdata attribute to denote its publisher value.

I wanted to enhance this library to support that.
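As an illustration of how an extractor might treat itemprop="name" as one more publisher source, here is a minimal sketch. It assumes the page's meta tags have already been collected into plain attribute objects (for example by a DOM or cheerio pass that is not shown). The function name extractPublisher, the other candidate attributes, and their priority order are hypothetical and are not taken from node-unfluff's actual code.

```javascript
// Hypothetical sketch: pick a publisher name from pre-collected meta tag
// attributes. Each tag is a plain object such as
// { itemprop: 'name', content: 'The New York Times' }.
function extractPublisher(metaTags) {
  // Candidates are tried in priority order (an assumed ordering).
  // itemprop="name" goes last because schema.org "name" is a generic
  // property that can also label articles, people, and so on.
  const candidates = [
    (m) => m.property === 'og:site_name',
    (m) => m.itemprop === 'publisher',
    (m) => m.itemprop === 'name',
  ];

  for (const matches of candidates) {
    // Skip tags whose content is missing or only whitespace.
    const tag = metaTags.find(
      (m) => matches(m) && typeof m.content === 'string' && m.content.trim() !== ''
    );
    if (tag) return tag.content.trim();
  }
  return null; // no usable publisher metadata found
}

// Usage: a page that only exposes itemprop="name".
console.log(extractPublisher([{ itemprop: 'name', content: 'The New York Times' }]));
```

Placing the generic itemprop="name" check last lets more specific signals such as og:site_name win when a page provides both.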

I also fixed the techcrunch1 test failure when running make test.