jglim closed this 6 years ago.
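Hey, I actually think the byline thing wouldn't be an issue if you changed the selector to div.article-entry > p. That automatically excludes the byline div, since the > (child) combinator only matches <p> elements that are direct children of div.article-entry.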
That said, I'm super busy this week, so I won't have time to merge your code. Maybe next week.
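For reference, here is a minimal sketch of what div.article-entry > p matches, using only Python's standard-library html.parser. The markup is a hypothetical stand-in for an article with a byline (not TechCrunch's real HTML), and ChildParagraphExtractor is an illustrative helper, not code from this project. It shows that neither the byline div nor a <p> nested inside it counts as a direct <p> child of div.article-entry, so both are skipped.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ChildParagraphExtractor(HTMLParser):
    """Hypothetical helper: collects the text of <p> elements whose parent is
    <div class="article-entry">, i.e. what `div.article-entry > p` selects."""

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements as (tag, set of classes)
        self.capture_depth = None  # stack index of the <p> being captured
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        parent = self.stack[-1] if self.stack else None
        # Child combinator: the *immediate* parent must be div.article-entry.
        if (tag == "p" and self.capture_depth is None and parent is not None
                and parent[0] == "div" and "article-entry" in parent[1]):
            self.capture_depth = len(self.stack)
            self.current = []
        self.stack.append((tag, classes))

    def handle_endtag(self, tag):
        # Ignore void tags and stray closing tags that were never opened.
        if tag in VOID_TAGS or not any(t == tag for t, _ in self.stack):
            return
        # Pop back to the matching open tag, tolerating unclosed children.
        while self.stack:
            open_tag, _ = self.stack.pop()
            if self.capture_depth is not None and len(self.stack) == self.capture_depth:
                # Collapse whitespace so inline tags don't leave odd spacing.
                self.paragraphs.append(" ".join("".join(self.current).split()))
                self.capture_depth = None
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.current.append(data)


# Hypothetical markup, loosely modelled on an article that has a byline.
html = """
<div class="article-entry">
  <div class="byline"><p>By Jane Doe</p></div>
  <p>First paragraph.</p>
  <p>Second <a href="#">paragraph</a>.</p>
</div>
"""

parser = ChildParagraphExtractor()
parser.feed(html)
parser.close()
print(parser.paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

A real scraper would more likely pass the selector string straight to a CSS-selector engine (for example BeautifulSoup's select()); the hand-rolled parser only makes the child-combinator rule explicit.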
Hey, starting from v3 I'm no longer planning on using content selectors to manually extract information. So I'm closing your pull request for this one. Appreciate it though! And if you have any further suggestions or future pull requests I'll be happy to consider them! Cheers.
Works okay for articles without bylines (https://techcrunch.com/2017/10/19/two-google-alums-just-raised-60m-to-rethink-documents/)
A little bit uglier with a byline (https://techcrunch.com/2017/10/22/defensible-strategies-for-food-tech-entrepreneurs-facing-the-amazon-juggernaut/)
It would be ideal to exclude div.byline in the body selector, but the current implementation doesn't seem to support that yet.
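The exclusion asked for here amounts to "take everything under the body element, minus any subtree carrying an excluded class". In a CSS engine that supports Selectors Level 3, a selector such as div.article-entry > :not(.byline) expresses roughly the same thing for direct children. The sketch below does it by hand with Python's html.parser instead; BodyTextExtractor, body_class and exclude_classes are hypothetical names, and the markup is an illustrative stand-in rather than TechCrunch's actual HTML.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BodyTextExtractor(HTMLParser):
    """Hypothetical sketch of a body selector with an exclusion list: keeps
    text inside the element with `body_class`, but drops any subtree whose
    root carries one of `exclude_classes` (e.g. "byline")."""

    def __init__(self, body_class="article-entry", exclude_classes=("byline",)):
        super().__init__()
        self.body_class = body_class
        self.exclude_classes = set(exclude_classes)
        self.stack = []   # open elements as (tag, inside_body, excluded)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        in_body, excluded = self.stack[-1][1:] if self.stack else (False, False)
        in_body = in_body or self.body_class in classes
        # Once an element is excluded, everything nested inside it stays excluded.
        excluded = excluded or (in_body and bool(classes & self.exclude_classes))
        self.stack.append((tag, in_body, excluded))

    def handle_endtag(self, tag):
        # Ignore void tags and stray closing tags that were never opened.
        if tag in VOID_TAGS or not any(t == tag for t, _, _ in self.stack):
            return
        # Pop back to the matching open tag, tolerating unclosed children.
        while self.stack and self.stack.pop()[0] != tag:
            pass

    def handle_data(self, data):
        # Keep text only when inside the body and outside any excluded subtree.
        if self.stack and self.stack[-1][1] and not self.stack[-1][2]:
            self.chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left over from the markup.
        return " ".join("".join(self.chunks).split())


# Hypothetical markup, loosely modelled on an article that has a byline.
html = """
<div class="article-entry">
  <div class="byline"><p>By Jane Doe</p></div>
  <p>First paragraph.</p>
  <p>Second <a href="#">paragraph</a>.</p>
</div>
"""

parser = BodyTextExtractor()
parser.feed(html)
parser.close()
print(parser.text())  # First paragraph. Second paragraph.
```

Tracking "excluded" per open element means a nested <p> inside the byline is dropped too, which a naive "skip elements with class byline" check on each element alone would miss.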