extractus / article-extractor

To extract main article from given URL with Node.js
https://extractor-demos.pages.dev/article-extractor
MIT License
1.56k stars 134 forks

How can I set a rule for extracting the picture when the default extraction algorithm can't get it? #362

Closed MJRT closed 1 year ago

MJRT commented 1 year ago

For example: https://www.ithome.com/0/714/506.htm

ndaidong commented 1 year ago

@MJRT By default, this lib tries to get the main image from the page's meta tags. If none is found, the image field is left blank. The transformation rules are applied only to the article content, not to the main image, so you will need to handle that case with your own logic.
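One way to do this "own logic" is to post-process the extraction result yourself. The sketch below is only an illustration: `firstImageSrc`, `resolveUrl` and `withFallbackImage` are hypothetical helpers, not part of the library, and it assumes the result object has `url`, `image` and `content` fields, and that `content` is an HTML string that still contains `<img>` tags. It uses a simple regex rather than a real HTML parser, so unusual markup may not match.

```javascript
// Hypothetical helpers (not part of article-extractor).

// Return the src of the first <img> tag in an HTML string, or '' if none.
function firstImageSrc(html) {
  const re = /<img\b[^>]*?\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i
  const match = re.exec(html || '')
  if (!match) return ''
  return match[1] || match[2] || match[3] || ''
}

// Resolve a possibly relative src against the article URL.
// If it cannot be resolved, return the src unchanged.
function resolveUrl(src, base) {
  try {
    return new URL(src, base).href
  } catch {
    return src
  }
}

// If the extractor left `image` blank, fall back to the first image
// found in the article content. The input object is not mutated.
function withFallbackImage(article) {
  if (!article || article.image) return article
  const src = firstImageSrc(article.content)
  return { ...article, image: src ? resolveUrl(src, article.url) : '' }
}

// Usage sketch (requires network access, so it is left commented out):
//
//   import { extract } from '@extractus/article-extractor'
//   const article = await extract('https://www.ithome.com/0/714/506.htm')
//   console.log(withFallbackImage(article).image)
```

If the image you want is not inside the article content at all, the same pattern applies: fetch the page HTML yourself, pick the image with whatever selector fits that site, and set `image` on the result.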