extractus / article-extractor

To extract main article from given URL with Node.js
https://extractor-demos.pages.dev/article-extractor
MIT License
1.61k stars 140 forks

Possibility to extract category or classification from a URL? #162

Closed coturiv closed 3 years ago

coturiv commented 3 years ago

Hi @ndaidong, thanks for this awesome package. Do you have any plans to extract a category or classification from an article, such as politics, sports, or technology?

Here is an example where I can see these values in the page metadata: https://www.foxnews.com/politics/biden-in-easter-message-calls-getting-vaccine-a-moral-obligation

image

Thank you.

ndaidong commented 3 years ago

@coturiv I'd like to be able to get more useful information from the target URLs. However, most properties like the ones you shared aren't standardized yet, and very few websites provide these values.
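For pages that do expose such values, a minimal sketch of reading them directly from the HTML might look like the following. The helper names (`extractCategoryHints`, `readAttr`), the list of keys, and the sample HTML are all assumptions made up for illustration; they are not part of article-extractor, and real sites use many other property names.

```javascript
// Hypothetical helper, not part of article-extractor.
// The keys below are assumptions: `article:section` comes from the Open Graph
// article namespace, the other two are common but non-standard names.
const CATEGORY_KEYS = ['article:section', 'classification', 'category'];

// Read one quoted attribute value from a single tag string, or null if absent.
function readAttr(tag, attr) {
  const m = tag.match(new RegExp(`\\s${attr}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, 'i'));
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

// Collect category-like <meta> values from raw HTML.
// Regex matching is a simplification; a real HTML parser is more robust.
function extractCategoryHints(html) {
  const hints = {};
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const key = readAttr(tag, 'property') || readAttr(tag, 'name');
    const content = readAttr(tag, 'content');
    if (key && content && CATEGORY_KEYS.includes(key.toLowerCase())) {
      hints[key.toLowerCase()] = content;
    }
  }
  return hints;
}

// Invented sample markup, only to show the shape of the result.
const sampleHtml = `
<head>
  <meta property="article:section" content="Politics">
  <meta name="classification" content="news/politics">
  <meta name="description" content="Example page">
</head>`;

console.log(extractCategoryHints(sampleHtml));
```

Because so few sites provide these tags, the returned object will often be empty, so callers should treat every key as optional.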

If you really need to classify the articles, I recommend a better approach: make use of a little AI. Two libraries I like are natural and brain.js. Both are lightweight, simple, and easy to add to your program.
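To illustrate the idea of classifying article text, here is a dependency-free sketch of a tiny naive Bayes classifier with add-one smoothing. It does not use the API of natural or brain.js; the class name `TinyTextClassifier`, its methods, and the training sentences are assumptions invented for this example, and a real setup would need far more training data.

```javascript
// Hypothetical stand-in for a library classifier: multinomial naive Bayes.
class TinyTextClassifier {
  constructor() {
    this.docCounts = new Map();  // label -> number of training documents
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total number of words seen
    this.vocabulary = new Set(); // every distinct word seen in training
    this.totalDocs = 0;
  }

  // Lowercase the text and keep runs of ASCII letters only.
  static tokenize(text) {
    return text.toLowerCase().match(/[a-z]+/g) || [];
  }

  addDocument(text, label) {
    if (!this.wordCounts.has(label)) {
      this.docCounts.set(label, 0);
      this.wordCounts.set(label, new Map());
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;
    const counts = this.wordCounts.get(label);
    for (const word of TinyTextClassifier.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
  }

  // Return the label with the highest log-probability, or null if untrained.
  classify(text) {
    const words = TinyTextClassifier.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const [label, docs] of this.docCounts) {
      const counts = this.wordCounts.get(label);
      const denom = this.totalWords.get(label) + this.vocabulary.size;
      let score = Math.log(docs / this.totalDocs);
      for (const word of words) {
        score += Math.log(((counts.get(word) || 0) + 1) / denom);
      }
      if (score > bestScore) {
        best = label;
        bestScore = score;
      }
    }
    return best;
  }
}

// Invented training sentences, two per category.
const classifier = new TinyTextClassifier();
classifier.addDocument('The president signed the bill after the senate vote', 'politics');
classifier.addDocument('Voters head to the polls in the election', 'politics');
classifier.addDocument('The team won the match with a late goal', 'sports');
classifier.addDocument('The striker scored twice in the final game', 'sports');
classifier.addDocument('The new phone ships with a faster chip', 'technology');
classifier.addDocument('Developers released a software update for the app', 'technology');

console.log(classifier.classify('The senate will vote on the election bill')); // politics
console.log(classifier.classify('The striker scored a late goal in the game')); // sports
```

In practice the text passed to `classify` could be the article content returned by the extractor, with labels chosen to match the categories you care about.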