fake-news-detector / api

API for saving news flagging by the users
https://fake-news-detector-api.herokuapp.com/

Scrape pages to get news body #2

Closed rogeriochaves closed 6 years ago

rogeriochaves commented 6 years ago

Currently, when a news item is inserted, we only save its title and URL to the Links table, and we only send the title for predictions.

For clickbait, the title is what matters most, but for fake news a title alone is not enough; we could benefit a lot from having the whole body of the news.

This feature should be developed with caution: extract only the body text, not the HTML, use short timeouts when reading the URL, and so on, because scraping can put a lot of load on the server and our database.
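As a rough illustration of the "body text only" part, here is a minimal sketch using just the Rust standard library. The function name `extract_body_text` and the `max_chars` cap are assumptions made for this example, not part of the existing API. It is deliberately naive: it drops tags, skips `<script>` and `<style>` contents, collapses whitespace and truncates the result, but it does not decode HTML entities or handle malformed markup the way a real HTML parser crate would. Fetching the page with a short timeout is left out, since that depends on whichever HTTP client is chosen.

```rust
/// Naive HTML-to-text extraction (hypothetical helper, not the project's code).
/// Drops tags, skips <script>/<style> contents, collapses whitespace,
/// and caps the output at `max_chars` characters to limit database growth.
pub fn extract_body_text(html: &str, max_chars: usize) -> String {
    // ASCII lowercasing keeps byte offsets identical, so indices found in
    // `lower` are also valid in `html`.
    let lower = html.to_ascii_lowercase();
    let mut text = String::new();
    let mut pos = 0;

    while pos < html.len() {
        let lt = match lower[pos..].find('<') {
            Some(offset) => pos + offset,
            None => {
                // No more tags: the remainder is plain text.
                text.push_str(&html[pos..]);
                break;
            }
        };
        text.push_str(&html[pos..lt]);
        text.push(' '); // treat every tag as a word boundary

        // For script/style elements, jump straight to the closing tag so
        // their contents (which may contain '<') are never treated as text.
        let mut tag_start = lt;
        for name in ["script", "style"] {
            if lower[lt..].starts_with(&format!("<{name}")) {
                tag_start = match lower[lt..].find(&format!("</{name}")) {
                    Some(i) => lt + i,
                    None => html.len(), // unclosed element: drop the rest
                };
            }
        }

        // Continue after the '>' that ends the tag found at `tag_start`.
        pos = match lower[tag_start..].find('>') {
            Some(i) => tag_start + i + 1,
            None => html.len(),
        };
    }

    // Collapse runs of whitespace, then enforce the size cap.
    let collapsed = text.split_whitespace().collect::<Vec<_>>().join(" ");
    collapsed.chars().take(max_chars).collect()
}
```

Calling `extract_body_text(page, 10_000)` on a fetched page would then return at most 10,000 characters of visible text; the cap value is arbitrary and would need tuning against real articles.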

Also, we should check whether the link is already saved in the database so we don't need to re-scrape it.
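The "scrape only if not already saved" check could look roughly like the sketch below. `LinkStore` is a hypothetical in-memory stand-in for the Links table, and `body_for` is an invented name; in the real API this would be a database lookup by URL before any HTTP request is made. The scraper is passed in as a closure so the example stays independent of any HTTP client.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the Links table, keyed by URL.
pub struct LinkStore {
    bodies: HashMap<String, String>,
}

impl LinkStore {
    pub fn new() -> Self {
        LinkStore { bodies: HashMap::new() }
    }

    /// Returns the stored body for `url`. The `scrape` closure is called
    /// only when the URL has not been saved yet, so each link is fetched
    /// at most once.
    pub fn body_for<F>(&mut self, url: &str, scrape: F) -> &str
    where
        F: FnOnce(&str) -> String,
    {
        if !self.bodies.contains_key(url) {
            let body = scrape(url);
            self.bodies.insert(url.to_string(), body);
        }
        &self.bodies[url]
    }
}
```

Note that this compares URLs as exact strings; two URLs that differ only in tracking parameters or a trailing slash would still be scraped separately unless they are normalized first.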

Maybe later we can think of other performance improvements for this.

I don't know of a way to scrape pages in Rust; I need help.

rogeriochaves commented 6 years ago

This was solved by #14, although it would be good to revisit this in the future to check whether the scraping is working fine for most links.