Scrape pages to get title

fake-news-detector / api

API for saving news flagging by the users

https://fake-news-detector-api.herokuapp.com/


Scrape pages to get title #9

Closed MrDOS closed 6 years ago

MrDOS commented 6 years ago

Right now, a page title must be passed to the /votes endpoint along with the URL or else there's no data to pass through to Robinho. When the page title isn't passed, we should scrape it from the given URL.
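A minimal sketch of that fallback, written in Python purely for illustration and not taken from this API's code. The names `extract_title`, `fetch_html`, and `resolve_title` are hypothetical, and the sketch assumes the page exposes its headline in a plain `<title>` element:

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; return None when nothing was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html):
    """Return the page title from an HTML string, or None if absent."""
    parser = _TitleParser()
    parser.feed(html)
    return parser.title


def fetch_html(url, timeout=10.0, max_bytes=512_000):
    """Download at most max_bytes of the page and decode it to text."""
    request = Request(url, headers={"User-Agent": "title-scraper-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read(max_bytes).decode(charset, errors="replace")


def resolve_title(url, title=None, fetch=fetch_html):
    """Prefer the title supplied by the client; scrape only as a fallback."""
    if title and title.strip():
        return title.strip()
    try:
        return extract_title(fetch(url))
    except (OSError, ValueError, LookupError):
        # Network failure, malformed URL, or unknown charset: no title.
        return None
```

The `fetch` parameter is injectable so the fallback can be exercised without network access, for example `resolve_title(url, None, fetch=lambda u: "<title>Example</title>")`.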

This sort of scraping is also useful for #2.

rogeriochaves commented 6 years ago

Thanks for opening this issue, @MrDOS!

I agree with that. We could even add a job that looks for incomplete data in the database and scrapes the URL to fill it in.
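Such a backfill job might look roughly like the sketch below. It is only an illustration: it uses SQLite and a hypothetical `news(id, url, title)` table, which are assumptions and not this project's actual database or schema, and `scrape` stands for any function that maps a URL to a title or `None`.

```python
import sqlite3


def backfill_titles(conn, scrape):
    """Fill in missing titles; return how many rows were updated.

    conn   -- an open sqlite3 connection with a hypothetical `news` table
    scrape -- callable taking a URL and returning a title string or None
    """
    rows = conn.execute(
        "SELECT id, url FROM news WHERE title IS NULL OR TRIM(title) = ''"
    ).fetchall()

    filled = 0
    for news_id, url in rows:
        title = scrape(url)
        if title:
            conn.execute(
                "UPDATE news SET title = ? WHERE id = ?", (title, news_id)
            )
            filled += 1
    conn.commit()
    return filled
```

Rows whose pages cannot be scraped are left untouched, so the job can simply be run again later.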

I'd just like to stress that scraping is complementary, not the main way to get the news title: it is easier to get the right title from a Facebook or Twitter feed, thanks to their consistent HTML, than from random websites.

rogeriochaves commented 6 years ago

Users can now also flag news from the website, where we don't get the news title, so this issue has become more important.

rogeriochaves commented 6 years ago

This issue was moved to fake-news-detector/fake-news-detector#16