monk1337 opened 7 months ago
I can do it, but I will need a list of websites to fetch the data from. For example, if there's a blogging site, then every time we run our scraper, new blog posts will be added to the unstructured data.
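One way the "only new blogs get added" part could work: remember which post URLs earlier runs already collected, and keep only unseen links from a listing page. This is a minimal sketch using only the Python standard library. `LinkCollector` and `find_new_posts` are hypothetical names, and fetching the page and saving `seen` between runs are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags without an href value
                self.links.append(urljoin(self.base_url, href))


def find_new_posts(page_html, base_url, seen):
    """Return links not collected in earlier runs, in page order, and record them in `seen`."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    collector.close()
    new = []
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            new.append(link)
    return new
```

For example, if `seen` already holds `https://example.com/blog/1`, a page linking `/blog/1` and `/blog/2` yields only `https://example.com/blog/2` (example.com is a placeholder).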
How about we build this scraper in parts? For example, someone takes the tourism part, someone else takes the hospitals part, and later on we combine them into a fully automated raw data scraper.
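A rough sketch of how the per-topic parts could be combined, assuming Python: each contributor registers one function for their topic, and a single runner calls them all. The `register` decorator, the `SCRAPERS` dict and the placeholder scrapers are hypothetical, not existing code in this repo.

```python
from typing import Callable, Dict, List

# Hypothetical registry: each topic owner adds one scraper function here.
SCRAPERS: Dict[str, Callable[[], List[str]]] = {}


def register(topic):
    """Decorator that adds a topic scraper to the shared registry."""
    def wrap(func):
        SCRAPERS[topic] = func
        return func
    return wrap


@register("tourism")
def scrape_tourism():
    # Placeholder: a real version would fetch and parse tourism sites.
    return ["Sample tourism text"]


@register("hospitals")
def scrape_hospitals():
    # Placeholder: a real version would fetch and parse hospital sites.
    return ["Sample hospital text"]


def run_all():
    """Run every registered scraper; one failing topic does not stop the others."""
    results = {}
    for topic, scraper in SCRAPERS.items():
        try:
            results[topic] = scraper()
        except Exception as exc:
            print(f"{topic} scraper failed: {exc}")
            results[topic] = []
    return results
```

Isolating failures per topic means a broken hospitals scraper doesn't stop the tourism data from being collected.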
That would be nice, but we will still need a list of sites (ones that regularly publish new content on a specific topic) to target for the latest data.
Or we could add another folder called scraped inside the Unstrcured_data folder, and our program could save any scraped data related to Lucknow there (in separate files named by date or something similar).
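A minimal sketch of the date-named file idea, using Python's standard library. The `save_scraped` helper and the `<topic>_<YYYY-MM-DD>.txt` naming scheme are hypothetical, and the default root folder name is copied from the comment, so it may need adjusting to the repo's actual path.

```python
from datetime import date
from pathlib import Path


def save_scraped(texts, topic, root="Unstrcured_data", day=None):
    """Write scraped texts for a topic to <root>/scraped/<topic>_<YYYY-MM-DD>.txt."""
    day = day or date.today()
    out_dir = Path(root) / "scraped"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{topic}_{day.isoformat()}.txt"
    # Append so that several runs on the same day accumulate instead of overwriting.
    with out_file.open("a", encoding="utf-8") as fh:
        for text in texts:
            fh.write(text.strip() + "\n\n")
    return out_file
```

Appending rather than overwriting lets several runs on the same day land in one file, while each new day starts a fresh file.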
@pratyakshSoni1 @AayushSharma-1 That's a great idea: each person takes care of one topic and we build the scraper step by step. @AayushSharma-1 you can go through the old PRs of this repo. People contributing unstructured data also mention their source websites/links in the PR description, so we can use those websites as scraping targets.
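To turn those PR descriptions into a site list, the links could be pulled out with a regular expression. A minimal sketch, assuming the descriptions are available as plain strings (fetching them from GitHub is not shown); `extract_source_urls` is a hypothetical helper.

```python
import re

# Matches http(s) links up to whitespace, angle brackets, parentheses or quotes,
# so markdown links like [site](https://...) are captured without the ")".
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")


def extract_source_urls(pr_description):
    """Pull unique http(s) links out of a PR description, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(pr_description):
        url = match.rstrip(".,;:!?")  # drop punctuation that ends a sentence
        if url not in urls:
            urls.append(url)
    return urls
```

Running this over each old PR description and merging the results would give a starting list of sites per topic.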
Yes, sure!
Right now the unstructured data folder contains limited data. We need scrapers that collect data from different Lucknow websites, so that if we want to add more data in the future or update the Lucknow database, we can simply run those scraper agents.