Open answerquest opened 6 years ago
The website gives you an option to export the data as PDFs. Do we want to export it in that format, or use the data on the webpage to create a CSV or JSON file?
@Dhanesh95 we want to get the numbers out in a way that they can be combined across time. I'd pitch for scraping into JSON as the data will likely get hierarchical when we combine it across different dates.
Note: the data seems to be available for Wednesdays only, and data for some dates may be missing, so the scraper will need to handle those gaps.
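A rough sketch of how the Wednesday-only schedule and the possible gaps could be handled, in Python. Everything named here is an assumption for illustration: `wednesdays`, `collect`, and the `fetch` callable are hypothetical, and `fetch` stands in for whatever function actually downloads and parses one archive date (returning `None` when the site has nothing for that date). It is not the code of any existing scraper.

```python
from datetime import date, timedelta


def wednesdays(start, end):
    """Yield every Wednesday from start to end, inclusive."""
    # date.weekday() counts Monday as 0, so Wednesday is 2.
    current = start + timedelta(days=(2 - start.weekday()) % 7)
    while current <= end:
        yield current
        current += timedelta(days=7)


def collect(dates, fetch):
    """Call fetch(d) for each date and key the results by ISO date.

    fetch is a hypothetical callable that returns the parsed record
    for one date, or None when no data is published for that date.
    Returns (records keyed by date, list of dates with no data).
    """
    by_date = {}
    missing = []
    for d in dates:
        record = fetch(d)
        if record is None:
            # Remember the gap instead of failing the whole run.
            missing.append(d.isoformat())
        else:
            by_date[d.isoformat()] = record
    return by_date, missing
```

Keying the result by ISO date keeps the output easy to merge across runs, and the separate `missing` list makes it possible to retry or report the gaps later.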
@answerquest I was thinking along the same lines. Generating a JSON file is convenient because it can be converted into any other data format we want. I also noted that the data is available for each Wednesday, and I'm confident I can build a scraper for this use case. Do you mind if I get started on this right away? Maybe we can finish it off at the hackathon.
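To illustrate the point about JSON being convertible to other formats, here is a small Python sketch that flattens date-keyed, nested data into CSV text. The layout (`date -> category -> figures`) and the helper name `to_csv` are assumptions made up for this example; the real site's fields may look quite different.

```python
import csv
import io


def to_csv(by_date):
    """Flatten {date: {category: {field: value}}} into CSV text.

    The nesting and field names are hypothetical, chosen only to
    show the conversion.
    """
    rows = []
    for day, categories in sorted(by_date.items()):
        for category, figures in categories.items():
            rows.append({"date": day, "category": category, **figures})

    # Collect the union of column names in first-seen order, so rows
    # with differing fields still fit into one table.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Going the other way (CSV to nested JSON) loses nothing as long as the date and category columns are kept, which is one reason to treat the JSON as the primary output.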
@Dhanesh95 sorry just seeing this now, on the day of the Hackathon :laughing:
A basic scraper for the website is ready; it generates a CSV file of all the data from the website. You can find the code here: https://git.fosscommunity.in/Dhanesh95/pmjdyScraper
@answerquest The work here is not 100% complete and I have a few ideas in mind that I'd like to implement. Please assign this issue to me.
@Dhanesh95 ok done
Suggestion: we can visualize the data using Highcharts: https://www.highcharts.com/stock/demo
Datameet group thread: https://groups.google.com/forum/#!searchin/datameet/pdfs%7Csort:date/datameet/ErNY82gA7dw/mmBUxH5DAgAJ
Site: https://www.pmjdy.gov.in/archive