Cowin-team / Cowin_data_scrapper

automate the TN scrapper codes to ping the website at regular intervals and give push requests to the API #2

Closed nmahesh1412 closed 3 years ago

krish5989 commented 3 years ago

@nmahesh1412 What's the website URL that needs to be scraped?

nmahesh1412 commented 3 years ago

https://tncovidbeds.tnega.org/ https://stopcorona.tn.gov.in/beds.php

nmahesh1412 commented 3 years ago

@vipin8169 @krish5989 On the tnega website, some districts have more than 100 hospitals, so the key-value pair "pageLimit": 100 will miss a few of them. Can you look into how to fix that?

Also, did either of you test the district mapping hex codes? They didn't work for me: whichever hex code I pass, the site returns only its first 100 hospital entries rather than the hospitals from the specific city. Can you check? It looks like the website underwent a major revamp, and I'm not sure whether they changed something there.
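
One possible way around the 100-entry cap is to request the listing page by page instead of raising "pageLimit". The sketch below is only an illustration: it assumes the endpoint accepts a page number alongside "pageLimit", which has not been confirmed in this thread, and `fetch_page` is a hypothetical callable that would wrap the actual request.

```python
from typing import Callable, Dict, List


def fetch_all_hospitals(
    fetch_page: Callable[[int, int], List[Dict]],
    page_limit: int = 100,
    max_pages: int = 50,
) -> List[Dict]:
    """Collect hospital entries page by page.

    fetch_page(page_number, page_limit) is a hypothetical helper that
    returns the list of entries for one page (assumption: the site
    supports a page-number parameter).
    """
    hospitals: List[Dict] = []
    for page_number in range(1, max_pages + 1):
        batch = fetch_page(page_number, page_limit)
        hospitals.extend(batch)
        # A page shorter than the limit (or empty) means nothing is left.
        if len(batch) < page_limit:
            break
    return hospitals
```

The `max_pages` guard keeps the loop from running forever if the site ignores the page number and keeps returning the same full page.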

krish5989 commented 3 years ago

@nmahesh1412 I have updated the tnega scraper to fetch 500 entries per page, and also added the API call logic. Please check.
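
For the scheduling part of this issue (pinging the site at regular intervals and pushing the result to the API), a minimal sketch could look like the following. It is not the logic committed to the repo: `scrape` and `push` are hypothetical callables standing in for the existing scraper and the API call, and the 15-minute default interval is an arbitrary assumption.

```python
import time
from typing import Any, Callable


def run_periodically(
    scrape: Callable[[], Any],
    push: Callable[[Any], None],
    interval_seconds: float = 900.0,
    iterations: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run scrape() then push(result) a fixed number of times.

    scrape and push are hypothetical placeholders for the real scraper
    and API call. Returns how many runs were pushed successfully.
    """
    pushed = 0
    for run in range(iterations):
        try:
            push(scrape())
            pushed += 1
        except Exception as error:
            # One failed run should not stop the following ones.
            print(f"run {run + 1} failed: {error}")
        # Wait between runs, but not after the last one.
        if run < iterations - 1:
            sleep(interval_seconds)
    return pushed
```

Passing `sleep` as a parameter makes the loop easy to test without actually waiting; a cron job or a scheduled workflow would be an alternative to keeping a process running.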

krish5989 commented 3 years ago

I just noticed that the hex code and the response are not in sync, so I changed the logic to take the sheet name from the response: the district name is now used to populate the sheet names. I tried setting the page limit to infinite, but it didn't help. Maybe through trial and error we can increase the page limit to scrape as many hospitals as possible. For now it is set to 500.
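
The idea of naming sheets from the response rather than from the hex code can be sketched roughly as below. The field name `districtName` is an assumption made for illustration; the actual key in the tnega response may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_district(
    hospitals: Iterable[Dict],
    district_key: str = "districtName",  # hypothetical field name
) -> Dict[str, List[Dict]]:
    """Group hospital entries by the district name found in each entry.

    Each key of the returned dict can serve as a sheet name. Entries
    with a missing or blank district go under "Unknown".
    """
    sheets: Dict[str, List[Dict]] = defaultdict(list)
    for hospital in hospitals:
        raw_name = hospital.get(district_key)
        name = str(raw_name).strip() if raw_name else ""
        sheets[name or "Unknown"].append(hospital)
    return dict(sheets)
```

Because the grouping relies only on what the response contains, it keeps working even if the hex code mapping and the response drift out of sync again.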