aatishb / covidtrends

Tracking the growth of COVID-19 Cases worldwide
https://aatishb.com/covidtrends/
MIT License
301 stars 107 forks

Include Brazilian breakdown per state #92

Open felipequintella opened 4 years ago

felipequintella commented 4 years ago

Brazil is a big country with many different hotspots. Including a breakdown per state (as per US, Canada, Australia) would help visualize what's happening. This should be fairly easy from the Health Ministry data (https://covid.saude.gov.br/)

rpkoller commented 4 years ago

The problem is that they don't provide a direct download link, at least not on the linked page. If you click the CSV download button there, you get a generated temporary download link. :/

felipequintella commented 4 years ago

I noticed that too... I've actually been trying to scrape their website for that CSV link for the past week, and I think I finally managed it, at least until they change something again... The scraper is here: https://github.com/felipequintella/covid19-brazil-scraper and the final data is here: https://raw.githubusercontent.com/felipequintella/covid19-brazil-scraper/master/brazil.csv

I've also forked covidtrends and included the breakdown there as well. If you think it is worth it, let me know and I can try a pull request. https://github.com/felipequintella/covidtrends Final product here: https://covid19.felipequintella.com/

Edit: of course, scraping their data also means the final data may not be considered as official, accurate and reliable as one might want for the project. Let me know what you guys think ;) "We don’t want to become a repository of many datasets, as it’s difficult for us to vouch for their accuracy and reliability."

rpkoller commented 4 years ago

Hm, in the end it is @aatishb's choice how to handle this. On the one hand, it's cool that you make the data available by scraping it from the government page. But IMHO I would nevertheless be careful about any further data aggregation step for any country. I searched Google for "covid saude.gov.br csv", which led me here: https://brasil.io/dataset/covid19/boletim/ Have you found that already?

I don't speak the language (Portuguese, I guess, in Brazil), so I'm unsure whether I understand everything correctly; I've only used translate.google.com a little. But "Leia a documentação dessa tabela" ("Read the documentation for this table") led to the following repo: https://github.com/turicas/covid19-br/blob/master/api.md#boletim . I suppose a query string for their API could be crafted from there? I guess the complete download at https://data.brasil.io/dataset/covid19/caso.csv.gz would be a bit too extensive ;))) Data-wise, it looks like it goes all the way down to the city level. The CSV is 2.2 MB in size. ;))) It might be easier for a native speaker to find their way around there.

rpkoller commented 4 years ago

But I tried the examples from the GitHub repo in Paw and got a 301 when querying https://brasil.io/api/dataset/covid19/caso/data?is_last=True&state=AL :/

rpkoller commented 4 years ago

I guess caso.csv.gz is the smallest and, at the same time, the most complete file available. It is also listed here alongside other versions, all with SHA-512 checksums provided: https://data.brasil.io/dataset/covid19/_meta/list.html
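For anyone who wants to check a downloaded file against one of those SHA-512 checksums, here is a minimal sketch using only the Python standard library. The helper names `sha512_of_file` and `matches_checksum` are hypothetical, and the local path and expected hex digest are whatever you downloaded and copied from the list page; nothing here is taken from the brasil.io tooling.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with a published hex checksum (case-insensitive)."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

Usage would be something like `matches_checksum("caso.csv.gz", "<hex digest copied from the list page>")`, assuming the file was saved under that name in the current directory.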

@felipequintella the one thing I have trouble understanding with just Google Translate: is that dataset aggregated by Brazilian officials/government employees? In other words, is it an official data source, or is the aggregation based on volunteer work?

felipequintella commented 4 years ago

Hey @rpkoller , I'll take a look at these and see how they are collecting/compiling and verifying the data. I had not found it before, looks good though! I'll report back later.

stees commented 4 years ago

@rpkoller as per this link, they say all the data comes from each state health department, so yes, I would say it's aggregated by the authorities and brasil.io just compiles them in one big csv. This is their report on that.

Filtering caso_full.csv for rows where place_type == state would yield the per-state breakdown Felipe is talking about.
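A rough sketch of that filter, using only the Python standard library. Only the `place_type` column and its `state` value come from this thread; the other column names (`date`, `state`, `city`, `confirmed`) are assumptions about the file layout, and the sample rows are made-up placeholders, not real case counts.

```python
import csv
import io


def state_rows(csv_file):
    """Keep only the rows whose place_type column equals "state"."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["place_type"] == "state"]


# Placeholder data with an assumed column layout (not real figures).
SAMPLE = """date,state,city,place_type,confirmed
2020-04-01,AL,,state,10
2020-04-01,AL,Maceió,city,7
2020-04-01,SP,,state,20
"""

rows = state_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["date"], row["state"], row["confirmed"])
```

For a gzipped download, the same function should work on a file opened with `gzip.open(path, "rt", encoding="utf-8", newline="")`, assuming the file is UTF-8 encoded.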