Open iamleeg opened 4 years ago
It will be a bit tricky to get updated links because the files are stored in Google Drive, but the information is amazing (>250k records as of Sept 24).
Example here: https://drive.google.com/file/d/1GmaZJaN9Ly5Ew-xa3DVpg8vh-P8hq-OL/view?usp=sharing
Interesting, I have no idea how the source can be set up for that because the URLs can't easily be guessed (the docs have random UUIDs).
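This doesn't solve discovering new links, but once a share link is known, the file ID can be pulled out of it. A minimal sketch, assuming share links keep the `/file/d/<id>/view` shape of the example above; the `uc?export=download` URL form is an assumption about Google Drive, not something confirmed in this thread, and the function names are hypothetical.

```python
import re
from typing import Optional

# Matches the file ID in share links shaped like
# https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
_DRIVE_FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url: str) -> Optional[str]:
    """Return the file ID embedded in a Drive share link, or None if absent."""
    match = _DRIVE_FILE_ID.search(share_url)
    return match.group(1) if match else None


def direct_download_url(share_url: str) -> Optional[str]:
    """Build a download URL from a share link.

    Assumption: the uc?export=download form works for this file; large
    files may need an extra confirmation step that is not handled here.
    """
    file_id = drive_file_id(share_url)
    if file_id is None:
        return None
    return f"https://drive.google.com/uc?export=download&id={file_id}"
```

Someone would still need to supply the latest share link by hand (or scrape it from the archive page), so this only covers the last step.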
Seems like they already parse that data from somewhere, as they include a "validation status" column that contains stuff like "Age or Birthdate is Invalid" or "Health Status is ""Recovered"", but no Date Recovered is recorded Removal Type is ""Recovered"", but no Recovered Date is recorded".
Parsing those would be tricky if we wanted to, but I don't think we do; maybe just storing them as notes is fine.
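A rough sketch of the "store them as notes" option, assuming the data is a CSV. The column names `CaseCode` and `ValidationStatus` and the `notes` field are hypothetical placeholders, not confirmed headers. Python's `csv` module already unescapes the doubled quotes, so the text can be kept as-is without parsing it further.

```python
import csv
import io

# Hypothetical column names; the real CSV headers may differ.
CASE_ID_COLUMN = "CaseCode"
VALIDATION_COLUMN = "ValidationStatus"


def row_to_case(row: dict) -> dict:
    """Convert one CSV row to a case dict, keeping validation text as a note."""
    case = {"id": row[CASE_ID_COLUMN]}
    status = (row.get(VALIDATION_COLUMN) or "").strip()
    if status:
        # Keep the free text verbatim instead of trying to interpret it.
        case["notes"] = f"Validation status: {status}"
    return case


# Made-up sample rows that mimic the messages quoted above.
sample = io.StringIO(
    "CaseCode,ValidationStatus\n"
    "C1,Age or Birthdate is Invalid\n"
    "C2,\n"
    'C3,"Health Status is ""Recovered"", but no Date Recovered is recorded"\n'
)
cases = [row_to_case(row) for row in csv.DictReader(sample)]
```

Rows with an empty validation status simply get no `notes` entry.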
Also they seem to remove cases as they are found to be dupes:
September 22, 2020
[Case Information]
The DOH-Epidemiology Bureau is currently increasing efforts to identify more potential duplicate
records in the system.
As such, there were thirty-four (34) previous records found to be duplicates of existing cases and
have since been removed from the system.
[Case Information]
There were two (2) previous records found to be suspected but not confirmed COVID patients.
They have since been removed from the system.
That happens almost every day, so we had better ingest this source with a few days' lag, since we don't retroactively delete cases on our own.
http://bit.ly/DataDropArchives
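One way the lag could look, as a sketch only: take the latest file but skip cases confirmed within the last few days, on the assumption that most duplicates get removed upstream within that window. The three-day value and the `date_confirmed` field name are illustrative assumptions, not something decided in this thread.

```python
from datetime import date, timedelta

# Assumption: three days is an illustrative lag, not an agreed value.
LAG_DAYS = 3


def old_enough(confirmed: date, today: date, lag_days: int = LAG_DAYS) -> bool:
    """True if the case was confirmed at least `lag_days` before `today`."""
    return confirmed <= today - timedelta(days=lag_days)


def filter_cases(cases: list, today: date, lag_days: int = LAG_DAYS) -> list:
    """Keep only cases old enough to have gone through upstream cleanup."""
    return [c for c in cases if old_enough(c["date_confirmed"], today, lag_days)]


# Made-up sample data for illustration.
sample_cases = [
    {"id": "C1", "date_confirmed": date(2020, 9, 20)},
    {"id": "C2", "date_confirmed": date(2020, 9, 21)},
    {"id": "C3", "date_confirmed": date(2020, 9, 22)},
]
kept = filter_cases(sample_cases, today=date(2020, 9, 24))
```

Duplicates found later than the lag window would still slip through, so this reduces the problem rather than removing it.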