GoogleCloudPlatform / covid-19-open-data

Datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world.
Apache License 2.0

Update PlaceIDs for knowledge_graph.csv from a new Knowledge Graph lookup, adding PlaceIDs for regions that previously had no match, but only changing an existing PlaceID if the new one has a matching Mobility Report and the old one does not, or if manual inspection shows the new PlaceID is better. #529

Closed: geening closed this 2 years ago

geening commented 2 years ago

This results in some keys mapping to different Mobility Reports, and sometimes in lost data when the new PlaceID has no Mobility Report for a given day (for 2021-11-30, PE_UCA was the only key for which this change removes the Mobility Report).
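The update rule described in the PR title can be sketched as a per-key decision. This is a minimal illustration, not the code used to produce the file: the function name `choose_place_id`, the `ids_with_mobility` set, and the `manual_overrides` mapping are all hypothetical stand-ins for "has a Mobility Report" and "manual inspection".

```python
from typing import Dict, Optional, Set


def choose_place_id(
    key: str,
    old_id: Optional[str],
    new_id: Optional[str],
    ids_with_mobility: Set[str],
    manual_overrides: Dict[str, str],
) -> Optional[str]:
    """Pick the PlaceID to keep for one location key (hypothetical helper)."""
    # A manual decision (hypothetical override table) wins over the automatic rule.
    if key in manual_overrides:
        return manual_overrides[key]
    # Regions that previously had no match simply take the new PlaceID.
    if old_id is None:
        return new_id
    # Switch only when the new PlaceID has a Mobility Report and the old one does not.
    if new_id is not None and new_id in ids_with_mobility and old_id not in ids_with_mobility:
        return new_id
    # Otherwise keep the existing PlaceID.
    return old_id
```

Keeping the old PlaceID as the default means a key only moves when there is a concrete reason to, which limits the kind of Mobility Report loss described above to keys that were deliberately changed.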

geening commented 2 years ago

I'm not sure why, but this diff reports 26171 added and 26171 deleted lines, even though most of them look identical. I wonder whether the new file uses different end-of-line characters.

geening commented 2 years ago

Yeah, I think I'm changing the 0x0d0a (carriage return, line feed) at the end of each line to just 0x0a (line feed). Let me know if that's a problem.
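A diff in which every line is both added and deleted is what a line-ending change looks like. The sketch below counts CRLF (0x0d0a) versus bare LF (0x0a) endings and converts CRLF to LF; the function names are hypothetical. It works on raw bytes so that the file's text encoding is never re-encoded by accident.

```python
from pathlib import Path
from typing import Dict


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count CRLF (0x0d0a) and bare LF (0x0a) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def to_lf(data: bytes) -> bytes:
    """Replace every CRLF pair with a single LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path: Path) -> None:
    # Read and write bytes, not text, so the encoding stays exactly as it was.
    path.write_bytes(to_lf(path.read_bytes()))
```

Running `count_line_endings` on the old and new versions of the CSV would show whether the only difference is CRLF versus LF.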

geening commented 2 years ago

Oops, that last commit seems to have turned the file into a binary file. I should probably undo it...
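When a diff suddenly labels a CSV as binary, Git's binary detection has usually fired. To our understanding, Git's default heuristic treats content as binary if a NUL byte appears within its first 8000 bytes. One common way to trigger it, stated here as an assumption rather than a diagnosis of this commit, is saving the file as UTF-16, which places NUL bytes between ASCII characters. A minimal check, with hypothetical function names:

```python
# Assumption: mirrors Git's default heuristic, which treats content as binary
# when a NUL byte appears within its first 8000 bytes.
GIT_SNIFF_LEN = 8000


def looks_binary_to_git(data: bytes) -> bool:
    """Return True if a NUL byte occurs in the first GIT_SNIFF_LEN bytes."""
    return b"\x00" in data[:GIT_SNIFF_LEN]


def has_utf16_bom(data: bytes) -> bool:
    """UTF-16 text usually starts with FF FE (little-endian) or FE FF (big-endian)."""
    return data.startswith((b"\xff\xfe", b"\xfe\xff"))
```

If both checks return `True` for the committed file, re-saving it as UTF-8 should make it diff as text again.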

geening commented 2 years ago

@owahltinez: Let's hold off on merging this for now until the other discussion is resolved; we may want to keep all old PlaceIDs that have Mobility Reports.

owahltinez commented 2 years ago

This is amazing, thanks Matt!

owahltinez commented 2 years ago

One minor error that needs to be corrected, from the test logs: AssertionError: 'SL-N-BM' not greater than 'SL-NW-PL' : Keys in knowledge_graph.csv must follow lexicographical order: SL-N-BM ≤ SL-NW-PL (all keys must be lexicographically sorted)
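The failing test appears to require each key to be strictly greater than the one before it (the message reads `'SL-N-BM' not greater than 'SL-NW-PL'`). Below is a minimal sketch of such a check, assuming the keys are read in file order. The function name is hypothetical and this is not the repository's actual test code.

```python
from typing import List, Optional, Tuple


def first_unsorted_pair(keys: List[str]) -> Optional[Tuple[str, str]]:
    """Return the first adjacent pair (prev, curr) with curr <= prev, or None if sorted."""
    for prev, curr in zip(keys, keys[1:]):
        # Require strictly increasing keys, which also rejects duplicates.
        if not curr > prev:
            return prev, curr
    return None
```

For example, `first_unsorted_pair(["SL-NW-PL", "SL-N-BM"])` returns `("SL-NW-PL", "SL-N-BM")`, the same pair the test log complains about, because `'-'` sorts before `'W'`.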

geening commented 2 years ago


Done. But requiring the keys to be lexicographically sorted after their underscores are converted to dashes is annoying to maintain, especially if we keep generating this file with a SQL query that returns results ordered by key.
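The two orderings disagree because, in byte order, `'_'` (0x5F) sorts after uppercase letters while `'-'` (0x2D) sorts before them. So `SL_NW_PL < SL_N_BM`, but `SL-N-BM < SL-NW-PL`. One low-maintenance option is to re-sort the query output by the dash-converted key just before writing the CSV. This is a minimal sketch, assuming rows arrive as dicts with a `key` column; the helper names and the `place_id` column are hypothetical. Sorting on the dash form works whether the file stores underscores or dashes.

```python
import csv
import io
from typing import Dict, List


def dash_order(key: str) -> str:
    # Sort by what the key looks like once underscores become dashes.
    return key.replace("_", "-")


def sort_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order rows so the converted keys are lexicographically sorted."""
    return sorted(rows, key=lambda row: dash_order(row["key"]))


def write_sorted_csv(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    buf = io.StringIO()
    # lineterminator="\n" writes LF endings instead of the csv module's default CRLF.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(sort_rows(rows))
    return buf.getvalue()
```

The same effect could likely be achieved inside the query with `ORDER BY REPLACE(key, '_', '-')`, assuming the database compares strings in byte or code-point order. Doing it in Python avoids depending on the database's collation.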