globaldothealth / list

Repository for Global.health: a data science initiative to enable rapid sharing of trusted and open public health data to advance the response to infectious diseases.
MIT License

Import Thai case data #516

Closed iamleeg closed 3 years ago

iamleeg commented 4 years ago

Daily dump: https://data.go.th/en/dataset/covid-19-daily
API: https://covid19.th-stat.com/th/api

z023 commented 4 years ago

Location.name is in English but location.admins are not being translated.

attwad commented 4 years ago

I just fixed that an hour ago. Please drop the Thai data and reimport it; it will then have English names.

z023 commented 4 years ago

There are still Thailand location.admins not being translated.

attwad commented 4 years ago

After dropping and reimporting data, all admins seem to be translated (I've looked at 10 random pages).

z023 commented 4 years ago

There are 2 cases that are duplicates of the same source case (source case ID 3219), and both are coming through with an Austria location. The original source states their nationality is Egyptian; possibly the positive detection location is unknown?

attwad commented 3 years ago

Those are the 2 cases:
https://curator.ghdsi.org/cases/view/5f61cbe70142a772cba2df0f
https://curator.ghdsi.org/cases/view/5f61faf00142a772cba59d79

the raw entry is:

{"ConfirmDate":"2020-07-13 00:00:00","No":"3219","Age":43,"Gender":"\u0e0a\u0e32\u0e22","GenderEn":"Male","Nation":"Egypt","NationEn":null,"Province":"\u0e44\u0e21\u0e48\u0e1e\u0e1a\u0e02\u0e49\u0e2d\u0e21\u0e39\u0e25","ProvinceId":78,"District":"\u0e40\u0e21\u0e37\u0e2d\u0e07","ProvinceEn":"Unknown","Detail":null,"StatQuarantine":1}

pretty printed:

Age: 43
ConfirmDate: "2020-07-13 00:00:00"
Detail: null
District: "เมือง"
Gender: "ชาย"
GenderEn: "Male"
Nation: "Egypt"
NationEn: null
No: "3219"
Province: "ไม่พบข้อมูล"
ProvinceEn: "Unknown"
ProvinceId: 78
StatQuarantine: 1

The geolocation query that gets done with this case is "เมือง, Unknown, Thailand"

and that "Unknown" seems to confuse the hell out of Mapbox, which gives us a random location each time...

I will have to filter out those "Unknowns" and use the Thai province name which was available instead.

attwad commented 3 years ago

I've deleted the two cases; they will be reimported correctly on the next ingestion run.