Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Ingest clinical COVID-19 data from C3.ai's data lake (or their upstream source) #102

Open justaddcoffee opened 4 years ago

justaddcoffee commented 4 years ago

Name of the dataset ~Clinical data from C3.ai data lake, for example case (age/gender/location/symptoms/date of onset) from https://c3.ai/covid-19-api-documentation/~

Per Marcin's sharp eyes, we could just ingest clinical data upstream, from where C3.ai ingests it: https://github.com/Knowledge-Graph-Hub/kg-covid-19/issues/102

Also here: https://docs.google.com/spreadsheets/d/e/2PACX-1vQU0SIALScXx8VXDX7yKNKWWPKE1YjFlWc6VTEVSN45CklWWf-uWmprQIyLtoPDA18tX9cFDr-aQ9S6/pubhtml

Mapping or relevant fields TBD

If possible, highlight which fields map to nodes and which fields map to edges. Refer to Data Preparation for guidelines on how the final transformed data should be represented.

justaddcoffee commented 4 years ago

Sounds like @realmarcin will implement this, and Bill can review his mapping/transform

wdduncan commented 4 years ago

Added a documentation directory for the C3.ai API.
We can place other documentation there as needed.

cc @deepakunni3 @realmarcin

justaddcoffee commented 4 years ago

Note that if this ingest requires an API call, this ticket will probably also need a function in download() that calls the API and writes the response to a file.
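As a rough illustration of what such a function could look like, here is a minimal sketch that fetches a URL and writes the response to disk. The name `download_to_file` and its parameters are hypothetical and are not taken from the kg-covid-19 codebase; a real API ingest would likely also need authentication, paging, or request bodies that are not shown here.

```python
import shutil
import urllib.request
from pathlib import Path


def download_to_file(url: str, outfile: str) -> Path:
    """Fetch `url` and write the raw response body to `outfile`.

    Hypothetical helper for illustration only; not the project's download().
    """
    out_path = Path(outfile)
    # Create the target directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large files are not held in memory.
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return out_path
```

Calling `download_to_file(some_url, "data/raw/cases.csv")` would leave a local file that a later transform step can read without touching the network again.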

wdduncan commented 4 years ago

See quickstart.ipynb for examples of how to access the data.

justaddcoffee commented 4 years ago

Discussion with Bill - initial thought is to ingest the following for each case:

- age
- sex
- location
- symptoms
- therapeutic/clinical intervention
- outcome
- covid genome sequence
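One way these per-case fields might map to nodes and edges is sketched below, with the mapping itself still TBD. Everything in it is an assumption made for illustration: the record keys (`id`, `age`, `sex`, `location`, `symptoms`, `outcome`), the semicolon-separated symptom string, the `CASE:`-style identifiers, and the predicate names. Scalar attributes (age, sex) are kept as properties on the case node, while values shared across cases (location, symptom, outcome) become their own nodes linked by edges.

```python
from typing import Dict, List, Tuple

Node = Dict[str, str]
Edge = Dict[str, str]


def case_to_graph(case: Dict[str, str]) -> Tuple[List[Node], List[Edge]]:
    """Turn one case record into node and edge dicts (illustrative only)."""
    case_id = f"CASE:{case['id']}"
    # Age and sex stay on the case node as plain properties.
    nodes: List[Node] = [{
        "id": case_id,
        "category": "Case",
        "age": case.get("age", ""),
        "sex": case.get("sex", ""),
    }]
    edges: List[Edge] = []

    def link(predicate: str, category: str, value: str) -> None:
        """Add a target node and an edge from the case, skipping empty values."""
        value = value.strip()
        if not value:
            return
        target_id = f"{category.upper()}:{value.lower().replace(' ', '_')}"
        nodes.append({"id": target_id, "category": category, "name": value})
        edges.append({"subject": case_id, "predicate": predicate, "object": target_id})

    link("located_in", "Location", case.get("location", ""))
    # Assumed format: multiple symptoms separated by semicolons.
    for symptom in case.get("symptoms", "").split(";"):
        link("has_symptom", "Symptom", symptom)
    link("has_outcome", "Outcome", case.get("outcome", ""))
    return nodes, edges
```

In a real transform the free-text values would presumably be normalized to ontology terms rather than turned into ad hoc identifiers as done here.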

justaddcoffee commented 4 years ago

Per Marcin's sharp eyes, we could just ingest clinical data upstream, from where C3.ai ingests it: https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.csv

Also here: https://docs.google.com/spreadsheets/d/e/2PACX-1vQU0SIALScXx8VXDX7yKNKWWPKE1YjFlWc6VTEVSN45CklWWf-uWmprQIyLtoPDA18tX9cFDr-aQ9S6/pubhtml
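If the CSV route is taken, pulling out only the fields of interest could look roughly like the sketch below. The column names in `WANTED` are an assumption about the header of `latestdata.csv` and should be checked against the actual file; `select_fields` is a hypothetical helper, not existing project code.

```python
import csv
from typing import Dict, Iterable, Iterator, Sequence

# Assumed column names; verify against the real header before relying on them.
WANTED = ("ID", "age", "sex", "country", "symptoms", "outcome")


def select_fields(
    lines: Iterable[str], wanted: Sequence[str] = WANTED
) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keeping only the `wanted` columns.

    Columns missing from the file come back as empty strings.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        yield {name: (row.get(name) or "").strip() for name in wanted}
```

The function accepts any iterable of lines, so it works with an open file handle for a locally saved copy of the CSV as well as with an in-memory string buffer in tests.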

wdduncan commented 4 years ago

@justaddcoffee @realmarcin
Where did you find these URLs for the data? I can't find them.

justaddcoffee commented 4 years ago

These two links seem to be the upstream source of some of C3.ai data - are they not working for you?

https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.csv
https://docs.google.com/spreadsheets/d/e/2PACX-1vQU0SIALScXx8VXDX7yKNKWWPKE1YjFlWc6VTEVSN45CklWWf-uWmprQIyLtoPDA18tX9cFDr-aQ9S6/pubhtml

wdduncan commented 4 years ago

Yes. They work for me.
I was curious how you found them in the documentation.