Open RobertJCarroll opened 2 years ago
import pandas as pd

# get the data (each query_fhir_* helper is expected to return a DataFrame)
df_include = query_fhir_include(query_statement)
df_kf = query_fhir_kf(query_statement)
df_gtex = query_fhir_gtex(query_statement)
# each dataframe looks like
# columns:
#   document_reference_attachment_uri (either drs:// or gs://),
#   drs_uri, (if it exists)
#   document_reference_reference, ( DocumentReference/1234 )
#   file_path, (downloaded document_reference_attachment_uri on local file system)
#   specimen_bodySite,
#   condition_code,
#   research_study_reference, (full uri of research_study https://example.com/fhir/ResearchStudy/1234)
#   patient_reference, (full uri https://example.com/fhir/Patient/123)
#   specimen_reference, (full uri https://example.com/fhir/Specimen/123)
#   ... extra columns (e.g. observations) allowed
# index:
#   document_reference_reference
# stack the rows from all three sources; `+` would add values element-wise instead
pca_df = pd.concat([df_include, df_kf, df_gtex])
# go do PCA!
@RobertJCarroll plan ^
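A minimal sketch of the combine step in the plan above, assuming the intent is to stack rows from the three sources. The toy frames and the `source` column are made up for illustration; the real frames would come from the `query_fhir_*` helpers.

```python
import pandas as pd


def toy_frame(doc_ref: str, **columns: str) -> pd.DataFrame:
    """Build a one-row frame indexed by document_reference_reference (toy data)."""
    index = pd.Index([doc_ref], name="document_reference_reference")
    return pd.DataFrame({k: [v] for k, v in columns.items()}, index=index)


# Stand-ins for the three query results; every value here is invented.
frames = {
    "include": toy_frame("DocumentReference/1", specimen_bodySite="blood"),
    "kf": toy_frame("DocumentReference/2", specimen_bodySite="skin"),
    "gtex": toy_frame(
        "DocumentReference/3", specimen_bodySite="lung", extra_observation="x"
    ),
}

# Tag each row with its source (hypothetical column), then stack the rows.
# verify_integrity=True raises if two sources share a DocumentReference index.
pca_df = pd.concat(
    [df.assign(source=name) for name, df in frames.items()],
    verify_integrity=True,
)
```

Columns that only some sources provide (here `extra_observation`) come out as NA for the other rows, which fits the "extra columns allowed" note.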
To get the disease status associated with HTP in INCLUDE:
https://include-api-fhir-service.includedcc.org/Observation?_tag=HTP&code=MONDO:0008608
for controls, and
https://include-api-fhir-service.includedcc.org/Condition?code=MONDO:0008608
for cases.
That MONDO code is Down syndrome. The other studies may be done differently, in which case Meen can clarify how. His may all be Conditions with a different verificationStatus.
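As a rough sketch, the two search results could be turned into case/control labels like this. It assumes each response is a standard FHIR searchset Bundle whose Observation and Condition resources carry `subject.reference`; the function names and toy bundles are hypothetical, and paging through Bundle `next` links is not handled.

```python
from typing import Any


def subject_references(bundle: dict[str, Any]) -> set[str]:
    """Collect subject.reference from every resource in a FHIR Bundle."""
    refs = set()
    for entry in bundle.get("entry", []):
        ref = entry.get("resource", {}).get("subject", {}).get("reference")
        if ref:
            refs.add(ref)
    return refs


def label_patients(
    condition_bundle: dict[str, Any], observation_bundle: dict[str, Any]
) -> dict[str, str]:
    """Map patient reference -> 'case' or 'control'.

    Assumption: a patient found in both bundles is labelled 'case'.
    """
    labels = {ref: "control" for ref in subject_references(observation_bundle)}
    labels.update({ref: "case" for ref in subject_references(condition_bundle)})
    return labels


# Toy bundles standing in for the two query responses (invented data).
condition_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition",
                      "subject": {"reference": "Patient/1"}}},
    ],
}
observation_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Observation",
                      "subject": {"reference": "Patient/2"}}},
    ],
}

labels = label_patients(condition_bundle, observation_bundle)
```

If the other studies encode status through `verificationStatus` on Conditions, the labelling rule would need to change accordingly.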
gtex_v8 data frame here: gs://fc-be286b9f-3acf-4168-af6e-592df509391d/gtex_v8-dataframe.tsv
For each assay result, we want to gather the following:
If the information is missing (e.g., no sample links in KF), the data can be left as NA. We will post-process to generate labels that are as detailed as possible.
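One way to leave missing fields as NA is to reindex each source frame onto a shared column list. This is only a sketch: the column list below is borrowed from the dataframe layout described in this thread as a stand-in, and `with_required_columns` and the toy KF row are hypothetical.

```python
import pandas as pd

# Stand-in column list taken from the dataframe description in this thread;
# document_reference_reference is left out because it serves as the index.
REQUIRED_COLUMNS = [
    "document_reference_attachment_uri",
    "drs_uri",
    "file_path",
    "specimen_bodySite",
    "condition_code",
    "research_study_reference",
    "patient_reference",
    "specimen_reference",
]


def with_required_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add any missing required columns as NA and keep extra columns at the end."""
    extra = [c for c in df.columns if c not in REQUIRED_COLUMNS]
    return df.reindex(columns=REQUIRED_COLUMNS + extra)


# Toy KF-like row with no specimen information (invented values).
kf_raw = pd.DataFrame(
    {
        "document_reference_attachment_uri": ["drs://example/abc"],
        "condition_code": ["MONDO:0008608"],
        "patient_reference": ["https://example.com/fhir/Patient/123"],
    },
    index=pd.Index(["DocumentReference/1234"], name="document_reference_reference"),
)

kf_df = with_required_columns(kf_raw)
```

After this step every source has the same leading columns, so post-processing can check for NA rather than for missing columns.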