NIH-NCPI / FHIR-CAT-June22

Technical interop pilot using RNASeq data

[KF] Identify FHIR DocumentReferences for Gene Expression Summary files #7

Open RobertJCarroll opened 2 years ago

RobertJCarroll commented 2 years ago

Save JSON containing all DocumentReferences for KF Gene Expression Summary files into this bucket:

https://console.cloud.google.com/storage/browser/fc-be286b9f-3acf-4168-af6e-592df509391d/DocumentReference

gs://fc-be286b9f-3acf-4168-af6e-592df509391d/DocumentReference

RobertJCarroll commented 2 years ago

This query grabs the relevant files: https://kf-api-fhir-service.kidsfirstdrc.org/DocumentReference?type:text=Gene%20Expression&security-label=U
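A minimal sketch of how the results of this query could be collected into one JSON file, assuming the server returns standard FHIR searchset Bundles with a `next` link for paging. The function names, the `fetch` parameter, and the output path are hypothetical, and the Kids First server may require authentication headers that are not shown in this thread.

```python
import json
import urllib.request

# Search URL taken from the comment above.
SEARCH_URL = (
    "https://kf-api-fhir-service.kidsfirstdrc.org/DocumentReference"
    "?type:text=Gene%20Expression&security-label=U"
)


def http_get_json(url, headers=None):
    """Fetch one URL and parse the body as JSON.

    Assumption: any credentials the server needs are passed in `headers`.
    """
    all_headers = {"Accept": "application/fhir+json"}
    all_headers.update(headers or {})
    request = urllib.request.Request(url, headers=all_headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_document_references(url, fetch=http_get_json):
    """Yield every DocumentReference, following Bundle 'next' links."""
    while url:
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            resource = entry.get("resource") or {}
            if resource.get("resourceType") == "DocumentReference":
                yield resource
        # A searchset Bundle advertises the next page, if any, in its links.
        url = next(
            (link.get("url") for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )


def save_document_references(path, url=SEARCH_URL, fetch=http_get_json):
    """Write all matching DocumentReferences to `path`; return the count."""
    resources = list(iter_document_references(url, fetch))
    with open(path, "w") as handle:
        json.dump(resources, handle, indent=2)
    return len(resources)
```

Calling `save_document_references("DocumentReference.json")` would write a local file that could then be copied to the bucket named in the first comment. Passing a custom `fetch` keeps the paging logic separate from however authentication ends up being handled.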

Because there is no coded vocabulary for the document type, this text search might not capture everything. There are also some Gene Expression Quantification results, but they appear to be restricted access only.

liberaliscomputing commented 2 years ago

Here is the breakdown, by study, of the number of the above resources:

ianfore commented 2 years ago

For the Kids First study PBTA-PNOC (ResearchStudy/48656, SD_8Y99QZJJ, Pediatric Brain Tumor Atlas: PNOC), a single example patient, Patient/48592, has 61 files.

Accessible file count by type: {'tbi': 5, 'maf': 5, 'vcf': 5}

Inaccessible file count by type: {'tbi': 12, 'vcf': 11, 'maf': 10, 'bam': 7, 'cram': 2, 'crai': 2, 'bai': 1, 'gvcf': 1}
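One way a per-type tally like this could be produced from a list of DocumentReference resources is sketched below. It is not necessarily how the counts above were computed. It rests on two assumptions: that a file counts as accessible when it carries the security label code `U` (the value used in the search query earlier in this thread), and that the file type can be read from the extension of the attachment title or URL. All function names are hypothetical.

```python
from collections import Counter


def has_security_code(doc_ref, code="U"):
    """True if any securityLabel coding on the resource has this code."""
    for label in doc_ref.get("securityLabel", []):
        for coding in label.get("coding", []):
            if coding.get("code") == code:
                return True
    return False


def file_type(doc_ref):
    """Guess a file type from the attachment title or URL (assumption)."""
    for content in doc_ref.get("content", []):
        attachment = content.get("attachment", {})
        name = attachment.get("title") or attachment.get("url") or ""
        name = name.rsplit("/", 1)[-1].lower()
        # Treat "x.vcf.gz" as "vcf" rather than "gz".
        if name.endswith(".gz"):
            name = name[:-3]
        if "." in name:
            return name.rsplit(".", 1)[1]
    return "unknown"


def tally_by_type(doc_refs):
    """Return (accessible, inaccessible) dicts of file counts by type."""
    accessible, inaccessible = Counter(), Counter()
    for doc_ref in doc_refs:
        bucket = accessible if has_security_code(doc_ref) else inaccessible
        bucket[file_type(doc_ref)] += 1
    return dict(accessible), dict(inaccessible)
```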

RobertJCarroll commented 2 years ago

I believe rsem.genes.results.gz files are the files we need for this.
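A short sketch of how already-downloaded DocumentReferences could be narrowed to these files, assuming the file name appears at the end of the attachment `title` or `url`. That assumption is not confirmed in this thread, and the helper names are hypothetical.

```python
RSEM_SUFFIX = "rsem.genes.results.gz"


def attachment_names(doc_ref):
    """Collect the title and url of every attachment on the resource."""
    names = []
    for content in doc_ref.get("content", []):
        attachment = content.get("attachment", {})
        for key in ("title", "url"):
            value = attachment.get(key)
            if value:
                names.append(value)
    return names


def is_rsem_gene_result(doc_ref):
    """True if any attachment name ends with the RSEM gene results suffix."""
    return any(name.endswith(RSEM_SUFFIX) for name in attachment_names(doc_ref))


def select_rsem_gene_results(doc_refs):
    """Keep only the DocumentReferences that point at RSEM gene results."""
    return [doc_ref for doc_ref in doc_refs if is_rsem_gene_result(doc_ref)]
```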

liberaliscomputing commented 2 years ago

Use these:

RobertJCarroll commented 2 years ago

For example: https://kf-api-fhir-service.kidsfirstdrc.org/DocumentReference/378682