AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Proposed Analysis: Add CNS_region mapping in histologies file to OpenPBTA analyses #838

Closed: jharenza closed this issue 3 years ago

jharenza commented 3 years ago

What are the scientific goals of the analysis?

Add a CNS_region column to pbta-histologies.tsv using the logic used on the CHOP D3b end.

What methods do you plan to use to accomplish the scientific goals?

Regions: Hemispheric, Midline, Spine, Ventricles, Posterior fossa, Optic pathway, Suprasellar, Other

Use this table to perform the CNS_region mapping (columns: Updated CNS Region | Site | primary_site):
Hemispheric Frontal Frontal Lobe
Temporal Temporal Lobe
Parietal Parietal Lobe
Occipital Occipital Lobe
Right Occipital
Left temporo-parietal
Right Parietal
Right Frontal
Right Temporal
Midline Brainstem Brainstem
Midbrain Brain Stem- Midbrain/Tectum
Pons Brain Stem- Pons
Medulla Brain Stem-Medulla
Corpus Collosum
Thalamus Thalamus
Hypothalamus
Basal Ganglia Basal Ganglia
Hippocampus Hippocampus
Pineal Pineal Gland
Tectum Left Thalmus
Right Thalmus
Bilateral Thalmus
Spine Cervical Spinal Cord- Cervical
Thoracic Spinal Cord- Thoracic
Lumbar Spinal Cord- Lumbar/Thecal Sac
Sacral
Spine
Spine NOS
Ventricles Lateral
Third
Fourth
Cerebral aqueduct
Foramen of Monro
Ventricles
Posterior fossa Vermis
Peduncle
Cerebellum
Cerebellum Cerebellum/Posterior Fossa
Optic pathway Optic nerves
Optic chiasm
Optic tract
Optic radiation
Optic Pathway
Suprasellar Pituitary Suprasellar/Hypothalamic/Pituitary
Pituitary stalk
Other Meninges Meninges/Dura
Multifocal
Other locations NOS
Skull
Cranial Nerves NOS
Brain
Right Occipital - Parietal Lobes and Thalamus
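The direct-match step can be sketched in Python as an exact-string lookup. This is a minimal illustration, not the D3b code: the dictionary `PRIMARY_SITE_TO_REGION` and the helper `map_single_site` are hypothetical names, and the entries are only the primary_site values that can be read unambiguously from the table above (single-term rows are left out because their column is unclear). Values that match nothing come back as `None`, i.e. NA.

```python
from typing import Optional

# Hypothetical lookup built from a subset of the primary_site column above.
# Keys must match the primary_site strings exactly (including spacing).
PRIMARY_SITE_TO_REGION = {
    "Frontal Lobe": "Hemispheric",
    "Temporal Lobe": "Hemispheric",
    "Parietal Lobe": "Hemispheric",
    "Occipital Lobe": "Hemispheric",
    "Brain Stem- Midbrain/Tectum": "Midline",
    "Brain Stem- Pons": "Midline",
    "Brain Stem-Medulla": "Midline",
    "Thalamus": "Midline",
    "Basal Ganglia": "Midline",
    "Hippocampus": "Midline",
    "Pineal Gland": "Midline",
    "Spinal Cord- Cervical": "Spine",
    "Spinal Cord- Thoracic": "Spine",
    "Spinal Cord- Lumbar/Thecal Sac": "Spine",
    "Ventricles": "Ventricles",
    "Cerebellum/Posterior Fossa": "Posterior fossa",
    "Optic Pathway": "Optic pathway",
    "Suprasellar/Hypothalamic/Pituitary": "Suprasellar",
    "Meninges/Dura": "Other",
    "Skull": "Other",
}


def map_single_site(primary_site: str) -> Optional[str]:
    """Return the CNS_region for one primary_site value, or None (NA) if unmatched."""
    return PRIMARY_SITE_TO_REGION.get(primary_site.strip())
```

Because the lookup is exact, a new term such as "L. Pons" returns `None` until it is added to the dictionary.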

Reference: Notion doc from @baileyckelly

What input data are required for this analysis?

pbta-histologies.tsv

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

0.5 day

Who will complete the analysis (please add a GitHub handle here if relevant)?

@kgaonkar6

What relevant scientific literature relates to this analysis?

NA; informed by physicians Angela Waanders and Cassie Kline.

kgaonkar6 commented 3 years ago

Is this still a to-do ticket, or does the base histology already take care of this? The base histology file seems to already have the "Updated CNS Region" terms assigned.

kgaonkar6 commented 3 years ago

The direct matches in the ticket don't cover cases where a primary_site value maps to multiple Updated_CNS_regions; these are all "Other" in the base histology.

Example primary_site values that span multiple regions:

Cerebellum/Posterior Fossa;Temporal Lobe;Thalamus
Optic Pathway;Temporal Lobe
Optic Pathway;Suprasellar/Hypothalamic/Pituitary;Temporal Lobe
Cerebellum/Posterior Fossa;Spinal Cord- Thoracic
Basal Ganglia;Optic Pathway;Temporal Lobe;Ventricles
Parietal Lobe;Ventricles
Skull;Temporal Lobe
Cerebellum/Posterior Fossa;Spinal Cord- Cervical;Spinal Cord- Lumbar/Thecal Sac;Spinal Cord- Thoracic
Cerebellum/Posterior Fossa;Frontal Lobe
Brain Stem- Pons;Cerebellum/Posterior Fossa

For these, let's annotate CNS_region as "Mixed", noting that they need pathology review by Cassie to designate a primary site region.
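The "Mixed" rule for ;-separated primary_site values can be sketched as follows, assuming a per-site lookup like the one in the mapping table. The names `SITE_TO_REGION` and `assign_cns_region` are hypothetical, the dictionary is a small subset, and treating any unmatched component of a multi-valued primary_site as Mixed is an assumption of this sketch.

```python
from typing import Dict, Optional

# Hypothetical subset of the primary_site -> CNS_region lookup from the issue table.
SITE_TO_REGION: Dict[str, str] = {
    "Frontal Lobe": "Hemispheric",
    "Temporal Lobe": "Hemispheric",
    "Parietal Lobe": "Hemispheric",
    "Thalamus": "Midline",
    "Basal Ganglia": "Midline",
    "Brain Stem- Pons": "Midline",
    "Ventricles": "Ventricles",
    "Cerebellum/Posterior Fossa": "Posterior fossa",
    "Optic Pathway": "Optic pathway",
    "Skull": "Other",
}


def assign_cns_region(primary_site: str) -> Optional[str]:
    """Assign one CNS_region, or "Mixed" when the ;-separated sites span several regions."""
    sites = [s.strip() for s in primary_site.split(";") if s.strip()]
    if not sites:
        return None  # empty primary_site -> NA
    regions = {SITE_TO_REGION.get(s) for s in sites}
    if len(sites) == 1:
        # Single value: direct lookup (None means NA / unmatched).
        return regions.pop()
    if len(regions) == 1 and None not in regions:
        # Every component maps to the same region, so keep that region.
        return regions.pop()
    # Components map to different regions, or at least one is unmatched.
    return "Mixed"
```

For example, `assign_cns_region("Parietal Lobe;Ventricles")` returns "Mixed", while a hypothetical "Frontal Lobe;Temporal Lobe" stays "Hemispheric" because both components share one region.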

jharenza commented 3 years ago

Converting all data extraction to DBT means that CNS_region will no longer be generated in the base histology file on the D3b end. However, as you suggest, we can add it to our base histology preparation following QC.

jharenza commented 3 years ago

Closing, as this is now a part of the intermediate D3b workflow prior to release of pbta-histologies-base.tsv per this comment.

jharenza commented 3 years ago

@kgaonkar6 I am going to re-open this, as I think the logic was not captured entirely as anticipated for #849. For instance, some samples which should be mixed are annotated as midline or hemispheric. We can update this in #v19.

jharenza commented 3 years ago

One other thing to note: for our "Mixed" samples, @jainpayal022 and Cassie Kline will manually review the pathology and imaging reports of about 70 HGAT samples to assess the primary site of origin. We will then add these manually curated CNS_region values. See this ticket.
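Manually curated values can be layered on top of the rule-based assignment by keying them on Kids_First_Biospecimen_ID and letting them take precedence. A minimal sketch; `MANUAL_CNS_REGION`, `final_cns_region`, and the `BS_EXAMPLE01` entry are hypothetical placeholders, not curated data.

```python
from typing import Dict, Optional

# Hypothetical curated CNS_region values keyed by Kids_First_Biospecimen_ID.
# BS_EXAMPLE01 is a placeholder ID, not a real biospecimen.
MANUAL_CNS_REGION: Dict[str, str] = {
    "BS_EXAMPLE01": "Midline",
}


def final_cns_region(biospecimen_id: str, rule_based: Optional[str]) -> Optional[str]:
    """Prefer a curated value when one exists; otherwise keep the rule-based region."""
    return MANUAL_CNS_REGION.get(biospecimen_id, rule_based)
```

Keeping curated values in a separate table means the rule-based mapping can be rerun on each release without overwriting the pathology and imaging review.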

kgaonkar6 commented 3 years ago

Adding the CNS_region matching code here for the record: https://github.com/d3b-center/D3b-codes/blob/master/OpenPBTA_v19_release_QC/QC_histology_v19.Rmd

kgaonkar6 commented 3 years ago

Adding the primary_site changes from the latest pulls here, to guide updates to the primary_site ~ CNS_region matches above:

Kids_First_Biospecimen_ID | primary_site_latest | primary_site_previous
BS_1Q524P3B | L. Pons Anterior | Pons/Brainstem
BS_22VCR7DF | L. Lateral Pons | Pons/Brainstem
BS_5968GBGT | R. Posterior Pons; Adjacent #6 | Pons/Brainstem
BS_AF5D41PD | L. Frontal Periventricular White Matter; Adjacent #3 | Pons/Brainstem
BS_AK9BV52G | Cerebellar White Matter Adjacent Necrosis + Medulla | Pons/Brainstem
BS_D6STCMQS | L. Anterior Medulla | Pons/Brainstem
BS_EE73VE7V | R. Inferior Pons | Pons/Brainstem
BS_HYKV2TH9 | R. Anterior Pons; Adjacent #7 | Pons/Brainstem
BS_J8EH1N7V | Inferior Pons | Pons/Brainstem
BS_J8EK6RNF | Brain Stem-Medulla;Brain Stem- Midbrain/Tectum;Brain Stem- Pons;Cerebellum/Posterior Fossa;Thalamus | Pons/Brainstem
BS_X5VN0FW0 | Inferior Medulla | Pons/Brainstem
BS_Y74XAFJX | Superior Pons | Pons/Brainstem
BS_YHXMYDBN | L. Pons | Pons/Brainstem

kgaonkar6 commented 3 years ago

I'm not sure exactly where we will be using CNS_region, but do we have an updated dictionary matching primary_site to CNS_region that incorporates the new primary_site values assigned to the PNOC autopsy samples? Currently, CNS_region is assigned NA for these because they don't match any terms in the issue description.

jharenza commented 3 years ago

Thank you for checking on this, @kgaonkar6. I had the answers from Cassie but never posted them here. See below; the only one in question is Cerebellar White Matter Adjacent Necrosis + Medulla, which we can designate as Mixed for now unless other data are available to confirm Midline:

Kids_First_Biospecimen_ID | primary_site_latest | primary_site_previous | CNS_region
BS_1Q524P3B | L. Pons Anterior | Pons/Brainstem | Midline
BS_22VCR7DF | L. Lateral Pons | Pons/Brainstem | Midline
BS_5968GBGT | R. Posterior Pons; Adjacent #6 | Pons/Brainstem | Midline
BS_AF5D41PD | L. Frontal Periventricular White Matter; Adjacent #3 | Pons/Brainstem | Mixed
BS_AK9BV52G | Cerebellar White Matter Adjacent Necrosis + Medulla | Pons/Brainstem | Mixed
BS_D6STCMQS | L. Anterior Medulla | Pons/Brainstem | Midline
BS_EE73VE7V | R. Inferior Pons | Pons/Brainstem | Midline
BS_HYKV2TH9 | R. Anterior Pons; Adjacent #7 | Pons/Brainstem | Midline
BS_J8EH1N7V | Inferior Pons | Pons/Brainstem | Midline
BS_J8EK6RNF | Brain Stem-Medulla;Brain Stem- Midbrain/Tectum;Brain Stem- Pons;Cerebellum/Posterior Fossa;Thalamus | Pons/Brainstem | Midline
BS_X5VN0FW0 | Inferior Medulla | Pons/Brainstem | Midline
BS_Y74XAFJX | Superior Pons | Pons/Brainstem | Midline
BS_YHXMYDBN | L. Pons | Pons/Brainstem | Midline

kgaonkar6 commented 3 years ago

Thank you for the update!

jharenza commented 3 years ago

Confirmed: Cerebellar White Matter Adjacent Necrosis + Medulla will remain Mixed.

kgaonkar6 commented 3 years ago

Previously, the first step of CNS_region assignment was to mark a primary_site as "Mixed" if it contained multiple values separated by ";", and then to check whether all of the individual values belong to one CNS_region. So primary_site values like "R. Posterior Pons; Adjacent #6", "R. Anterior Pons; Adjacent #7", and "Brain Stem-Medulla;Brain Stem- Midbrain/Tectum;Brain Stem- Pons;Cerebellum/Posterior Fossa;Thalamus" will be assigned Mixed.

Do we want to update the code at https://github.com/d3b-center/D3b-codes/blob/20210212-release/OpenPBTA_v20_release_QC/code/util/primary_site_matched_CNS_region.R, or make these exceptions, since "Adjacent" is not a brain location per se?

jharenza commented 3 years ago

To be honest, I was thinking about this, and I think whoever ingests this into the BRP should match these regions to CBTN terms so we don't always have to update the terms. Let me check in with Jenn Mason and Shannon Robbins.

In the meantime, I was envisioning these going into the JSON file for Midline.
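Both options raised above, adding the new PNOC autopsy terms to a JSON file for Midline and treating "Adjacent #N" as something other than a brain location, can be sketched together. The JSON layout ({region: [primary_site terms]}), the regex, and the names `load_term_lookup` and `assign_with_exceptions` are assumptions for illustration; the actual D3b JSON file and R code may be structured differently. The term list is taken from the Midline rows of Cassie's table.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical JSON layout: {region: [primary_site terms]}. The real D3b JSON may differ.
REGION_TERMS_JSON = """
{
  "Midline": ["L. Pons Anterior", "L. Lateral Pons", "R. Posterior Pons",
              "L. Anterior Medulla", "R. Inferior Pons", "R. Anterior Pons",
              "Inferior Pons", "Inferior Medulla", "Superior Pons", "L. Pons"]
}
"""

# Components like "Adjacent #6" describe sampling position, not a brain location.
ADJACENT_TOKEN = re.compile(r"^Adjacent #\d+$")


def load_term_lookup(raw_json: str) -> Dict[str, str]:
    """Invert {region: [terms]} into {term: region}."""
    lookup: Dict[str, str] = {}
    for region, terms in json.loads(raw_json).items():
        for term in terms:
            lookup[term] = region
    return lookup


def assign_with_exceptions(primary_site: str, lookup: Dict[str, str]) -> Optional[str]:
    """Drop "Adjacent #N" components, then require all remaining sites to share one region."""
    sites = [s.strip() for s in primary_site.split(";")]
    sites = [s for s in sites if s and not ADJACENT_TOKEN.match(s)]
    if not sites:
        return None
    regions = {lookup.get(s) for s in sites}
    if len(regions) == 1:
        return regions.pop()  # None (NA) if every remaining site is unmatched
    return "Mixed"
```

With this sketch, "R. Posterior Pons; Adjacent #6" resolves to Midline. The two values Cassie designated as Mixed ("L. Frontal Periventricular White Matter; Adjacent #3" and "Cerebellar White Matter Adjacent Necrosis + Medulla") would return `None`, so they would still need an explicit Mixed entry or a curated override.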

kgaonkar6 commented 3 years ago

Just need a confirmation:

BS_J8EK6RNF seems to have primary_site "Pons" in the latest histology file pulls (since the 20210126 data release) instead of "Brain Stem-Medulla;Brain Stem- Midbrain/Tectum;Brain Stem- Pons;Cerebellum/Posterior Fossa;Thalamus" as in the table above. Do we need to confirm this, @jharenza?

jharenza commented 3 years ago

Let me check on this. Are the other samples matching primary_site_latest?

kgaonkar6 commented 3 years ago

yes

jharenza commented 3 years ago

OK, it looks like "Brain Stem-Medulla;Brain Stem- Midbrain/Tectum;Brain Stem- Pons;Cerebellum/Posterior Fossa;Thalamus" was never a value in @jainpayal022's Excel sheet for the Kids First import; the value there was Pons. Waiting for her to confirm that this longer string was introduced by mistake somewhere along the way.

jharenza commented 3 years ago

Confirmed that this value should be Pons.

jharenza commented 3 years ago

We added this upstream during QC.