ML Analysis MC4R - Githubissues

frehburg commented 1 year ago

phenotype:

overweight < obesity class I < class II < class III
obesity: normal = refuted phenotype
growth abnormality = not recorded (i.e. not represented in the phenopackets)

genotype:

interpretations -> diagnosis -> genomicInterpretations -> variantInterpretation -> variationDescriptor ->
- expressions -> HGVS strings
- allelicState -> GENO term
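
A minimal sketch of walking that path on a phenopacket that has been loaded as a plain dict (for example via `json.load`). The field names follow the Phenopacket Schema v2 JSON layout as I understand it; the helper name `iter_variants`, the transcript and variant in the example, and the example packet itself are hypothetical placeholders, not data from this project.

```python
from __future__ import annotations

from typing import Iterator, Optional, Tuple


def iter_variants(phenopacket: dict) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (hgvs_string, allelic_state_label) pairs from a phenopacket dict."""
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for genomic in diagnosis.get("genomicInterpretations", []):
            descriptor = genomic.get("variantInterpretation", {}).get(
                "variationDescriptor", {}
            )
            # allelicState is an ontology class, e.g. a GENO term for zygosity
            state = descriptor.get("allelicState", {}).get("label")
            for expression in descriptor.get("expressions", []):
                # keep only HGVS expressions (syntax such as "hgvs.c")
                if expression.get("syntax", "").startswith("hgvs"):
                    yield expression["value"], state


# Hypothetical example packet; the variant string is a placeholder.
example_packet = {
    "interpretations": [
        {
            "diagnosis": {
                "genomicInterpretations": [
                    {
                        "variantInterpretation": {
                            "variationDescriptor": {
                                "expressions": [
                                    {"syntax": "hgvs.c", "value": "NM_000000.0:c.1A>G"}
                                ],
                                "allelicState": {
                                    "id": "GENO:0000135",
                                    "label": "heterozygous",
                                },
                            }
                        }
                    }
                ]
            }
        }
    ]
}

variants = list(iter_variants(example_packet))
```

Using `.get` with empty defaults means a packet without interpretations simply yields nothing instead of raising.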

frehburg commented 1 year ago

Genotyping:

Genotyping required manual validation and standardization: the ERKER form accepted mutations as free text or as validated HGVS expressions, whereas Phenopackets require validated HGVS expressions. Mutations and zygosity of the MC4R-deficiency patients were captured in an Excel sheet and manually standardized and validated according to HGVS nomenclature. Varsome was used to deduce candidate mutations as HGVS expressions at the coding-DNA level (c. notation) against the reference genome hg38, and ClinVar was used to verify the mutations proposed by Varsome.

The genotyping data was organized into a CSV file with one patient per row, including mutations, zygosity, OMIM codes, and clinical significance. A total of 105 mutations from 98 patients were identified, some of which were excluded from the analysis during manual pre-processing. The data was imported into the REDCap project as a single CSV file, processed in a Jupyter notebook, and mapped according to the ERKER genetic findings.

Phenotyping:

Phenotyping required standardization with Human Phenotype Ontology (HPO) annotations. Deep phenotyping needs the date of determination and the status of each phenotype; the ERKER form captures the status as confirmed, refuted, or not recorded. Disease-specific phenotyping categorized obesity into overweight and class I-III obesity.

The World Health Organization (WHO) BMI-for-age tables were converted to DataFrames, one each for male and female patients. The MC4R-deficiency weight data was transferred to a CSV file and converted to a phenotyping DataFrame. The date of determination of each phenotype was computed by adding the age in months to the date of birth using pandas datetime functions. A Python function annotated the BMI data with the respective HPO terms, and the HPO terms in turn distinguished confirmed, refuted, and not recorded phenotypes. Obesity class, date, and status were determined for each examination and added to the phenotyping data as new columns. The phenotyping data was then organized and written to the ERKER_v1.7 CSV file.

Future pre-processing steps were defined to convert the data to FHIR (Fast Healthcare Interoperability Resources) using the MII-KDS core information model and the ERKERonFHIR project.
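
The date and BMI annotation steps could look roughly like the sketch below. It is not the project's function: it uses only the standard library (the notebook used pandas), the helper names `add_months` and `annotate_bmi` are hypothetical, the HPO IDs are quoted from memory and should be checked against the current HPO release, and the cut-offs in the example are the WHO adult values standing in for the per-age, per-sex BMI-for-age cut-offs.

```python
import calendar
from datetime import date
from typing import Optional, Sequence

# HPO IDs quoted from memory; verify against the current HPO release.
HPO_TERMS = [
    ("HP:0025502", "Overweight"),
    ("HP:0025499", "Class I obesity"),
    ("HP:0025500", "Class II obesity"),
    ("HP:0025501", "Class III obesity"),
]
OBESITY = ("HP:0001513", "Obesity")  # term refuted when the BMI is normal


def add_months(start: date, months: int) -> date:
    """Date of determination = date of birth + age in months (day clamped)."""
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def annotate_bmi(bmi: Optional[float], cutoffs: Sequence[float]) -> dict:
    """Map a BMI to an HPO term and status.

    cutoffs: four ascending lower bounds (overweight, class I, II, III)
    valid for the patient's age and sex at the examination.
    """
    if bmi is None:
        return {"status": "not recorded", "hpo_id": None, "label": None}
    reached = [term for term, cutoff in zip(HPO_TERMS, cutoffs) if bmi >= cutoff]
    if not reached:
        return {"status": "refuted", "hpo_id": OBESITY[0], "label": OBESITY[1]}
    hpo_id, label = reached[-1]  # highest class whose cut-off is met
    return {"status": "confirmed", "hpo_id": hpo_id, "label": label}


# Illustration only: WHO adult cut-offs, not the BMI-for-age tables.
ADULT_CUTOFFS = (25.0, 30.0, 35.0, 40.0)
exam_date = add_months(date(2015, 11, 15), 14)
annotation = annotate_bmi(36.0, ADULT_CUTOFFS)
```

Passing the cut-offs in as an argument keeps the function independent of how the BMI-for-age rows are looked up for a given age and sex.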

frehburg commented 1 year ago

Questions:

For each genotype, how probable is it to reach class 1, 2, or 3 obesity (before the x-th year of life)?

How long does it take to reach each class?

How likely is it to be diagnosed with obesity at all with each genotype?

Also run the tests in reverse.
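
One rough way to approach the first question is a plain proportion per genotype. The sketch below assumes a hypothetical record layout in which each patient carries a `genotype` label and a `first_reached` dict mapping obesity class (1 to 3) to the age in months at the first examination showing that class; the function name and the toy data are illustrative only.

```python
from collections import defaultdict


def share_reaching(patients, min_class, before_months):
    """Per genotype, fraction of patients first observed at or above
    min_class before the given age in months."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for patient in patients:
        genotype = patient["genotype"]
        totals[genotype] += 1
        # a patient may skip a class between examinations, so count any
        # class at or above the requested one
        ages = [
            age for cls, age in patient["first_reached"].items() if cls >= min_class
        ]
        if ages and min(ages) < before_months:
            hits[genotype] += 1
    return {genotype: hits[genotype] / totals[genotype] for genotype in totals}


# Toy data, not from the cohort.
toy_patients = [
    {"genotype": "heterozygous", "first_reached": {1: 24, 2: 60}},
    {"genotype": "heterozygous", "first_reached": {}},
    {"genotype": "homozygous", "first_reached": {3: 18}},
]

class1_before_6y = share_reaching(toy_patients, min_class=1, before_months=72)
class2_before_4y = share_reaching(toy_patients, min_class=2, before_months=48)
```

A plain proportion ignores patients who were not followed up to the cut-off age; a time-to-event method would be needed to account for that.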

frehburg commented 11 months ago

Descriptively: the number of (single) heterozygous and homozygous patients and their distribution across c.HGVS mutations.
The correlation between class 1-3 obesity up to the U9 preventive check-up and the c.HGVS mutations, comparing the severity of obesity per mutation over time.
The influence of mutation and zygosity on severity over time was displayed to determine the most severe zygosity and the most severe mutation.
The combination of the most severe zygosity and mutation was then analysed.

frehburg commented 11 months ago

New df: rows = mutations; columns = percentage per zygosity, percentage reaching each obesity stage, average time to get there.
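
A possible shape for that table, sketched with plain dicts rather than a DataFrame. The input layout (`mutation`, `zygosity`, and a `first_reached` dict of obesity class to age in months), the column names, and the placeholder mutation labels are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def age_at_or_above(first_reached, cls):
    """Earliest age in months at which class cls or higher was observed."""
    ages = [age for c, age in first_reached.items() if c >= cls]
    return min(ages) if ages else None


def summarise_by_mutation(rows):
    """One summary entry per mutation: zygosity share, share reaching each
    obesity class, and mean age in months at which the class was reached."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["mutation"]].append(row)

    summary = {}
    for mutation, group in grouped.items():
        n = len(group)
        entry = {
            "n": n,
            "pct_homozygous": 100 * sum(r["zygosity"] == "homozygous" for r in group) / n,
        }
        for cls in (1, 2, 3):
            ages = [age_at_or_above(r["first_reached"], cls) for r in group]
            ages = [age for age in ages if age is not None]
            entry[f"pct_class_{cls}"] = 100 * len(ages) / n
            entry[f"avg_months_to_class_{cls}"] = mean(ages) if ages else None
        summary[mutation] = entry
    return summary


# Toy rows with placeholder mutation labels, not cohort data.
toy_rows = [
    {"mutation": "variant_A", "zygosity": "heterozygous", "first_reached": {1: 20, 2: 40}},
    {"mutation": "variant_A", "zygosity": "homozygous", "first_reached": {1: 10}},
    {"mutation": "variant_B", "zygosity": "heterozygous", "first_reached": {}},
]

summary = summarise_by_mutation(toy_rows)
```

The resulting dict of dicts could be turned into a DataFrame with mutations as rows, for example with `pandas.DataFrame.from_dict(summary, orient="index")`.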

BIH-CEI / ERKER2Phenopackets

ML Analysis MC4R #185