OpenKBC / multiple_sclerosis_proj

Data analysis platform/structure, machine learning/AI project for multiple sclerosis
MIT License

Reporting initial result of data analysis #24

Open swiri021 opened 3 years ago

swiri021 commented 3 years ago

@kicheolkim It seems that CIS and RR differ significantly in DiseaseDuration (notebook link). So if we use DiseaseDuration to find features that act as early and late markers, those features might simply be the differentially expressed genes between RR and CIS. Do you think CIS and RR actually correspond to early and late stages of MS? For example, CIS might really be the early stage of MS and RR a middle stage.

kicheolkim commented 3 years ago

@swiri021 I don't think RR can be considered a middle stage. As far as I know, in the clinic, CIS just means the patient has experienced neurodegeneration for the first time. If the patient has another attack, it is considered the RR stage. After the CIS stage, a patient may have more attacks or no further attacks. That's why I think the data is good for studying disease mechanisms but may not be good for diagnostic biomarkers.

swiri021 commented 3 years ago
kicheolkim commented 3 years ago

wow... it's very interesting... Is the state single gene? or gene set? Is there a list of the genes?

swiri021 commented 3 years ago

The activation score is calculated from gene sets (gene signatures), while the DEG model uses a single list of DEGs, as you know. So here, the DEG model uses one gene as one feature, and the activation score model uses one pathway score as one feature.
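To make the difference between the two feature types concrete, here is a minimal Python sketch. The gene names, gene sets, and values are hypothetical, and the mean z-score used as the set-level score is only a stand-in assumption; this thread does not say how the project's activation score is actually computed.

```python
from statistics import mean, pstdev

# Hypothetical expression values: gene -> one value per sample (3 samples).
expression = {
    "G1": [1.0, 2.0, 3.0],
    "G2": [2.0, 4.0, 6.0],
    "G3": [5.0, 5.0, 2.0],
}
# Hypothetical gene sets standing in for MSigDB signatures.
gene_sets = {"SET_X": ["G1", "G2"], "SET_Y": ["G3"]}


def zscore(values):
    """Standardize one gene across samples (population standard deviation)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


z = {gene: zscore(values) for gene, values in expression.items()}
n_samples = 3

# DEG-style features: one column per gene.
gene_features = [[z[g][i] for g in sorted(z)] for i in range(n_samples)]

# Gene-set-style features: one column per set.
# ASSUMPTION: the set score is the mean z-score of the member genes.
set_features = [
    [mean(z[g][i] for g in gene_sets[name]) for name in sorted(gene_sets)]
    for i in range(n_samples)
]

print(len(gene_features[0]), "gene features per sample")
print(len(set_features[0]), "gene-set features per sample")
```

With real data, the gene-set representation usually has far fewer columns than the gene-level one, which is one possible reason the two models behave differently.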

swiri021 commented 3 years ago

I think this may be due to the number of features. I lowered the DEG fold-change threshold to 0.58 on the log2 scale (1.5-fold), and performance is better than with the 2-fold threshold. Still, the activation score model shows a narrower gap in AUC between the validation set and the test set. Anyway, I am going to dig deeper into the pathway features.
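For reference, 0.58 is the log2 fold change for a 1.5-fold change (log2(1.5) ≈ 0.585), while a 2-fold threshold corresponds to a log2 fold change of 1. Below is a minimal sketch of that kind of filter. The gene names, values, and the adjusted p-value cutoff of 0.05 are hypothetical and are not taken from the project's DESeq2 results.

```python
import math

# Hypothetical DEG table: (gene, log2 fold change, adjusted p-value).
# These rows are made up for illustration only.
deg_table = [
    ("GENE_A", 1.40, 0.001),
    ("GENE_B", 0.70, 0.010),
    ("GENE_C", -0.65, 0.020),
    ("GENE_D", 0.30, 0.001),
    ("GENE_E", -1.20, 0.200),
]


def select_degs(rows, fold=1.5, alpha=0.05):
    """Keep genes with |log2FC| >= log2(fold) and adjusted p-value < alpha."""
    cutoff = math.log2(fold)
    return [gene for gene, lfc, padj in rows if abs(lfc) >= cutoff and padj < alpha]


genes_1_5_fold = select_degs(deg_table, fold=1.5)
genes_2_fold = select_degs(deg_table, fold=2.0)

print(genes_1_5_fold)  # more genes pass the looser threshold
print(genes_2_fold)
```

Lowering the threshold increases the number of gene-level features, which is consistent with the idea that the feature count affects model performance.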

kicheolkim commented 3 years ago

So, the activation score is based on pathways, and the gene set is from DEG (DESeq2 results)? Do you have a list of pathways and/or gene sets? I'm curious what pathways/genes are included.

swiri021 commented 3 years ago

Yes, we have a list of pathways, and the activation score was calculated using MSigDB. I will also let you know if we find more interesting points here.

swiri021 commented 3 years ago

@kicheolkim @lacuss I found a weird signal in the data (notebook link). The signal is related to the 'Sex' of the patients and, unfortunately, it is also significantly related to the RR and CIS categories. Maybe another source of noise?

lacuss commented 3 years ago

So, I looked up MS on the Mayo Clinic website. It says: “Sex. Women are more than two to three times as likely as men are to have relapsing-remitting MS.” Maybe this is the reason? I think we should dig more into the correlation.
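One simple way to check whether sex and the RR/CIS category are associated in a small cohort is Fisher's exact test on the 2x2 table of counts. This is only a sketch: the counts below are hypothetical, not the project's data, and the function is a plain standard-library implementation rather than anything used in the notebooks.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (rows: female, male; columns: RR, CIS).
p_value = fisher_exact_two_sided(12, 3, 2, 8)
print(f"p = {p_value:.4f}")
```

A small p-value here would only say that sex and category are not independent in this sample; it would not separate a biological effect from a sampling imbalance.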

swiri021 commented 3 years ago

Yeah, I have seen a similar review paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3707353/. Still, because of the number of cases, we can't say that the RR and CIS categories are related to the sex factor. Any thoughts? @kicheolkim, do we need to go further with the gender information? The top pathways for that clustering are:

RUNNE_GENDER_EFFECT_UP
PYEON_CANCER_HEAD_AND_NECK_VS_CERVICAL_DN
GSE5099_DAY3_VS_DAY7_MCSF_TREATED_MACROPHAGE_DN
GSE3982_MEMORY_CD4_TCELL_VS_BCELL_UP
GOMF_HISTONE_DEMETHYLASE_ACTIVITY_H3_K4_SPECIFIC

swiri021 commented 3 years ago

These genes are the outliers that separate the male and female clusters. When these genes are removed from the list, the clustering by Sex disappears completely. Let me know if these genes are interesting or need more investigation. (Sorry for not converting the Entrez IDs.)

EntrezID pval fc
6192 3.013055e-25 5.829828
8284 3.013055e-25 5.683377
5616 3.013055e-25 5.566646
8653 3.013055e-25 5.436171
8287 3.013055e-25 5.414928
246126 3.013055e-25 5.140104
7404 3.013055e-25 4.902533
7544 3.013055e-25 3.396781
9086 3.013055e-25 2.624308
9087 3.013055e-25 1.204376

lacuss commented 3 years ago

input | name | symbol | alias (first 5) | HGNC
8653 | DEAD-box helicase 3 Y-linked | DDX3Y | DBY | HGNC:2699
9086 | eukaryotic translation initiation factor 1A Y-linked | EIF1AY | eIF-4C | HGNC:3252
8284 | lysine demethylase 5D | KDM5D | HY, HYA, JARID1D, SMCY | HGNC:11115
5616 | protein kinase Y-linked (pseudogene) | PRKY | PRKXP3, PRKYP | HGNC:9444
6192 | ribosomal protein S4 Y-linked 1 | RPS4Y1 | RPS4Y, S4 | HGNC:10425
9087 | thymosin beta 4 Y-linked | TMSB4Y | TB4Y | HGNC:11882
246126 | taxilin gamma pseudogene, Y-linked | TXLNGY | CYorf15A, CYorf15B, TXLNG2P | HGNC:18473
8287 | ubiquitin specific peptidase 9 Y-linked | USP9Y | DFFRY, SPGFY2 | HGNC:12633
7404 | ubiquitously transcribed tetratricopeptide repeat containing, Y-linked | UTY | KDM6AL, KDM6C, UTY1 | HGNC:12638
7544 | zinc finger protein Y-linked | ZFY | ZNF911 | HGNC:1

swiri021 commented 3 years ago

As we expected, all of these genes are Y-linked.
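A minimal sketch of excluding these genes before clustering is below. The Entrez IDs are the ones listed in this thread; the expression table, its values, and the helper name `drop_genes` are hypothetical and only illustrate the filtering step.

```python
# Entrez IDs of the Y-linked genes listed earlier in this thread.
Y_LINKED_ENTREZ_IDS = {6192, 8284, 5616, 8653, 8287, 246126, 7404, 7544, 9086, 9087}


def drop_genes(expression_by_gene, excluded_ids):
    """Return a copy of the expression table without the excluded genes."""
    return {
        gene_id: values
        for gene_id, values in expression_by_gene.items()
        if gene_id not in excluded_ids
    }


# Hypothetical table: Entrez ID -> expression per sample (values are made up).
# 7157 and 3569 are placeholder autosomal genes used only for illustration.
expression_by_gene = {
    6192: [9.1, 0.1, 8.7, 0.2],
    8653: [7.5, 0.0, 7.9, 0.1],
    7157: [5.0, 5.2, 4.9, 5.1],
    3569: [2.0, 2.4, 1.9, 2.2],
}

filtered = drop_genes(expression_by_gene, Y_LINKED_ENTREZ_IDS)
print(sorted(filtered))
```

Clustering would then be re-run on `filtered` to confirm that the sex-driven separation is gone.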

kicheolkim commented 3 years ago

Sorry for the late reply. I was busy becoming a father this week :) MS is more prevalent in women, and immune cells are strongly affected by gender. I used gender and age as covariates in my analysis.

swiri021 commented 2 years ago

https://soobarkbar.tistory.com/30