This problem can be related to the threshold, i.e. the number of genes we include in the enrichment. I have noticed "interpretation" differences for some datasets depending on how many "top" genes from the ICs were included for enrichment. Some sort of sensitivity study is necessary.
So I have run the enrichment for the BRCA_TCGA data, selecting genes above the 0.98 quantile and increasing the threshold in steps of 0.001 (0.1%), i.e. going from 420 down to 21 genes in steps of 21 genes. The results didn't change at all for the Myeloid cell component, but they definitely changed for T cells / B cells. What I observed is that for lower thresholds (more genes) T cells are the most enriched up to quantile 0.995, and then it switches to B cells with very high probability (8/12 genes). So what I see is that the top driving genes are B cell genes (some of them highly B cell specific).
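For reference, a minimal sketch of such a sweep in R (assumptions: `ic_weights` is a named numeric vector of gene weights for one IC, `markers` and `universe` are hypothetical gene lists, and a plain Fisher test stands in for whatever enrichment test we actually use):

```r
# Sweep the quantile threshold and re-run a simple over-representation test
# at each cutoff (assumed inputs, not the package code).
enrich_at_quantiles <- function(ic_weights, markers, universe,
                                quantiles = seq(0.98, 0.999, by = 0.001)) {
  sapply(quantiles, function(q) {
    top_genes <- names(ic_weights)[ic_weights > quantile(ic_weights, q)]
    # 2x2 contingency table: top vs non-top genes, marker vs non-marker genes
    a <- sum(top_genes %in% markers)
    b <- length(top_genes) - a
    c <- sum(universe %in% markers) - a
    d <- length(universe) - a - b - c
    fisher.test(matrix(c(a, b, c, d), nrow = 2))$p.value
  })
}

# Hypothetical usage:
# pvals <- enrich_at_quantiles(ic_weights, bcell_markers, universe = names(ic_weights))
# plot(seq(0.98, 0.999, by = 0.001), -log10(pvals), type = "b")
```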
For 3 decompositions of CIT
Maybe we should add some kind of stabilization, i.e. decompose several times and keep only the stable components?
I increased maxit and decreased tol, which improved the reproducibility. We can also cheat and fix the seed inside run_fastica.
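Concretely, a minimal sketch with the plain fastICA package (here `X` stands for the centred expression matrix; the argument names inside run_fastica may differ):

```r
library(fastICA)

# Two independent runs with a tight tolerance and a high iteration cap,
# so that they converge to closer solutions.
set.seed(1)
res1 <- fastICA(X, n.comp = 20, alg.typ = "parallel", method = "C",
                maxit = 1000, tol = 1e-9)
set.seed(2)
res2 <- fastICA(X, n.comp = 20, alg.typ = "parallel", method = "C",
                maxit = 1000, tol = 1e-9)

# The "cheat": using the same seed for both runs would make them identical,
# which hides (rather than solves) the instability.
```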
I then always get 4 components that pass the correlation threshold. However, the interpretation still differs slightly (less than before). I will try to decrease tol and increase maxit even more.
We can see that the lowest correlation between runs concerns the 4th component; however, even a small change has consequences for the enrichment test.
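The between-run check is essentially this (a sketch, assuming `res1` and `res2` are two fastICA runs as in the snippet above):

```r
# Match each component of run 1 to its best-correlated component of run 2;
# the components with the lowest best-match correlation are the unstable ones.
cc   <- abs(cor(res1$S, res2$S))   # n.comp x n.comp absolute correlations
best <- apply(cc, 1, max)          # best match for each component of run 1
round(sort(best), 3)               # weakest matches first
```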
So this can mainly be settled by ICASSO stabilization.
It works efficiently in MATLAB. This is why the unofficial version of the package will include the possibility to call the MATLAB ICA with ICASSO.
I also used Biton's MineICA::clusterFastICARuns() function; however, I had problems with the MineICA installation as it depends on too many packages... therefore I copied and adapted the function.
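The core idea is roughly the following (a simplified sketch of the ICASSO approach, not the MineICA code; `X` is again the centred expression matrix and the parameter values are illustrative):

```r
library(fastICA)

# Run fastICA nbIt times, pool all estimated components, cluster them on
# 1 - |correlation| with average-linkage hclust, and keep one representative
# component per tight cluster.
icasso_sketch <- function(X, n.comp = 10, nbIt = 20, h = 0.2) {
  runs  <- lapply(seq_len(nbIt), function(i) fastICA(X, n.comp = n.comp, method = "C")$S)
  S_all <- do.call(cbind, runs)                 # genes x (n.comp * nbIt)
  d     <- as.dist(1 - abs(cor(S_all)))         # dissimilarity between components
  cl    <- cutree(hclust(d, method = "average"), h = h)
  # representative = the component closest on average to the rest of its cluster
  sapply(unique(cl), function(k) {
    idx <- which(cl == k)
    if (length(idx) == 1) return(idx)
    sub <- as.matrix(1 - abs(cor(S_all[, idx])))
    idx[which.min(rowMeans(sub))]
  })
}
```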
Testing now how slow it is ...
Opening another issue for the enrichment test
res.test.2 <- run_fastica(
  METABRIC.cen,
  optimal = TRUE, row.center = TRUE, with.names = FALSE,
  alg.typ = "parallel", gene.names = row.names(METABRIC.cen),
  method = "C", n.comp = 100, isLog = TRUE, R = TRUE,
  stabilize = TRUE, funClus = "hclust", methodClust = "average", nbIt = 100
)
Time difference of 5.040406 hours
These stabilization results are really different from the MATLAB ones and from what we would expect.
We can see that the results of the MATLAB and R ICASSO are not the same as far as the partitions are concerned. The weird thing is that R seems to overcluster the stable components, which distorts the results...
I tried to figure it out. It looks like once I give the distance matrix to the R code it works fine, but when I put the distance matrix from R into MATLAB it works fine too. I also tried a different R implementation, but in practice it didn't work well either. I'm calling it a day; we will recommend using MATLAB or Docker.
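The cross-check I did was essentially this (file names are hypothetical; `S_all` is the pooled component matrix as in the sketch above): export the R-side dissimilarities, load the ones exported from MATLAB, and compare them before the clustering step, to see whether the divergence comes from the distances or from the clustering itself.

```r
# Compare the R and MATLAB dissimilarity matrices (hypothetical file names).
d_r <- 1 - abs(cor(S_all))
write.table(d_r, "dissim_R.csv", sep = ",", row.names = FALSE, col.names = FALSE)

d_matlab <- as.matrix(read.csv("dissim_matlab.csv", header = FALSE))
max(abs(d_r - d_matlab))   # close to 0 means the distance matrices agree
```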
Sometimes we get a different number of immune component candidates (the stroma ones don't always pass the threshold) - one possibility: don't take them into account for the deconvolution.