dhimmel / stargeo

Generating expression signatures for disease using STARGEO
https://doi.org/10.15363/thinklab.d96

Meta-analyses to combine significant genes #1

Open inestm28 opened 1 year ago

inestm28 commented 1 year ago

I was looking into your code at https://github.com/dhimmel/stargeo/blob/master/combine.ipynb.

Could you tell me where the file 'balanced_permutation.tsv.gz' comes from or is downloaded from?

And just to make sure, this code uses only the p-values in the "multipletests" method, right?

So, it first gets all the significant genes from each independent study, then creates corrected p-values, and then selects the genes that have an "up" direction based on the fold change? I'm not sure how the process goes. I would really appreciate it if you could help me.

I also read in this paper, https://doi.org/10.1093/bib/bbaa019, that random-effects models are used when the gene is significant in all studies. So they are not used when the gene is not in some of the studies, right?

dhimmel commented 1 year ago

Hey @inestm28. Thanks for your questions. I don't remember much about this analysis, so the source code is probably the best reference for the methods. Here are some additional places to look.

> Could you tell me where the file 'balanced_permutation.tsv.gz' comes from or is downloaded from?

The querier.ipynb notebook creates balanced_permutation.tsv.gz for each disease.

> So, it first gets all the significant genes from each independent study, then creates corrected p-values and after selects the genes that have "up" direction through the fold-change?

The combine.ipynb notebook appears to take random_pval_corrected and rename it to p_adjusted, which is then written to data/diffex.tsv.
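
For anyone following along, here is a minimal sketch of what such a rename-and-filter step could look like in pandas. It is not the notebook's actual code. Only random_pval_corrected and p_adjusted come from the discussion above; the symbol and log2_fold_change columns, the 0.05 threshold, the direction rule, and the toy rows are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for one disease's meta-analysis table; the real
# columns produced by the notebooks may differ.
raw = io.StringIO(
    "symbol\trandom_pval_corrected\tlog2_fold_change\n"
    "GENE_A\t0.001\t1.2\n"
    "GENE_B\t0.200\t-0.4\n"
    "GENE_C\t0.030\t-0.9\n"
)
meta_df = pd.read_csv(raw, sep="\t")

# Rename the corrected random-effects p-value to p_adjusted.
diffex_df = meta_df.rename(columns={"random_pval_corrected": "p_adjusted"})

# Assumed significance threshold, for illustration only.
alpha = 0.05
diffex_df = diffex_df[diffex_df["p_adjusted"] <= alpha].copy()

# Assumed direction rule: the sign of the (hypothetical) fold-change column.
diffex_df["direction"] = diffex_df["log2_fold_change"].map(
    lambda value: "up" if value > 0 else "down"
)

# Serialize as TSV text; pass a file path instead to write to disk.
tsv_text = diffex_df.to_csv(sep="\t", index=False)
print(tsv_text)
```

In this sketch the filtering uses only the adjusted p-value, and the fold change is consulted afterwards just to label direction. Whether combine.ipynb orders the steps the same way should be checked against the source.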

Sorry if these answers don't address everything. Feel free to continue using this issue to leave your notes and conclusions for these questions.