bioinformatics-core-shared-training / Bulk_RNAseq_Course_Base

https://bioinformatics-core-shared-training.github.io/Bulk_RNAseq_Course_Base/

GSEA: update to mouse hallmark genes #18

Open tavareshugo opened 1 year ago

tavareshugo commented 1 year ago

In Sep 2022 GSEA released a new mouse MSigDB (v2022.1.Mm) collection of gene sets. This doesn't seem to be incorporated in the msigdbr package, which still uses orthology-based lists (this may change before the next course).

Need to check if they issue an update to this.

tavareshugo commented 7 months ago

Materials have now been updated to use https://bioconductor.org/packages/release/data/experiment/html/msigdb.html

For caching purposes we had to run:

library(msigdb)
# Download the mouse MSigDB (Entrez IDs) so that it is cached locally
msigdb.mm <- getMsigdb(org = 'mm', id = 'EZID', version = '2023.1')

Document this in setup.md and then close the issue.

AshKernow commented 7 months ago

Actually, this is not quite resolved, as the msigdb package is just using the same ortholog method rather than the Broad's mouse gene sets:

The mouse MSigDB has been created in collaboration with Gordon K. Smyth and Alex Garnham from WEHI. The code they use to generate the mouse MSigDB has been used in this package. Detailed description of the steps conducted to convert human gene expression signatures to mouse can be found at http://bioinf.wehi.edu.au/MSigDB. Mouse homologs for human genes were obtained using the HCOP database (as of 18/03/2021).

On the other hand, we are only using the Hallmark set, and even on the Broad website these are just orthologs:

MH - mouse-ortholog hallmark gene sets are versions of gene sets in the MSigDB Hallmarks collection mapped to their mouse orthologs.

We could download and parse the gmt files from the GSEA website, but this seems awfully clunky.
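For reference, a minimal sketch of what the parsing step might look like in base R, assuming the usual GMT layout (one gene set per line, tab-separated: set name, description, then the genes). The function name `parse_gmt` and the toy set names and genes are hypothetical and only stand in for a file downloaded from the GSEA website; nothing here is part of msigdb or msigdbr.

```r
# Parse GMT-formatted lines into a named list of gene vectors.
# Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
parse_gmt <- function(lines) {
  # Drop blank lines (e.g. a trailing newline at the end of the file)
  lines <- lines[nzchar(trimws(lines))]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  gene_sets <- lapply(fields, function(x) {
    genes <- x[-(1:2)]            # skip the name and description columns
    unique(genes[nzchar(genes)])  # drop empty fields and duplicates
  })
  names(gene_sets) <- vapply(fields, function(x) x[1], character(1))
  gene_sets
}

# Toy input with made-up sets; for a real file this would be something like
# readLines("<path to the downloaded .gmt file>")
toy_lines <- c(
  "TOY_SET_A\thttps://example.org/a\tGeneA\tGeneB\tGeneC",
  "TOY_SET_B\thttps://example.org/b\tGeneB\tGeneD",
  ""
)
toy_sets <- parse_gmt(toy_lines)
```

The result is a plain named list, which is the shape most enrichment functions that take custom gene sets can be adapted to, so the clunkiness is mostly in downloading and versioning the file rather than in the parsing itself.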