dpeerlab / spectra

Supervised Pathway DEConvolution of InTerpretable Gene ProgRAms
MIT License

Annotation format requirement #5

Closed kvshams closed 1 year ago

kvshams commented 1 year ago

What annotation format is required? Is it possible to use the gene sets directly from a pathway database, for instance the C2 JSON bundle from the Broad Institute pathway database (MSigDB)? Thanks, Shams

wallet-maker commented 1 year ago

Hi Shams,

My apologies for the late response. Yes, you can use an entire pathway database like the C2 bundle from MSigDB. The important thing is that you format the gene set annotation dictionary correctly.

The dictionary has to include all cell types from your adata cell type annotations as keys. Since most databases do not tell you which cell types their gene sets are specific to, you will have to either 1) annotate the cell types yourself or 2) set all gene sets as global. Both approaches should be fine; you can check empirically which works better for your data.

gene_set_dictionary = {
    'celltype_1': {'gene_set_1': ['gene_a', 'gene_b', 'gene_c'], 'gene_set_2': ['gene_c', 'gene_a', 'gene_e', 'gene_f']},
    'celltype_2': {'gene_set_1': ['gene_a', 'gene_b', 'gene_c'], 'gene_set_3': ['gene_a', 'gene_e', 'gene_f', 'gene_d']},
    'celltype_3': {},
    'global': {'gene_set_4': ['gene_m', 'gene_n']},
}
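The requirement that every cell type appears as a key can be checked programmatically. The following is a minimal sketch, not part of Spectra: the helper name complete_gene_set_dictionary is hypothetical, and the cell type list is hard-coded here, whereas in practice it would presumably come from the cell type column of adata.obs.

```python
def complete_gene_set_dictionary(gene_sets, cell_types):
    """Return a copy that has a key for every cell type plus 'global'.

    Hypothetical helper for illustration; missing keys get an empty dict,
    mirroring the 'celltype_3': {} entry in the example above.
    """
    completed = {key: dict(value) for key, value in gene_sets.items()}
    for key in list(cell_types) + ["global"]:
        completed.setdefault(key, {})
    return completed


# Assumed cell type labels; with real data these would be the unique
# values of the cell type annotation column in adata.obs.
cell_types = ["celltype_1", "celltype_2", "celltype_3"]

partial = {
    "celltype_1": {"gene_set_1": ["gene_a", "gene_b", "gene_c"]},
    "global": {"gene_set_4": ["gene_m", "gene_n"]},
}

gene_set_dictionary = complete_gene_set_dictionary(partial, cell_types)

# Every cell type must now be present as a key.
missing = set(cell_types) - set(gene_set_dictionary)
```

After this step missing is empty, and cell types without annotations simply map to an empty dictionary.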

Having said that, we believe the best results are obtained by limiting the input to coherent, interpretable gene sets of similar size and with limited redundancy (please see the Supplementary Methods of the manuscript for further detail: https://doi.org/10.1101/2022.12.20.521311 ). We also offer a package to select gene sets for Spectra, which we will update with an extended set of annotations (including cancer cell and stromal cell gene sets) in the near future: https://github.com/wallet-maker/cytopus .
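As a rough illustration of what limiting size and redundancy could look like before passing a large database to Spectra, here is a minimal sketch. It is not the procedure from the manuscript: the function name filter_gene_sets, the size bounds, the overlap threshold, and the choice of the overlap coefficient as redundancy measure are all assumptions made for this example.

```python
def filter_gene_sets(gene_sets, min_size=5, max_size=200, max_overlap=0.5):
    """Keep gene sets within a size range and drop highly overlapping ones.

    Hypothetical helper; all thresholds are illustrative defaults.
    Redundancy is measured with the overlap coefficient
    |A & B| / min(|A|, |B|) against the sets kept so far.
    """
    kept = {}
    # Visit smaller sets first, so a larger set that mostly contains an
    # already kept smaller set is the one that gets dropped.
    for name, genes in sorted(gene_sets.items(), key=lambda item: len(item[1])):
        candidate = set(genes)
        if not (min_size <= len(candidate) <= max_size):
            continue
        redundant = any(
            len(candidate & set(other)) / min(len(candidate), len(other)) > max_overlap
            for other in kept.values()
        )
        if not redundant:
            kept[name] = sorted(candidate)
    return kept


example_sets = {
    "set_a": ["g1", "g2", "g3"],
    "set_b": ["g1", "g2", "g3", "g4"],  # almost identical to set_a
    "set_c": ["g5", "g6", "g7"],
    "set_d": ["g8"],                    # too small
}

filtered = filter_gene_sets(example_sets, min_size=2, max_size=10, max_overlap=0.5)
```

With these toy inputs only set_a and set_c remain. Suitable thresholds depend on the data and should be chosen with the Supplementary Methods in mind.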

Let me know if that helps

kvshams commented 1 year ago

Thanks for the reply. Is there an example code snippet to format the JSON file from MSigDB? Thanks, Shams

wallet-maker commented 1 year ago

Hi Shams, we do not provide a code snippet, but the tutorial explains how to configure the dictionary. The easiest way would be to run this with use_celltype=False in the est_spectra function. We now provide an example in the tutorial.

https://github.com/dpeerlab/spectra/blob/main/notebooks/example_notebook.ipynb
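For readers looking for a starting point, the sketch below shows one way the conversion might be done. It rests on assumptions: the MSigDB JSON bundle is taken to map each gene set name to a record with a "geneSymbols" list, the helper name msigdb_json_to_gene_sets is hypothetical, and a small inline JSON string stands in for the downloaded file. The thread does not say whether est_spectra expects a flat or a nested dictionary when cell types are disabled, so both shapes are built; please check the tutorial for the form your version needs.

```python
import json


def msigdb_json_to_gene_sets(json_text):
    """Map gene set names to gene symbol lists.

    Hypothetical helper; assumes each record stores its genes under
    the "geneSymbols" key.
    """
    records = json.loads(json_text)
    return {name: list(record["geneSymbols"]) for name, record in records.items()}


# Stand-in for the content of a downloaded MSigDB JSON file; with a real
# file, read its text with open(path).read() instead.
example_json = json.dumps({
    "EXAMPLE_SET_1": {"geneSymbols": ["GENE_A", "GENE_B"]},
    "EXAMPLE_SET_2": {"geneSymbols": ["GENE_C", "GENE_D", "GENE_E"]},
})

# Flat form: gene set name -> gene list.
flat_gene_sets = msigdb_json_to_gene_sets(example_json)

# Nested form: all gene sets under 'global', every cell type present
# as a key with no cell-type-specific gene sets.
cell_types = ["celltype_1", "celltype_2"]  # assumed labels
nested_gene_sets = {cell_type: {} for cell_type in cell_types}
nested_gene_sets["global"] = flat_gene_sets
```

The nested form follows the dictionary layout described earlier in this thread, with every gene set treated as global.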

Thank you, Thomas