dpeerlab / spectra

Supervised Pathway DEConvolution of InTerpretable Gene ProgRAms
MIT License
134 stars 17 forks source link

Error when running spc.est_spectra #3

Closed france-hub closed 1 year ago

france-hub commented 1 year ago

Hello!

Congrats and thanks for your work! A few days ago I read your group's tweet and thought SPECTRA is exactly what I need for my data. I am trying to disentangle CD8 activation/TCR reactivity vs exhaustion. I apologize in advance for the naive question, I am an R user.

I used Seurat to analyze my data, then saved the single cell object as h5ad file. Here is my script:

import os
import pandas as pd
from spectra import spectra as spc
import scanpy

os.getcwd()
adata = scanpy.read_h5ad("CD8_ad.h5ad")
pal_ident = ["#F0E442","#E69F00", "#009E73", "#56B4E9" ] #custom palette for clusters
adata.uns['clusters']= pal_ident
scanpy.pp.highly_variable_genes(adata)

then following (don't know if I followed properly) your README file I did:

gene_set_annotations = {"global": {"global_ifn_II_response" : ["CIITA", "CXCL10"] , "global_MHCI": ["HLA-A", "HLA-B", "HLA-C"] },
                        "CD8_T": {"CD8_tex": ['TOX', 'LAG3', 'PDCD1', 'HAVCR2', 'EOMES', 'ID2', 'CD244', 'NR4A3', 'NR4A2', 'NR4A1',
                                              'CXCL13', 'IRF4','PRDM1'],
                                  "CD8_reac": ['MKI67', 'TNFRSF9', 'TNFRSF18', 'GZMA', 'IFNG', 'ENTPD1', 'ITGAE', 'CXCL13']}
}
model = spc.est_spectra(adata = adata,  gene_set_dictionary = gene_set_annotations, cell_type_key = "clusters", use_highly_variable = True, lam = 0.01)

At first I had not included "global" (since I am only interested on CD8), but I was getting KeyError: 'global'

Including the "global" variable the function runs till ~30% and then throws this error:

ValueError: operands could not be broadcast together with shapes (15486,9) (1,6) 

I think I understand the type of error but not how to fix it. Hope you can help!

Best, Francesco

russellkune commented 1 year ago

Hi Francesco,

We're aware of this issue and working on a fix. I believe it is caused by a cell type appearing in the input dictionary (either L or the gene set dictionary) that doesn't appear in the AnnData

france-hub commented 1 year ago

Ok, I am glad to hear that I have not done some dumb mistake. Would you mind write here once the issue is fixed? Thank you!

wallet-maker commented 1 year ago

Hi Francesco,

I agree with Russell. This is most likely caused by a mismatch between the dictionary and adata cell types which have to be identical. We have realized that we provided too little detail in our old tutorial and we have now expanded this. I think it might help you resolve your issue:

https://github.com/dpeerlab/spectra/blob/main/notebooks/example_notebook.ipynb

We are still debating whether to make the est_spectra correct a mismatched gene set annotation dictionary automatically. It may not be desirable because that would require est_spectra to append missing cell types and might run with other gene sets than the user thinks it does. E.g. If you make a typo in a cell type label (e.g. in adata 'CD8T' and in dictionary 'CD8-T') est_spectra would have to discard the gene sets under 'CD8-T' in your gene set annotation dictionary.

The tutorial now includes guidance how to vet the gene set annotation dictionary before running the method.

Let me know if that helps.

Thank you, Thomas

france-hub commented 1 year ago

Thank you so much! Your explanation solved my issue and also it seems that for my data spectra works better than the algorithms I was using to calculate the module scores!!

wallet-maker commented 1 year ago

Great, thank you for your feedback Francesco :)!