dictionary format when no cell type labels in adata

AdrienJolly commented 1 year ago

Could you please provide an example of a gene_set dictionary for cases where use_cell_types = False? I can't seem to find the right format. Thanks

wallet-maker commented 1 year ago

Hi, thank you for using Spectra. Indeed, there was a bug when setting use_cell_types = False. We have fixed this now and have added an example to the tutorial (see cell number 12):

https://github.com/dpeerlab/spectra/blob/main/notebooks/example_notebook.ipynb

Let me know if that helps.

Thank you, Thomas

dpcook commented 1 year ago

I can confirm that use_cell_types=False works fine with a dictionary that has everything under a global heading:

gene_set_annotations = {
    "global": {
        "G2M phase": g2m_genes,
        "S phase": s_genes,
        "Hallmark_Hypoxia": hypoxia_genes,
        "Hallmark_Inflammation": inflammatory_genes,
        "Hallmark_Il2_Stat5": il2_genes,
        "Hallmark_Il6_JAK_Stat3": il6_genes,
        "Hallmark_IFNg_response": ifng_genes,
        "Hallmark_TGFb_activity": tgfb_genes,
        "Hallmark_Apoptosis": apoptosis_genes,
        "Elyada_myCAF": myCAF,
        "Elyada_iCAF": iCAF,
    }
}

However, I'm trying to load a saved model with spc.load_from_pickle, and it's not liking any option for the cell_type_key parameter. I'm not 100% positive the two are related, but I'm not sure what else it may be.

adata has spectra output:

[Screenshot: CleanShot 2023-01-31 at 10 30 58]

I got an error when 1) omitting the parameter, 2) setting it to None, 3) assigning an arbitrary cell type key not used in the spectra run, and 4) setting a column in obs to 'global' (below)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In [99], line 1
----> 1 model = spc.load_from_pickle('../output/iflc_disease_spectra', adata, 
      2                             gene_set_annotations, cell_type_key='celltype_level2')

File ~/mambaforge/envs/scvi-env/lib/python3.9/site-packages/spectra/spectra.py:1240, in load_from_pickle(fp, adata, gs_dict, cell_type_key)
   1239 def load_from_pickle(fp, adata, gs_dict, cell_type_key):
-> 1240     model = SPECTRA_Model(X = adata[:,adata.var["spectra_vocab"]].X, labels = np.array(adata.obs[cell_type_key]),  L = adata.uns["SPECTRA_L"], 
   1241                           vocab = adata.var_names[adata.var["spectra_vocab"]], gs_dict = gs_dict)
   1242     model.load(fp, labels = np.array(adata.obs[cell_type_key]))
   1243     return(model)

File ~/mambaforge/envs/scvi-env/lib/python3.9/site-packages/spectra/spectra.py:541, in SPECTRA_Model.__init__(self, X, labels, L, vocab, gs_dict, use_weights, adj_matrix, weights, lam, delta, kappa, rho, use_cell_types)
    537     else:
    538         adj_matrix, weights = spectra_util.process_gene_sets_no_celltypes(gs_dict = gs_dict, gene2id = gene2id, weighted = use_weights)
--> 541 self.internal_model = SPECTRA(X = X, labels = labels, adj_matrix = adj_matrix, L = L, weights = weights, lam = lam, delta=delta,kappa = kappa, rho = rho, use_cell_types = use_cell_types)
    543 self.cell_scores = None
    544 self.factors = None

File ~/mambaforge/envs/scvi-env/lib/python3.9/site-packages/spectra/spectra.py:210, in SPECTRA.__init__(self, X, labels, adj_matrix, L, weights, lam, delta, kappa, rho, use_cell_types)
    208     self.rho = nn.ParameterDict()
    209 #initialize global params
--> 210 self.theta["global"] = nn.Parameter(Normal(0.,1.).sample([self.p, self.L["global"]]))
    211 self.eta["global"] = nn.Parameter(Normal(0.,1.).sample([self.L["global"], self.L["global"]]))
    212 self.gene_scaling["global"] = nn.Parameter(Normal(0.,1.).sample([self.p]))

IndexError: invalid index to scalar variable.

The only thing I can think of is that spectra was run on a cluster and I'm now looking at the results locally. adata and model were saved in the run, but the `gs_dict` was remade locally and might differ in some way. I'm not sure that could explain it, but I will explore.
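For reference, the message in the traceback is the one NumPy raises when a scalar is indexed, so at line 210 `self.L` was evidently a scalar rather than a mapping with a "global" key. The sketch below reproduces just that message. The value 20 is a made-up stand-in for `adata.uns["SPECTRA_L"]` (the real value is not shown in this thread), and `global_factor_count` is a hypothetical helper, not part of Spectra.

```python
import numpy as np

# Stand-ins for adata.uns["SPECTRA_L"]; the actual stored value is an assumption.
L_scalar = np.int64(20)      # a single factor count stored as a NumPy scalar
L_dict = {"global": 20}      # a per-key factor count, as line 210 expects


def global_factor_count(L):
    """Hypothetical helper: read the global factor count from either form."""
    if isinstance(L, dict):
        return int(L["global"])
    return int(L)


# Indexing the scalar the way line 210 does reproduces the error message.
try:
    L_scalar["global"]
except IndexError as err:
    message = str(err)
    print(message)

print(global_factor_count(L_scalar), global_factor_count(L_dict))
```

This only shows where the message comes from; why `L` ends up as a scalar on this code path is not established in the thread.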

AdrienJolly commented 1 year ago

Thank you Thomas for the fix. It now works very well for me, and thank you for the great tool!

Adrien

wallet-maker commented 1 year ago

Hi dpcook,

Sorry for the late reply. Yes, the spc.load_from_pickle function requires an identical gene set dictionary.
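If the dictionary used for the original run was saved somewhere, a rebuilt one can be compared against it before loading. The following is a minimal sketch, not part of Spectra: `diff_gene_sets` is a hypothetical helper, the gene names are illustrative, and it assumes flat `{gene set name: list of genes}` mappings (for the nested layout above, pass the inner `"global"` dictionaries).

```python
def diff_gene_sets(saved, rebuilt):
    """Report differences between two {name: genes} mappings.

    Gene order within a set is ignored; only membership is compared.
    """
    report = {
        "only_in_saved": sorted(set(saved) - set(rebuilt)),
        "only_in_rebuilt": sorted(set(rebuilt) - set(saved)),
        "changed": {},
    }
    for name in sorted(set(saved) & set(rebuilt)):
        a, b = set(saved[name]), set(rebuilt[name])
        if a != b:
            report["changed"][name] = {
                "missing_in_rebuilt": sorted(a - b),
                "extra_in_rebuilt": sorted(b - a),
            }
    return report


# Illustrative data only; these are not the gene sets from the thread.
saved = {"S phase": ["GENE_A", "GENE_B"], "G2M phase": ["GENE_C"]}
rebuilt = {"S phase": ["GENE_A"], "Hypoxia": ["GENE_D"]}

report = diff_gene_sets(saved, rebuilt)
print(report)
```

Whether the order of gene sets or genes also has to match is not stated in this thread, so an empty report here is a necessary check rather than a guarantee.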

It is easier to just use pickle to save and load the model instead of the Spectra method.
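A minimal sketch of that approach with the standard library, assuming the fitted model object can be pickled as suggested here. `save_object` and `load_object` are hypothetical helper names, and the `bundle` below is a placeholder rather than a real Spectra model. Storing the gene set dictionary next to the model avoids having to rebuild it later.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialize obj (for example a fitted model) to path with pickle."""
    with Path(path).open("wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load an object written by save_object. Only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Placeholder for a fitted model plus the exact dictionary used to fit it.
bundle = {
    "model": {"note": "placeholder for the fitted model object"},
    "gene_sets": {"global": {"S phase": ["GENE_A", "GENE_B"]}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "spectra_bundle.pkl"
    save_object(bundle, path)
    restored = load_object(path)
```

Pickle files are generally tied to the library versions that wrote them, so loading on a different machine works best with a matching environment.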

We have corrected this in the tutorial, which we now provide on Colab:

https://github.com/dpeerlab/spectra/blob/main/notebooks/Spectra_Colaboratory_tutorial.ipynb

Please let me know if that helps.

Thanks, Thomas

dpcook commented 1 year ago

This is great--thanks Thomas! Feel free to close the issue.

wallet-maker commented 1 year ago

Thank you :)

dpeerlab / spectra, issue #7