aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Perturbation: calculating shift along PCAs #291

Open mason-sweat1 opened 5 months ago

mason-sweat1 commented 5 months ago

Describe the bug

Dear Seppe,

After running steps 21, 23 and 33 in the perturbation simulation, I am unable to successfully run [38].

Here is an example of the error message I receive:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[58], line 12
     10 sys.stderr = open(os.devnull, "w")  # silence stderr
     11 for TF in TFs_of_interest:
---> 12         delta_embedding = _project_perturbation_in_embedding(
     13                 scplus_obj,
     14                 original_matrix = scplus_obj.uns[f'{TF}_KD_sim_eRegulon_AUC_iter_0']['Gene_based'],
     15                 perturbed_matrix = scplus_obj.uns[f'{TF}_KD_sim_eRegulon_AUC_iter_4']['Gene_based'],
     16                 reduction_name = f'{TF}_KD_sim_eRegulon_PCA_iter_0')
     17         mean_shift = pd.DataFrame(delta_embedding).groupby(scplus_obj.metadata_cell['celltype'].to_numpy()).mean()
     18         shifts_PC0[TF] = mean_shift[0]

File /lab-share/Cardio-Pu-e2/Public/ch220446_Mason2/conda/lib/python3.9/site-packages/scenicplus/simulation.py:249, in _project_perturbation_in_embedding(scplus_obj, original_matrix, perturbed_matrix, reduction_name, sigma_corr, n_cpu)
    241 def _project_perturbation_in_embedding(
    242     scplus_obj, 
    243     original_matrix, 
   (...)
    246     sigma_corr = 0.05, n_cpu = 1):
    247     #based on celloracle/velocyto code
    248     if reduction_name not in scplus_obj.dr_cell.keys():
--> 249         raise ValueError(f'Embbeding "{reduction_name}" not found!')
    250     if original_matrix is None:
    251         original_matrix = scplus_obj.to_df('EXP').copy().to_numpy().astype('double')

ValueError: Embbeding "Tead1_KD_sim_eRegulon_PCA_iter_0" not found!

I am confused by this, because I do have the following in scplus_obj.uns.keys():

dict_keys(['Cistromes', 'search_space', 'region_to_gene', 'TF2G_adj', 'eRegulons', 'eRegulon_metadata', 'eRegulon_signatures', 'eRegulon_AUC', 'Pseudobulk', 'TF_cistrome_correlation', 'eRegulon_AUC_thresholds', 'eRegulon_metadata_filtered', 'eRegulon_signatures_filtered', 'eRegulon_AUC_filtered', 'selected_eRegulon', 'DARs', 'DEGs', 'RSS', 'Nr3c2_KD_sim_eRegulon_AUC_iter_0', 'Nr3c2_KD_sim_eRegulon_AUC_iter_1', 'Nr3c2_KD_sim_eRegulon_AUC_iter_2', 'Nr3c2_KD_sim_eRegulon_AUC_iter_3', 'Nr3c2_KD_sim_eRegulon_AUC_iter_4', 'Rarb_KD_sim_eRegulon_AUC_iter_0', 'Rarb_KD_sim_eRegulon_AUC_iter_1', 'Rarb_KD_sim_eRegulon_AUC_iter_2', 'Rarb_KD_sim_eRegulon_AUC_iter_3', 'Rarb_KD_sim_eRegulon_AUC_iter_4', 'Mecom_KD_sim_eRegulon_AUC_iter_0', 'Mecom_KD_sim_eRegulon_AUC_iter_1', 'Mecom_KD_sim_eRegulon_AUC_iter_2', 'Mecom_KD_sim_eRegulon_AUC_iter_3', 'Mecom_KD_sim_eRegulon_AUC_iter_4', 'Peg3_KD_sim_eRegulon_AUC_iter_0', 'Peg3_KD_sim_eRegulon_AUC_iter_1', 'Peg3_KD_sim_eRegulon_AUC_iter_2', 'Peg3_KD_sim_eRegulon_AUC_iter_3', 'Peg3_KD_sim_eRegulon_AUC_iter_4', 'Sox17_KD_sim_eRegulon_AUC_iter_0', 'Sox17_KD_sim_eRegulon_AUC_iter_1', 'Sox17_KD_sim_eRegulon_AUC_iter_2', 'Sox17_KD_sim_eRegulon_AUC_iter_3', 'Sox17_KD_sim_eRegulon_AUC_iter_4', 'Bnc2_KD_sim_eRegulon_AUC_iter_0', 'Bnc2_KD_sim_eRegulon_AUC_iter_1', 'Bnc2_KD_sim_eRegulon_AUC_iter_2', 'Bnc2_KD_sim_eRegulon_AUC_iter_3', 'Bnc2_KD_sim_eRegulon_AUC_iter_4', 'Thrb_KD_sim_eRegulon_AUC_iter_0', 'Thrb_KD_sim_eRegulon_AUC_iter_1', 'Thrb_KD_sim_eRegulon_AUC_iter_2', 'Thrb_KD_sim_eRegulon_AUC_iter_3', 'Thrb_KD_sim_eRegulon_AUC_iter_4', 'Esrrb_KD_sim_eRegulon_AUC_iter_0', 'Esrrb_KD_sim_eRegulon_AUC_iter_1', 'Esrrb_KD_sim_eRegulon_AUC_iter_2', 'Esrrb_KD_sim_eRegulon_AUC_iter_3', 'Esrrb_KD_sim_eRegulon_AUC_iter_4', 'Spi1_KD_sim_eRegulon_AUC_iter_0', 'Spi1_KD_sim_eRegulon_AUC_iter_1', 'Spi1_KD_sim_eRegulon_AUC_iter_2', 'Spi1_KD_sim_eRegulon_AUC_iter_3', 'Spi1_KD_sim_eRegulon_AUC_iter_4', 'Pdlim5_KD_sim_eRegulon_AUC_iter_0', 
'Pdlim5_KD_sim_eRegulon_AUC_iter_1', 'Pdlim5_KD_sim_eRegulon_AUC_iter_2', 'Pdlim5_KD_sim_eRegulon_AUC_iter_3', 'Pdlim5_KD_sim_eRegulon_AUC_iter_4', 'Atf3_KD_sim_eRegulon_AUC_iter_0', 'Atf3_KD_sim_eRegulon_AUC_iter_1', 'Atf3_KD_sim_eRegulon_AUC_iter_2', 'Atf3_KD_sim_eRegulon_AUC_iter_3', 'Atf3_KD_sim_eRegulon_AUC_iter_4', 'Zfp445_KD_sim_eRegulon_AUC_iter_0', 'Zfp445_KD_sim_eRegulon_AUC_iter_1', 'Zfp445_KD_sim_eRegulon_AUC_iter_2', 'Zfp445_KD_sim_eRegulon_AUC_iter_3', 'Zfp445_KD_sim_eRegulon_AUC_iter_4', 'Atf6_KD_sim_eRegulon_AUC_iter_0', 'Atf6_KD_sim_eRegulon_AUC_iter_1', 'Atf6_KD_sim_eRegulon_AUC_iter_2', 'Atf6_KD_sim_eRegulon_AUC_iter_3', 'Atf6_KD_sim_eRegulon_AUC_iter_4', 'Maf_KD_sim_eRegulon_AUC_iter_0', 'Maf_KD_sim_eRegulon_AUC_iter_1', 'Maf_KD_sim_eRegulon_AUC_iter_2', 'Maf_KD_sim_eRegulon_AUC_iter_3', 'Maf_KD_sim_eRegulon_AUC_iter_4', 'Gata4_KD_sim_eRegulon_AUC_iter_0', 'Gata4_KD_sim_eRegulon_AUC_iter_1', 'Gata4_KD_sim_eRegulon_AUC_iter_2', 'Gata4_KD_sim_eRegulon_AUC_iter_3', 'Gata4_KD_sim_eRegulon_AUC_iter_4', 'Esrra_KD_sim_eRegulon_AUC_iter_0', 'Esrra_KD_sim_eRegulon_AUC_iter_1', 'Esrra_KD_sim_eRegulon_AUC_iter_2', 'Esrra_KD_sim_eRegulon_AUC_iter_3', 'Esrra_KD_sim_eRegulon_AUC_iter_4', 'Esrrg_KD_sim_eRegulon_AUC_iter_0', 'Esrrg_KD_sim_eRegulon_AUC_iter_1', 'Esrrg_KD_sim_eRegulon_AUC_iter_2', 'Esrrg_KD_sim_eRegulon_AUC_iter_3', 'Esrrg_KD_sim_eRegulon_AUC_iter_4', 'Pir_KD_sim_eRegulon_AUC_iter_0', 'Pir_KD_sim_eRegulon_AUC_iter_1', 'Pir_KD_sim_eRegulon_AUC_iter_2', 'Pir_KD_sim_eRegulon_AUC_iter_3', 'Pir_KD_sim_eRegulon_AUC_iter_4', 'Ebf2_KD_sim_eRegulon_AUC_iter_0', 'Ebf2_KD_sim_eRegulon_AUC_iter_1', 'Ebf2_KD_sim_eRegulon_AUC_iter_2', 'Ebf2_KD_sim_eRegulon_AUC_iter_3', 'Ebf2_KD_sim_eRegulon_AUC_iter_4', 'Nfyc_KD_sim_eRegulon_AUC_iter_0', 'Nfyc_KD_sim_eRegulon_AUC_iter_1', 'Nfyc_KD_sim_eRegulon_AUC_iter_2', 'Nfyc_KD_sim_eRegulon_AUC_iter_3', 'Nfyc_KD_sim_eRegulon_AUC_iter_4', 'Nfib_KD_sim_eRegulon_AUC_iter_0', 'Nfib_KD_sim_eRegulon_AUC_iter_1', 
'Nfib_KD_sim_eRegulon_AUC_iter_2', 'Nfib_KD_sim_eRegulon_AUC_iter_3', 'Nfib_KD_sim_eRegulon_AUC_iter_4', 'Mitf_KD_sim_eRegulon_AUC_iter_0', 'Mitf_KD_sim_eRegulon_AUC_iter_1', 'Mitf_KD_sim_eRegulon_AUC_iter_2', 'Mitf_KD_sim_eRegulon_AUC_iter_3', 'Mitf_KD_sim_eRegulon_AUC_iter_4', 'Tcf7l1_KD_sim_eRegulon_AUC_iter_0', 'Tcf7l1_KD_sim_eRegulon_AUC_iter_1', 'Tcf7l1_KD_sim_eRegulon_AUC_iter_2', 'Tcf7l1_KD_sim_eRegulon_AUC_iter_3', 'Tcf7l1_KD_sim_eRegulon_AUC_iter_4', 'Mlxipl_KD_sim_eRegulon_AUC_iter_0', 'Mlxipl_KD_sim_eRegulon_AUC_iter_1', 'Mlxipl_KD_sim_eRegulon_AUC_iter_2', 'Mlxipl_KD_sim_eRegulon_AUC_iter_3', 'Mlxipl_KD_sim_eRegulon_AUC_iter_4', 'Nfia_KD_sim_eRegulon_AUC_iter_0', 'Nfia_KD_sim_eRegulon_AUC_iter_1', 'Nfia_KD_sim_eRegulon_AUC_iter_2', 'Nfia_KD_sim_eRegulon_AUC_iter_3', 'Nfia_KD_sim_eRegulon_AUC_iter_4', 'Tead1_KD_sim_eRegulon_AUC_iter_0', 'Tead1_KD_sim_eRegulon_AUC_iter_1', 'Tead1_KD_sim_eRegulon_AUC_iter_2', 'Tead1_KD_sim_eRegulon_AUC_iter_3', 'Tead1_KD_sim_eRegulon_AUC_iter_4', 'Zfp579_KD_sim_eRegulon_AUC_iter_0', 'Zfp579_KD_sim_eRegulon_AUC_iter_1', 'Zfp579_KD_sim_eRegulon_AUC_iter_2', 'Zfp579_KD_sim_eRegulon_AUC_iter_3', 'Zfp579_KD_sim_eRegulon_AUC_iter_4', 'Gata6_KD_sim_eRegulon_AUC_iter_0', 'Gata6_KD_sim_eRegulon_AUC_iter_1', 'Gata6_KD_sim_eRegulon_AUC_iter_2', 'Gata6_KD_sim_eRegulon_AUC_iter_3', 'Gata6_KD_sim_eRegulon_AUC_iter_4', 'Ebf1_KD_sim_eRegulon_AUC_iter_0', 'Ebf1_KD_sim_eRegulon_AUC_iter_1', 'Ebf1_KD_sim_eRegulon_AUC_iter_2', 'Ebf1_KD_sim_eRegulon_AUC_iter_3', 'Ebf1_KD_sim_eRegulon_AUC_iter_4', 'Zbtb20_KD_sim_eRegulon_AUC_iter_0', 'Zbtb20_KD_sim_eRegulon_AUC_iter_1', 'Zbtb20_KD_sim_eRegulon_AUC_iter_2', 'Zbtb20_KD_sim_eRegulon_AUC_iter_3', 'Zbtb20_KD_sim_eRegulon_AUC_iter_4', 'Ppara_KD_sim_eRegulon_AUC_iter_0', 'Ppara_KD_sim_eRegulon_AUC_iter_1', 'Ppara_KD_sim_eRegulon_AUC_iter_2', 'Ppara_KD_sim_eRegulon_AUC_iter_3', 'Ppara_KD_sim_eRegulon_AUC_iter_4', 'Mbnl2_KD_sim_eRegulon_AUC_iter_0', 'Mbnl2_KD_sim_eRegulon_AUC_iter_1', 
'Mbnl2_KD_sim_eRegulon_AUC_iter_2', 'Mbnl2_KD_sim_eRegulon_AUC_iter_3', 'Mbnl2_KD_sim_eRegulon_AUC_iter_4', 'Wt1_KD_sim_eRegulon_AUC_iter_0', 'Wt1_KD_sim_eRegulon_AUC_iter_1', 'Wt1_KD_sim_eRegulon_AUC_iter_2', 'Wt1_KD_sim_eRegulon_AUC_iter_3', 'Wt1_KD_sim_eRegulon_AUC_iter_4', 'Ikzf1_KD_sim_eRegulon_AUC_iter_0', 'Ikzf1_KD_sim_eRegulon_AUC_iter_1', 'Ikzf1_KD_sim_eRegulon_AUC_iter_2', 'Ikzf1_KD_sim_eRegulon_AUC_iter_3', 'Ikzf1_KD_sim_eRegulon_AUC_iter_4', 'Pbx1_KD_sim_eRegulon_AUC_iter_0', 'Pbx1_KD_sim_eRegulon_AUC_iter_1', 'Pbx1_KD_sim_eRegulon_AUC_iter_2', 'Pbx1_KD_sim_eRegulon_AUC_iter_3', 'Pbx1_KD_sim_eRegulon_AUC_iter_4', 'Rora_KD_sim_eRegulon_AUC_iter_0', 'Rora_KD_sim_eRegulon_AUC_iter_1', 'Rora_KD_sim_eRegulon_AUC_iter_2', 'Rora_KD_sim_eRegulon_AUC_iter_3', 'Rora_KD_sim_eRegulon_AUC_iter_4', 'Zfp366_KD_sim_eRegulon_AUC_iter_0', 'Zfp366_KD_sim_eRegulon_AUC_iter_1', 'Zfp366_KD_sim_eRegulon_AUC_iter_2', 'Zfp366_KD_sim_eRegulon_AUC_iter_3', 'Zfp366_KD_sim_eRegulon_AUC_iter_4', 'Sox6_KD_sim_eRegulon_AUC_iter_0', 'Sox6_KD_sim_eRegulon_AUC_iter_1', 'Sox6_KD_sim_eRegulon_AUC_iter_2', 'Sox6_KD_sim_eRegulon_AUC_iter_3', 'Sox6_KD_sim_eRegulon_AUC_iter_4', 'Tead4_KD_sim_eRegulon_AUC_iter_0', 'Tead4_KD_sim_eRegulon_AUC_iter_1', 'Tead4_KD_sim_eRegulon_AUC_iter_2', 'Tead4_KD_sim_eRegulon_AUC_iter_3', 'Tead4_KD_sim_eRegulon_AUC_iter_4', 'Creb3l2_KD_sim_eRegulon_AUC_iter_0', 'Creb3l2_KD_sim_eRegulon_AUC_iter_1', 'Creb3l2_KD_sim_eRegulon_AUC_iter_2', 'Creb3l2_KD_sim_eRegulon_AUC_iter_3', 'Creb3l2_KD_sim_eRegulon_AUC_iter_4', 'Zfp612_KD_sim_eRegulon_AUC_iter_0', 'Zfp612_KD_sim_eRegulon_AUC_iter_1', 'Zfp612_KD_sim_eRegulon_AUC_iter_2', 'Zfp612_KD_sim_eRegulon_AUC_iter_3', 'Zfp612_KD_sim_eRegulon_AUC_iter_4', 'Zeb1_KD_sim_eRegulon_AUC_iter_0', 'Zeb1_KD_sim_eRegulon_AUC_iter_1', 'Zeb1_KD_sim_eRegulon_AUC_iter_2', 'Zeb1_KD_sim_eRegulon_AUC_iter_3', 'Zeb1_KD_sim_eRegulon_AUC_iter_4', 'Fli1_KD_sim_eRegulon_AUC_iter_0', 'Fli1_KD_sim_eRegulon_AUC_iter_1', 
'Fli1_KD_sim_eRegulon_AUC_iter_2', 'Fli1_KD_sim_eRegulon_AUC_iter_3', 'Fli1_KD_sim_eRegulon_AUC_iter_4', 'Gli3_KD_sim_eRegulon_AUC_iter_0', 'Gli3_KD_sim_eRegulon_AUC_iter_1', 'Gli3_KD_sim_eRegulon_AUC_iter_2', 'Gli3_KD_sim_eRegulon_AUC_iter_3', 'Gli3_KD_sim_eRegulon_AUC_iter_4', 'Fosb_KD_sim_eRegulon_AUC_iter_0', 'Fosb_KD_sim_eRegulon_AUC_iter_1', 'Fosb_KD_sim_eRegulon_AUC_iter_2', 'Fosb_KD_sim_eRegulon_AUC_iter_3', 'Fosb_KD_sim_eRegulon_AUC_iter_4', 'Klf12_KD_sim_eRegulon_AUC_iter_0', 'Klf12_KD_sim_eRegulon_AUC_iter_1', 'Klf12_KD_sim_eRegulon_AUC_iter_2', 'Klf12_KD_sim_eRegulon_AUC_iter_3', 'Klf12_KD_sim_eRegulon_AUC_iter_4', 'Klf15_KD_sim_eRegulon_AUC_iter_0', 'Klf15_KD_sim_eRegulon_AUC_iter_1', 'Klf15_KD_sim_eRegulon_AUC_iter_2', 'Klf15_KD_sim_eRegulon_AUC_iter_3', 'Klf15_KD_sim_eRegulon_AUC_iter_4', 'Erg_KD_sim_eRegulon_AUC_iter_0', 'Erg_KD_sim_eRegulon_AUC_iter_1', 'Erg_KD_sim_eRegulon_AUC_iter_2', 'Erg_KD_sim_eRegulon_AUC_iter_3', 'Erg_KD_sim_eRegulon_AUC_iter_4', 'Runx1_KD_sim_eRegulon_AUC_iter_0', 'Runx1_KD_sim_eRegulon_AUC_iter_1', 'Runx1_KD_sim_eRegulon_AUC_iter_2', 'Runx1_KD_sim_eRegulon_AUC_iter_3', 'Runx1_KD_sim_eRegulon_AUC_iter_4', 'Elk3_KD_sim_eRegulon_AUC_iter_0', 'Elk3_KD_sim_eRegulon_AUC_iter_1', 'Elk3_KD_sim_eRegulon_AUC_iter_2', 'Elk3_KD_sim_eRegulon_AUC_iter_3', 'Elk3_KD_sim_eRegulon_AUC_iter_4', 'Tbx5_KD_sim_eRegulon_AUC_iter_0', 'Tbx5_KD_sim_eRegulon_AUC_iter_1', 'Tbx5_KD_sim_eRegulon_AUC_iter_2', 'Tbx5_KD_sim_eRegulon_AUC_iter_3', 'Tbx5_KD_sim_eRegulon_AUC_iter_4', 'Max_KD_sim_eRegulon_AUC_iter_0', 'Max_KD_sim_eRegulon_AUC_iter_1', 'Max_KD_sim_eRegulon_AUC_iter_2', 'Max_KD_sim_eRegulon_AUC_iter_3', 'Max_KD_sim_eRegulon_AUC_iter_4'])

However, as the error suggests, I do not appear to have the embedding:

scplus_obj.dr_cell.keys()

dict_keys(['GEX_X_pca', 'GEX_X_scVI', 'GEX_X_umap', 'GEX__scvi_extra_categorical_covs', 'GEX__scvi_extra_continuous_covs', 'ACC_UMAP', 'eRegulons_UMAP', 'eRegulons_tSNE', 'eRegulons_PCA_gene_based'])

Any help would be greatly appreciated!

Additional context

One thing to note: I ran the previous step on a cluster, saved the object, loaded it, and then tried to run the next step.

Thanks, Mason

SeppeDeWinter commented 5 months ago

Hi @mason-sweat1

I guess this is not well explained in the tutorial.

You first have to calculate this embedding (it does not have to be a PCA, by the way). It could also be the same embedding for all of the TFs; for example, you could use GEX_X_pca instead of {TF}_KD_sim_eRegulon_PCA_iter_0.

To calculate the PCA use the following function: run_eRegulons_pca.
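As a rough sketch of the shared-embedding option, the snippet below picks a reduction name per TF: the per-TF name if it is present, otherwise a shared fallback. The helper `pick_reduction_name` is hypothetical and not part of SCENIC+; the key list is copied from the `scplus_obj.dr_cell.keys()` output earlier in this thread, and in a real notebook you would pass `scplus_obj.dr_cell.keys()` directly.

```python
from typing import Iterable


def pick_reduction_name(available: Iterable[str], tf: str,
                        fallback: str = "GEX_X_pca") -> str:
    """Hypothetical helper (not a SCENIC+ function).

    Return the per-TF embedding name if it exists among `available`,
    otherwise return the shared `fallback` embedding name.
    """
    available = set(available)
    per_tf = f"{tf}_KD_sim_eRegulon_PCA_iter_0"
    if per_tf in available:
        return per_tf
    if fallback in available:
        return fallback
    raise ValueError(
        f'Neither "{per_tf}" nor "{fallback}" found in the available embeddings'
    )


# Keys copied from the scplus_obj.dr_cell.keys() output in the question.
dr_cell_keys = [
    "GEX_X_pca", "GEX_X_scVI", "GEX_X_umap",
    "GEX__scvi_extra_categorical_covs", "GEX__scvi_extra_continuous_covs",
    "ACC_UMAP", "eRegulons_UMAP", "eRegulons_tSNE", "eRegulons_PCA_gene_based",
]

# Assumption: an illustrative subset of the TFs of interest.
TFs_of_interest = ["Tead1", "Gata4"]
reduction_names = {tf: pick_reduction_name(dr_cell_keys, tf)
                   for tf in TFs_of_interest}
print(reduction_names)
```

The resulting name would then be passed as `reduction_name` in the `_project_perturbation_in_embedding` call from the traceback, in place of the hard-coded f-string.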

I hope this helps.

All the best,

Seppe