aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
163 stars 27 forks source link

create_SCENICPLUS_object(multi_ome_mode = False),Keeping 0 cells for RNA and 0 for ATAC #313

Open wangdlpbr opened 3 months ago

wangdlpbr commented 3 months ago

Hello, when I run create_SCENICPLUS_object, I get the following error, how should I solve this problem

code:

> from scenicplus.scenicplus_class import create_SCENICPLUS_object
> import numpy as np
> scplus_obj = create_SCENICPLUS_object(
>         GEX_anndata = adata,
>         cisTopic_obj = cistopic_obj,
>         menr = menr,
>         multi_ome_mode = False,
>         key_to_group_by = 'EPIgroup',
>         nr_cells_per_metacells = 5)

2024-03-04 17:26:34,872 cisTopic     INFO     Imputing region accessibility
2024-03-04 17:26:34,873 cisTopic     INFO     Impute region accessibility for regions 0-20000
2024-03-04 17:26:34,896 cisTopic     INFO     Impute region accessibility for regions 20000-40000
2024-03-04 17:26:34,911 cisTopic     INFO     Impute region accessibility for regions 40000-60000
2024-03-04 17:26:34,926 cisTopic     INFO     Impute region accessibility for regions 60000-80000
2024-03-04 17:26:34,938 cisTopic     INFO     Impute region accessibility for regions 80000-100000
2024-03-04 17:26:34,942 cisTopic     INFO     Done!
2024-03-04 17:26:34,944 create scenicplus object INFO     Following annotations were found in both assays under key EPIgroup:
    .
Keeping 0 cells for RNA and 0 for ATAC.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[57], line 3
      1 from scenicplus.scenicplus_class import create_SCENICPLUS_object
      2 import numpy as np
----> 3 scplus_obj = create_SCENICPLUS_object(
      4         GEX_anndata = adata.raw.to_adata(),
      5         cisTopic_obj = cistopic_obj,
      6         menr = menr,
      7         multi_ome_mode = False,
      8         key_to_group_by = 'EPIgroup',
      9         nr_cells_per_metacells = 5)

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/scenicplus_class.py:623, in create_SCENICPLUS_object(GEX_anndata, cisTopic_obj, menr, multi_ome_mode, nr_metacells, nr_cells_per_metacells, meta_cell_split, key_to_group_by, imputed_acc_obj, imputed_acc_kwargs, normalize_imputed_acc, normalize_imputed_acc_kwargs, cell_metadata, region_metadata, gene_metadata, bc_transform_func, ACC_prefix, GEX_prefix)
    620 GEX_anndata = GEX_anndata[GEX_cells_to_keep]
    622 # generate metacells
--> 623 grouper_EXP = Groupby(GEX_anndata.obs[key_to_group_by].to_numpy())
    624 grouper_ACC = Groupby(
    625     cisTopic_obj.cell_data[key_to_group_by].to_numpy())
    627 assert all(grouper_EXP.keys ==
    628            grouper_ACC.keys), 'grouper_EXP.keys should be the same as grouper_ACC.keys'

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/utils.py:279, in Groupby.__init__(self, keys)
    277 def __init__(self, keys):
    278     self.keys, self.keys_as_int = np.unique(keys, return_inverse=True)
--> 279     self.n_keys = max(self.keys_as_int) + 1
    280     self.set_indices()

ValueError: max() arg is an empty sequence
SeppeDeWinter commented 3 months ago

Hi @wangdlpbr

It looks like no annotations are matching between your scATAC-seq and scRNA-seq data for EPIgroup. Could you show:


print(set(adata.obs["EPIgroup"]))
print(set(cistopic_obj.cell_data["EPIgroup"]))

All the best,

Seppe

wangdlpbr commented 3 months ago

Hi @wangdlpbr

It looks like no annotations are matching between your scATAC-seq and scRNA-seq data for EPIgroup. Could you show:

print(set(adata.obs["EPIgroup"]))
print(set(cistopic_obj.cell_data["EPIgroup"]))

All the best,

Seppe

Thanks for your help, I have solved this problem, but I have encountered a new one

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    sys.stderr = open(os.devnull, "w")  # silence stderr
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['EPIgroup'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        n_cpu = 12,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
    sys.stderr = sys.__stderr__  # unsilence stderr
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)
2024-03-16 18:20:26,470 SCENIC+_wrapper INFO     /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-03-16 18:20:26,472 SCENIC+_wrapper INFO     Getting search space
2024-03-16 18:20:38,399 R2G          INFO     Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-03-16 18:24:05,517 R2G          INFO     Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-03-16 18:24:17,354 R2G          INFO     Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:19,335 R2G          INFO     Extending search space to:
                                    150000 bp downstream of the end of the gene.
                                    150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:30,233 R2G          INFO     Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:30,688 R2G          INFO     Calculating distances from region to gene
2024-03-16 18:24:52,409 R2G          INFO     Imploding multiple entries per region and gene
2024-03-16 18:25:38,300 R2G          INFO     Done!
2024-03-16 18:25:38,443 SCENIC+_wrapper INFO     Inferring region to gene relationships
2024-03-16 18:25:38,489 R2G          INFO     Calculating region to gene importances, using GBM method
2024-03-16 18:31:22,346 R2G          INFO     Took 343.85478925704956 seconds
2024-03-16 18:31:22,348 R2G          INFO     Calculating region to gene correlation, using SR method
2024-03-16 18:36:52,587 R2G          INFO     Took 330.238156080246 seconds
2024-03-16 18:37:02,521 R2G          INFO     Done!
2024-03-16 18:37:02,671 SCENIC+_wrapper INFO     Inferring TF to gene relationships
2024-03-16 18:37:07,723 TF2G         INFO     Calculating TF to gene correlation, using GBM method
2024-03-16 19:01:57,943 TF2G         INFO     Took 1490.2188758850098 seconds
2024-03-16 19:01:57,952 TF2G         INFO     Adding correlation coefficients to adjacencies.
2024-03-16 19:02:52,181 TF2G         INFO     Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-03-16 19:03:11,854 TF2G         INFO     Adding importance x rho scores to adjacencies.
2024-03-16 19:03:11,908 TF2G         INFO     Took 73.95558142662048 seconds
2024-03-16 19:03:12,575 SCENIC+_wrapper INFO     Build eGRN
2024-03-16 19:03:12,577 GSEA         INFO     Thresholding region to gene relationships
2024-03-16 19:07:31,683 GSEA         INFO     Subsetting TF2G adjacencies for TF with motif.
2024-03-16 19:07:39,397 GSEA         INFO     Running GSEA...
2024-03-16 19:16:24,294 GSEA         INFO     Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-03-16 19:16:24,342 GSEA         INFO     Merging eRegulons
2024-03-16 19:16:24,344 GSEA         INFO     Storing eRegulons in .uns[eRegulons].
2024-03-16 19:16:26,445 SCENIC+_wrapper INFO     Formatting eGRNs
2024-03-16 19:16:27,712 SCENIC+_wrapper INFO     Converting eGRNs to signatures
2024-03-16 19:16:27,716 SCENIC+_wrapper INFO     Calculating eGRNs AUC
2024-03-16 19:16:27,717 SCENIC+_wrapper INFO     Calculating region ranking
2024-03-16 19:16:28,347 SCENIC+_wrapper INFO     Calculating eGRNs region based AUC
2024-03-16 19:16:31,637 SCENIC+_wrapper INFO     Calculating gene ranking
2024-03-16 19:16:31,885 SCENIC+_wrapper INFO     Calculating eGRNs gene based AUC
2024-03-16 19:16:35,442 SCENIC+_wrapper INFO     Calculating TF-eGRNs AUC correlation
2024-03-16 19:16:39,680 SCENIC+_wrapper INFO     Binarizing eGRNs AUC
2024-03-16 19:16:44,424 SCENIC+_wrapper INFO     Making eGRNs AUC UMAP
2024-03-16 19:16:52,163 SCENIC+_wrapper INFO     Making eGRNs AUC tSNE

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:221, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
    216     import fitsne
    217     embedding = fitsne.FItSNE(
    218         np.ascontiguousarray(
    219             data_mat.to_numpy()),
    220         rand_seed=random_state,
--> 221         perplexity=perplexity, **kwargs)
    222 except BaseException:

NameError: name 'perplexity' is not defined

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[37], line 24
     21 except Exception as e:
     22     #in case of failure, still save the object
     23     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24     raise(e)

Cell In[37], line 4
      2 try:
      3     sys.stderr = open(os.devnull, "w")  # silence stderr
----> 4     run_scenicplus(
      5         scplus_obj = scplus_obj,
      6         variable = ['EPIgroup'],
      7         species = 'hsapiens',
      8         assembly = 'hg38',
      9         tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
     10         save_path = os.path.join(work_dir, 'scenicplus'),
     11         biomart_host = biomart_host,
     12         upstream = [1000, 150000],
     13         downstream = [1000, 150000],
     14         calculate_TF_eGRN_correlation = True,
     15         calculate_DEGs_DARs = True,
     16         export_to_loom_file = True,
     17         export_to_UCSC_file = True,
     18         n_cpu = 12,
     19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
     20     sys.stderr = sys.__stderr__  # unsilence stderr
     21 except Exception as e:
     22     #in case of failure, still save the object

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:294, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
    292 if 'eRegulons_tSNE' not in scplus_obj.dr_cell.keys():
    293     log.info('Making eGRNs AUC tSNE')
--> 294     run_eRegulons_tsne(scplus_obj,
    295                scale=True, signature_keys=['Gene_based', 'Region_based'])
    297 if 'RSS' not in scplus_obj.uns.keys():
    298     log.info('Calculating eRSS')

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:223, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
    217     embedding = fitsne.FItSNE(
    218         np.ascontiguousarray(
    219             data_mat.to_numpy()),
    220         rand_seed=random_state,
    221         perplexity=perplexity, **kwargs)
    222 except BaseException:
--> 223     embedding = sklearn.manifold.TSNE(
    224         n_components=2, random_state=random_state).fit_transform(
    225         data_mat.to_numpy(), **kwargs)
    226 dr = pd.DataFrame(
    227     embedding,
    228     index=data_names,
    229     columns=[
    230         'tSNE_1',
    231         'tSNE_2'])
    232 if not hasattr(scplus_obj, 'dr_cell'):

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    155 @wraps(f)
    156 def wrapped(self, X, *args, **kwargs):
--> 157     data_to_wrap = f(self, X, *args, **kwargs)
    158     if isinstance(data_to_wrap, tuple):
    159         # only wrap the first output for cross decomposition
    160         return_tuple = (
    161             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    162             *data_to_wrap[1:],
    163         )

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1145     estimator._validate_params()
   1147 with config_context(
   1148     skip_parameter_validation=(
   1149         prefer_skip_nested_validation or global_skip_validation
   1150     )
   1151 ):
-> 1152     return fit_method(estimator, *args, **kwargs)

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:1110, in TSNE.fit_transform(self, X, y)
   1085 @_fit_context(
   1086     # TSNE.metric is not validated yet
   1087     prefer_skip_nested_validation=False
   1088 )
   1089 def fit_transform(self, X, y=None):
   1090     """Fit X into an embedded space and return that transformed output.
   1091 
   1092     Parameters
   (...)
   1108         Embedding of the training data in low-dimensional space.
   1109     """
-> 1110     self._check_params_vs_input(X)
   1111     embedding = self._fit(X)
   1112     self.embedding_ = embedding

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:821, in TSNE._check_params_vs_input(self, X)
    819 def _check_params_vs_input(self, X):
    820     if self.perplexity >= X.shape[0]:
--> 821         raise ValueError("perplexity must be less than n_samples")

ValueError: perplexity must be less than n_samples
SeppeDeWinter commented 3 months ago

Hi @wangdlpbr

How may cells do you have in your scplus_obj and how many regulons (scplus_obj.uns["eRegulon_AUC"]["gene_based"]).

Also, if you reach this part of the code the most essential steps of the workflow have been finished, so do make sure to save your scplus_obj at this point (no need to rerun everything).

All the best,

Seppe

wangdlpbr commented 2 months ago

Hi @SeppeDeWinter I didn't save the previous scplus_obj, so I regenerated it, but I found that it produced errors that weren't there before. In addition, aftercreate_SCENICPLUS_object: Keeping 185 cells for RNA and 63 for ATAC, after run_scenicplus: SCENIC+ object with n_cells x n_genes = 26 x 37263 and n_cells x n_regions = 26 x 84551

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    sys.stderr = open(os.devnull, "w")  # silence stderr
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['EPIgroup'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        n_cpu = 12,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
    sys.stderr = sys.__stderr__  # unsilence stderr
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)
2024-04-02 20:01:03,271 SCENIC+_wrapper INFO     /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-04-02 20:01:03,273 SCENIC+_wrapper INFO     Merging cistromes
2024-04-02 20:01:11,086 SCENIC+_wrapper INFO     Getting search space
2024-04-02 20:01:22,764 R2G          INFO     Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-04-02 20:02:58,815 R2G          INFO     Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-04-02 20:03:05,520 R2G          INFO     Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:06,960 R2G          INFO     Extending search space to:
                                    150000 bp downstream of the end of the gene.
                                    150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:14,718 R2G          INFO     Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:16,272 R2G          INFO     Calculating distances from region to gene
2024-04-02 20:03:48,253 R2G          INFO     Imploding multiple entries per region and gene
2024-04-02 20:04:36,303 R2G          INFO     Done!
2024-04-02 20:04:36,576 SCENIC+_wrapper INFO     Inferring region to gene relationships
2024-04-02 20:04:36,662 R2G          INFO     Calculating region to gene importances, using GBM method

2024-04-02 20:04:40,708 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-02 20:11:54,058 R2G          INFO     Took 437.39494943618774 seconds
2024-04-02 20:11:54,062 R2G          INFO     Calculating region to gene correlation, using SR method

2024-04-02 20:11:57,615 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-02 20:18:53,593 R2G          INFO     Took 419.5288677215576 seconds
2024-04-02 20:19:04,882 R2G          INFO     Done!
2024-04-02 20:19:05,111 SCENIC+_wrapper INFO     Inferring TF to gene relationships

2024-04-02 20:19:09,787 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-02 20:19:10,442 TF2G         INFO     Calculating TF to gene correlation, using GBM method
2024-04-02 20:41:26,231 TF2G         INFO     Took 1335.7876572608948 seconds
2024-04-02 20:41:26,239 TF2G         INFO     Adding correlation coefficients to adjacencies.
2024-04-02 20:42:25,640 TF2G         INFO     Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-04-02 20:42:42,332 TF2G         INFO     Adding importance x rho scores to adjacencies.
2024-04-02 20:42:42,408 TF2G         INFO     Took 76.16890239715576 seconds
2024-04-02 20:42:42,981 SCENIC+_wrapper INFO     Build eGRN
2024-04-02 20:42:42,983 GSEA         INFO     Thresholding region to gene relationships

2024-04-02 20:42:46,792 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-02 20:47:34,349 GSEA         INFO     Subsetting TF2G adjacencies for TF with motif.

2024-04-02 20:47:39,759 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-02 20:47:40,321 GSEA         INFO     Running GSEA...
2024-04-02 20:56:48,613 GSEA         INFO     Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-04-02 20:56:48,676 GSEA         INFO     Merging eRegulons
2024-04-02 20:56:48,678 GSEA         INFO     Storing eRegulons in .uns[eRegulons].
2024-04-02 20:56:50,079 SCENIC+_wrapper INFO     Formatting eGRNs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 24
     21 except Exception as e:
     22     #in case of failure, still save the object
     23     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24     raise(e)

Cell In[11], line 4
      2 try:
      3     sys.stderr = open(os.devnull, "w")  # silence stderr
----> 4     run_scenicplus(
      5         scplus_obj = scplus_obj,
      6         variable = ['EPIgroup'],
      7         species = 'hsapiens',
      8         assembly = 'hg38',
      9         tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
     10         save_path = os.path.join(work_dir, 'scenicplus'),
     11         biomart_host = biomart_host,
     12         upstream = [1000, 150000],
     13         downstream = [1000, 150000],
     14         calculate_TF_eGRN_correlation = True,
     15         calculate_DEGs_DARs = True,
     16         export_to_loom_file = True,
     17         export_to_UCSC_file = True,
     18         n_cpu = 12,
     19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
     20     sys.stderr = sys.__stderr__  # unsilence stderr
     21 except Exception as e:
     22     #in case of failure, still save the object

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:195, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
    193 if 'eRegulon_metadata' not in scplus_obj.uns.keys():
    194     log.info('Formatting eGRNs')
--> 195     format_egrns(scplus_obj,
    196                   eregulons_key = 'eRegulons',
    197                   TF2G_key = 'TF2G_adj',
    198                   key_added = 'eRegulon_metadata')
    201 if 'eRegulon_signatures' not in scplus_obj.uns.keys():
    202     log.info('Converting eGRNs to signatures')

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/utils.py:764, in format_egrns(scplus_obj, eregulons_key, TF2G_key, key_added)
    761 tf2g_data = scplus_obj.uns[TF2G_key].copy()
    762 tf2g_data.columns = ['TF', 'Gene', 'TF2G_importance', 'TF2G_regulation',
    763                      'TF2G_rho', 'TF2G_importance_x_abs_rho', 'TF2G_importance_x_rho']
--> 764 egrn_metadata = pd.concat([pd.merge(r2g_data[x], tf2g_data[tf2g_data.TF == r2g_data[x].TF[0]], on=[
    765                           'TF', 'Gene']) for x in range(len(egrn_list))])
    766 scplus_obj.uns[key_added] = egrn_metadata

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/util/_decorators.py:317, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    311 if len(args) > num_allow_args:
    312     warnings.warn(
    313         msg.format(arguments=arguments),
    314         FutureWarning,
    315         stacklevel=find_stack_level(inspect.currentframe()),
    316     )
--> 317 return func(*args, **kwargs)

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/concat.py:369, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    147 @deprecate_nonkeyword_arguments(version=None, allowed_args=["objs"])
    148 def concat(
    149     objs: Iterable[NDFrame] | Mapping[HashableT, NDFrame],
   (...)
    158     copy: bool = True,
    159 ) -> DataFrame | Series:
    160     """
    161     Concatenate pandas objects along a particular axis.
    162 
   (...)
    367     1   3   4
    368     """
--> 369     op = _Concatenator(
    370         objs,
    371         axis=axis,
    372         ignore_index=ignore_index,
    373         join=join,
    374         keys=keys,
    375         levels=levels,
    376         names=names,
    377         verify_integrity=verify_integrity,
    378         copy=copy,
    379         sort=sort,
    380     )
    382     return op.get_result()

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/concat.py:426, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    423     objs = list(objs)
    425 if len(objs) == 0:
--> 426     raise ValueError("No objects to concatenate")
    428 if keys is None:
    429     objs = list(com.not_none(*objs))

ValueError: No objects to concatenate
SeppeDeWinter commented 2 months ago

Hi @wangdlpbr

You are using only 26 cells for your analysis. That's really not a lot.

The error you are getting indicates that no single eRegulon was found, this is probably due to the low number of cells.

All the best,

Seppe

wangdlpbr commented 2 months ago

Hi @SeppeDeWinter I set nr_cells_per_metacells = 15,rerun it.

from scenicplus.scenicplus_class import create_SCENICPLUS_object
import numpy as np
scplus_obj = create_SCENICPLUS_object(
        GEX_anndata = adata,
        cisTopic_obj = cistopic_obj,
        menr = menr,
        multi_ome_mode = False,
        key_to_group_by = 'EPIgroup',
        nr_cells_per_metacells = 15)
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    sys.stderr = open(os.devnull, "w")  # silence stderr
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['EPIgroup'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = False,
        calculate_DEGs_DARs = False,
        export_to_loom_file = False,
        export_to_UCSC_file = False,
        n_cpu = 12,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
    sys.stderr = sys.__stderr__  # unsilence stderr
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)

2024-04-08 11:29:12,626 SCENIC+_wrapper INFO     /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-04-08 11:29:12,628 SCENIC+_wrapper INFO     Merging cistromes
2024-04-08 11:29:23,065 SCENIC+_wrapper INFO     Getting search space
2024-04-08 11:29:38,683 R2G          INFO     Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-04-08 11:30:43,736 R2G          INFO     Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-04-08 11:30:50,006 R2G          INFO     Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:30:51,757 R2G          INFO     Extending search space to:
                                    150000 bp downstream of the end of the gene.
                                    150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:31:01,101 R2G          INFO     Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:31:01,765 R2G          INFO     Calculating distances from region to gene
2024-04-08 11:31:37,938 R2G          INFO     Imploding multiple entries per region and gene
2024-04-08 11:32:28,255 R2G          INFO     Done!
2024-04-08 11:32:28,530 SCENIC+_wrapper INFO     Inferring region to gene relationships
2024-04-08 11:32:28,620 R2G          INFO     Calculating region to gene importances, using GBM method

2024-04-08 11:32:36,019 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-08 11:41:38,517 R2G          INFO     Took 549.8947191238403 seconds
2024-04-08 11:41:38,520 R2G          INFO     Calculating region to gene correlation, using SR method

2024-04-08 11:41:46,979 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-08 11:49:43,417 R2G          INFO     Took 484.89580821990967 seconds
2024-04-08 11:49:55,571 R2G          INFO     Done!
2024-04-08 11:49:55,850 SCENIC+_wrapper INFO     Inferring TF to gene relationships

2024-04-08 11:50:03,211 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-08 11:50:03,996 TF2G         INFO     Calculating TF to gene correlation, using GBM method
2024-04-08 12:33:37,502 TF2G         INFO     Took 2613.5039694309235 seconds
2024-04-08 12:33:37,548 TF2G         INFO     Adding correlation coefficients to adjacencies.
2024-04-08 12:35:43,750 TF2G         INFO     Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-04-08 12:36:06,075 TF2G         INFO     Adding importance x rho scores to adjacencies.
2024-04-08 12:36:06,130 TF2G         INFO     Took 148.58271837234497 seconds
2024-04-08 12:36:06,854 SCENIC+_wrapper INFO     Build eGRN
2024-04-08 12:36:06,856 GSEA         INFO     Thresholding region to gene relationships

2024-04-08 12:36:27,895 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-08 12:44:47,463 GSEA         INFO     Subsetting TF2G adjacencies for TF with motif.

2024-04-08 12:45:03,056 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

2024-04-08 12:45:04,384 GSEA         INFO     Running GSEA...
2024-04-08 13:00:14,826 GSEA         INFO     Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-04-08 13:00:14,924 GSEA         INFO     Merging eRegulons
2024-04-08 13:00:14,927 GSEA         INFO     Storing eRegulons in .uns[eRegulons].
2024-04-08 13:00:17,745 SCENIC+_wrapper INFO     Formatting eGRNs
2024-04-08 13:00:22,540 SCENIC+_wrapper INFO     Converting eGRNs to signatures
2024-04-08 13:00:22,548 SCENIC+_wrapper INFO     Calculating eGRNs AUC
2024-04-08 13:00:22,549 SCENIC+_wrapper INFO     Calculating region ranking
2024-04-08 13:00:22,998 SCENIC+_wrapper INFO     Calculating eGRNs region based AUC
2024-04-08 13:00:32,582 SCENIC+_wrapper INFO     Calculating gene ranking
2024-04-08 13:00:32,792 SCENIC+_wrapper INFO     Calculating eGRNs gene based AUC
2024-04-08 13:00:43,472 SCENIC+_wrapper INFO     Binarizing eGRNs AUC
2024-04-08 13:00:58,752 SCENIC+_wrapper INFO     Making eGRNs AUC UMAP
2024-04-08 13:01:06,418 SCENIC+_wrapper INFO     Making eGRNs AUC tSNE

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:221, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
    216     import fitsne
    217     embedding = fitsne.FItSNE(
    218         np.ascontiguousarray(
    219             data_mat.to_numpy()),
    220         rand_seed=random_state,
--> 221         perplexity=perplexity, **kwargs)
    222 except BaseException:

NameError: name 'perplexity' is not defined

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[35], line 24
     21 except Exception as e:
     22     #in case of failure, still save the object
     23     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24     raise(e)

Cell In[35], line 4
      2 try:
      3     sys.stderr = open(os.devnull, "w")  # silence stderr
----> 4     run_scenicplus(
      5         scplus_obj = scplus_obj,
      6         variable = ['EPIgroup'],
      7         species = 'hsapiens',
      8         assembly = 'hg38',
      9         tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
     10         save_path = os.path.join(work_dir, 'scenicplus'),
     11         biomart_host = biomart_host,
     12         upstream = [1000, 150000],
     13         downstream = [1000, 150000],
     14         calculate_TF_eGRN_correlation = False,
     15         calculate_DEGs_DARs = False,
     16         export_to_loom_file = False,
     17         export_to_UCSC_file = False,
     18         n_cpu = 12,
     19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
     20     sys.stderr = sys.__stderr__  # unsilence stderr
     21 except Exception as e:
     22     #in case of failure, still save the object

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:294, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
    292 if 'eRegulons_tSNE' not in scplus_obj.dr_cell.keys():
    293     log.info('Making eGRNs AUC tSNE')
--> 294     run_eRegulons_tsne(scplus_obj,
    295                scale=True, signature_keys=['Gene_based', 'Region_based'])
    297 if 'RSS' not in scplus_obj.uns.keys():
    298     log.info('Calculating eRSS')

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:223, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
    217     embedding = fitsne.FItSNE(
    218         np.ascontiguousarray(
    219             data_mat.to_numpy()),
    220         rand_seed=random_state,
    221         perplexity=perplexity, **kwargs)
    222 except BaseException:
--> 223     embedding = sklearn.manifold.TSNE(
    224         n_components=2, random_state=random_state).fit_transform(
    225         data_mat.to_numpy(), **kwargs)
    226 dr = pd.DataFrame(
    227     embedding,
    228     index=data_names,
    229     columns=[
    230         'tSNE_1',
    231         'tSNE_2'])
    232 if not hasattr(scplus_obj, 'dr_cell'):

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    155 @wraps(f)
    156 def wrapped(self, X, *args, **kwargs):
--> 157     data_to_wrap = f(self, X, *args, **kwargs)
    158     if isinstance(data_to_wrap, tuple):
    159         # only wrap the first output for cross decomposition
    160         return_tuple = (
    161             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    162             *data_to_wrap[1:],
    163         )

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1145     estimator._validate_params()
   1147 with config_context(
   1148     skip_parameter_validation=(
   1149         prefer_skip_nested_validation or global_skip_validation
   1150     )
   1151 ):
-> 1152     return fit_method(estimator, *args, **kwargs)

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:1110, in TSNE.fit_transform(self, X, y)
   1085 @_fit_context(
   1086     # TSNE.metric is not validated yet
   1087     prefer_skip_nested_validation=False
   1088 )
   1089 def fit_transform(self, X, y=None):
   1090     """Fit X into an embedded space and return that transformed output.
   1091 
   1092     Parameters
   (...)
   1108         Embedding of the training data in low-dimensional space.
   1109     """
-> 1110     self._check_params_vs_input(X)
   1111     embedding = self._fit(X)
   1112     self.embedding_ = embedding

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:821, in TSNE._check_params_vs_input(self, X)
    819 def _check_params_vs_input(self, X):
    820     if self.perplexity >= X.shape[0]:
--> 821         raise ValueError("perplexity must be less than n_samples")

ValueError: perplexity must be less than n_samples

2024-04-08 15:36:30,589 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
2024-04-08 15:41:48,555 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

It can format eGRNs,though rarely

len(scplus_obj.uns['eRegulons'])
6

I want to skip the filtering process and run heatmap_dotplot directly.I want to know how to run it correctly

from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
        scplus_obj = scplus_obj,
        size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
        color_matrix = scplus_obj.to_df('EXP'), #specify  what to plot as colors, TF expression in this case
        scale_size_matrix = True,
        scale_color_matrix = True,
        group_variable = 'EPIgroup',
        subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
        index_order = ['D6-7', 'D8-10', 'D12-14'],
        figsize = (10, 5),
        orientation = 'horizontal')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[107], line 4
      1 from scenicplus.plotting.dotplot import heatmap_dotplot
      2 heatmap_dotplot(
      3         scplus_obj = scplus_obj,
----> 4         size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
      5         color_matrix = scplus_obj.to_df('EXP'), #specify  what to plot as colors, TF expression in this case
      6         scale_size_matrix = True,
      7         scale_color_matrix = True,
      8         group_variable = 'EPIgroup',
      9         subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
     10         index_order = ['D6-7', 'D8-10', 'D12-14'],
     11         figsize = (10, 5),
     12         orientation = 'horizontal')

TypeError: list indices must be integers or slices, not str

Thanks

SeppeDeWinter commented 2 months ago

Hi @wangdlpbr

Can you please show the entire error message from the heatmap_dotplot call?

All the best,

Seppe

wangdlpbr commented 2 months ago

Hi @SeppeDeWinter I think the format of scplus_obj.uns['eRegulons'] is incorrect, causing an error

scplus_obj.uns['eRegulons']

[eRegulon for TF FOS in context frozenset({'positive r2g', 'Cistromes_Unfiltered', 'Top 10 region-to-gene links per gene', 'Top 15 region-to-gene links per gene', 'positive tf2g'}).
    This eRegulon has 54 target regions and 51 target genes.,
 eRegulon for TF ETV4 in context frozenset({'negative r2g', 'Top 15 region-to-gene links per gene', 'positive tf2g', 'Cistromes_Unfiltered'}).
    This eRegulon has 11 target regions and 10 target genes.,
 eRegulon for TF KLF9 in context frozenset({'negative r2g', 'Top 10 region-to-gene links per gene', 'Cistromes_Unfiltered', 'negative tf2g'}).
    This eRegulon has 15 target regions and 14 target genes.,
 eRegulon for TF FOS in context frozenset({'Top 10 region-to-gene links per gene', 'positive r2g', 'Cistromes_Unfiltered', 'positive tf2g'}).
    This eRegulon has 47 target regions and 46 target genes.,
 eRegulon for TF ETV4 in context frozenset({'negative r2g', 'Top 15 region-to-gene links per gene', 'Cistromes_Unfiltered', 'positive tf2g'}).
    This eRegulon has 11 target regions and 10 target genes.,
 eRegulon for TF HLX in context frozenset({'negative r2g', 'BASC binarized', 'positive tf2g', 'Cistromes_Unfiltered'}).
    This eRegulon has 19 target regions and 22 target genes.]
from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
        scplus_obj = scplus_obj,
        size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
        color_matrix = scplus_obj.to_df('EXP'), #specify  what to plot as colors, TF expression in this case
        scale_size_matrix = True,
        scale_color_matrix = True,
        group_variable = 'EPIgroup',
        subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
        index_order = ['D6-7', 'D8-10', 'D12-14'],
        figsize = (10, 5),
        orientation = 'horizontal')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[109], line 4
      1 from scenicplus.plotting.dotplot import heatmap_dotplot
      2 heatmap_dotplot(
      3         scplus_obj = scplus_obj,
----> 4         size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
      5         color_matrix = scplus_obj.to_df('EXP'), #specify  what to plot as colors, TF expression in this case
      6         scale_size_matrix = True,
      7         scale_color_matrix = True,
      8         group_variable = 'EPIgroup',
      9         subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
     10         index_order = ['D6-7', 'D8-10', 'D12-14'],
     11         figsize = (10, 5),
     12         orientation = 'horizontal')

TypeError: list indices must be integers or slices, not str

Thanks

SeppeDeWinter commented 2 months ago

Hi

Ah yes, the size_matrix should be a matrix. This field in the object is in fact formatted correctly, you should use the AUC matrix instead (i.e., scplus_obj.uns['eRegulon_AUC']['Region_based'])

Best,

Seppe

wangdlpbr commented 2 months ago

Hi @SeppeDeWinter Thanks for your help, I successfully ran heatmap_dotplot, and then I wanted to draw the network, but I encountered an error.

from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
        scplus_obj = scplus_obj,
        size_matrix = scplus_obj.uns['eRegulon_AUC']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
        color_matrix = scplus_obj.to_df('EXP'), #specify  what to plot as colors, TF expression in this case
        scale_size_matrix = True,
        scale_color_matrix = True,
        group_variable = 'EPIgroup',
        subset_eRegulons = scplus_obj.uns['eRegulon_AUC']['Gene_based'],
        index_order = ['D6-7', 'D8-10', 'D12-14'],
        figsize = (10, 5),
        orientation = 'horizontal')

I skipped find_highly_variable_features,and get nx_tables

from scenicplus.networks import create_nx_tables, create_nx_graph, plot_networkx, export_to_cytoscape
nx_tables = create_nx_tables(
    scplus_obj = scplus_obj,
    eRegulon_metadata_key ='eRegulon_metadata',
    subset_eRegulons = ['ETV4', 'ETV4_extended', 'FOS','FOS_extended','HLX_extended','KLF9'],
    add_differential_gene_expression = True,
    add_differential_region_accessibility = True,
    differential_variable = ['EPIgroup'])

Then I tried to draw the network

G, pos, edge_tables, node_tables = create_nx_graph(nx_tables,
                   use_edge_tables = ['TF2R','R2G'],
                   color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384',  'KLF9': '#FFF179'}},
                                    'R2G': {'variable' : 'R2G_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
                   transparency_edge_by =  {'R2G': {'variable' : 'R2G_importance', 'min_alpha': 0.1, 'v_min': 0}},
                   width_edge_by = {'R2G': {'variable' : 'R2G_importance', 'max_size' :  1.5, 'min_size' : 1}},
                   color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
                                    'Gene': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'bwr'},
                                    'Region': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'viridis'}},
                   transparency_node_by =  {'Region': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1},
                                    'Gene': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1}},
                   size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
                                    'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
                                    'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
                   shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
                   label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
                                    'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
                                    'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
                   layout='kamada_kawai_layout',
                   scale_position_by=250)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[39], line 1
----> 1 G, pos, edge_tables, node_tables = create_nx_graph(nx_tables,
      2                    use_edge_tables = ['TF2R','R2G'],
      3                    color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384',  'KLF9': '#FFF179'}},
      4                                     'R2G': {'variable' : 'R2G_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
      5                    transparency_edge_by =  {'R2G': {'variable' : 'R2G_importance', 'min_alpha': 0.1, 'v_min': 0}},
      6                    width_edge_by = {'R2G': {'variable' : 'R2G_importance', 'max_size' :  1.5, 'min_size' : 1}},
      7                    color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
      8                                     'Gene': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'bwr'},
      9                                     'Region': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'viridis'}},
     10                    transparency_node_by =  {'Region': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1},
     11                                     'Gene': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1}},
     12                    size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
     13                                     'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
     14                                     'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
     15                    shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
     16                                     'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
     17                                     'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
     18                    label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
     19                                     'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
     20                                     'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
     21                    layout='kamada_kawai_layout',
     22                    scale_position_by=250)

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:434, in create_nx_graph(nx_tables, use_edge_tables, color_edge_by, transparency_edge_by, width_edge_by, color_node_by, transparency_node_by, size_node_by, shape_node_by, label_size_by, label_color_by, layout, lc_dist_genes, lc_dist_TF, scale_position_by)
    431 use_node_tables = sorted(list(set(use_node_tables)), reverse=True)
    433 # Create graph
--> 434 edge_tables = pd.concat([_format_nx_table_internal(
    435     nx_tables, 'Edge', x, color_edge_by, transparency_edge_by, width_edge_by, {}) for x in use_edge_tables])
    436 edge_tables.dropna(axis = 0, how = 'any', inplace = True)
    437 G = nx.from_pandas_edgelist(edge_tables, edge_attr=True)

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:434, in <listcomp>(.0)
    431 use_node_tables = sorted(list(set(use_node_tables)), reverse=True)
    433 # Create graph
--> 434 edge_tables = pd.concat([_format_nx_table_internal(
    435     nx_tables, 'Edge', x, color_edge_by, transparency_edge_by, width_edge_by, {}) for x in use_edge_tables])
    436 edge_tables.dropna(axis = 0, how = 'any', inplace = True)
    437 G = nx.from_pandas_edgelist(edge_tables, edge_attr=True)

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:165, in _format_nx_table_internal(nx_tables, table_type, table_id, color_by, transparency_by, size_by, shape_by, label_size_by, label_color_by)
    163     else:
    164         color_dict = color_by[table_id]['category_color']
--> 165     color = color_var.apply(
    166         lambda x: to_rgba(color_dict[x])).to_numpy()
    167 elif 'continuous_color' in color_by[table_id].keys():
    168     if color_by[table_id]['continuous_color'] is None:

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4664 def apply(
   4665     self,
   4666     func: AggFuncType,
   (...)
   4669     **kwargs,
   4670 ) -> DataFrame | Series:
   4671     """
   4672     Invoke function on values of Series.
   4673 
   (...)
   4772     dtype: float64
   4773     """
-> 4774     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/apply.py:1100, in SeriesApply.apply(self)
   1097     return self.apply_str()
   1099 # self.f is Callable
-> 1100 return self.apply_standard()

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/apply.py:1151, in SeriesApply.apply_standard(self)
   1149     else:
   1150         values = obj.astype(object)._values
-> 1151         mapped = lib.map_infer(
   1152             values,
   1153             f,
   1154             convert=self.convert_dtype,
   1155         )
   1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1158     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1159     #  See also GH#25959 regarding EA support
   1160     return obj._constructor_expanddim(list(mapped), index=obj.index)

File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/_libs/lib.pyx:2919, in pandas._libs.lib.map_infer()

File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:166, in _format_nx_table_internal.<locals>.<lambda>(x)
    163     else:
    164         color_dict = color_by[table_id]['category_color']
    165     color = color_var.apply(
--> 166         lambda x: to_rgba(color_dict[x])).to_numpy()
    167 elif 'continuous_color' in color_by[table_id].keys():
    168     if color_by[table_id]['continuous_color'] is None:

KeyError: 'HLX'
SeppeDeWinter commented 2 months ago

Hi

Can you show the output of


print(nx_tables)

Best,

S

wangdlpbr commented 2 months ago

Hi @SeppeDeWinter print(nx_tables)

{'Edge': {'TF2R':      TF                    Region     Gene_signature_name  \
0   FOS    chr6:26216495-26218608           FOS_+_+_(51g)   
1   FOS    chr6:26103532-26105144           FOS_+_+_(51g)   
2   FOS    chr6:26183536-26184616           FOS_+_+_(51g)   
3   FOS    chr6:26157000-26158989           FOS_+_+_(51g)   
4   FOS   chr12:56222674-56224579           FOS_+_+_(51g)   
..  ...                       ...                     ...   
15  HLX  chr9:128922548-128922747  HLX_extended_+_-_(22g)   
16  HLX    chr9:34518352-34518624  HLX_extended_+_-_(22g)   
19  HLX   chr19:35758083-35758282  HLX_extended_+_-_(22g)   
20  HLX  chr1:206612495-206612694  HLX_extended_+_-_(22g)   
21  HLX   chr12:53553168-53553367  HLX_extended_+_-_(22g)   

     Region_signature_name is_extended  
0            FOS_+_+_(54r)       False  
1            FOS_+_+_(54r)       False  
2            FOS_+_+_(54r)       False  
3            FOS_+_+_(54r)       False  
4            FOS_+_+_(54r)       False  
..                     ...         ...  
15  HLX_extended_+_-_(19r)        True  
16  HLX_extended_+_-_(19r)        True  
19  HLX_extended_+_-_(19r)        True  
20  HLX_extended_+_-_(19r)        True  
21  HLX_extended_+_-_(19r)        True  

[157 rows x 5 columns], 'R2G':                       Region       Gene     Gene_signature_name  \
0     chr6:26216495-26218608  HIST1H2BG           FOS_+_+_(51g)   
1     chr6:26103532-26105144  HIST1H2BG           FOS_+_+_(51g)   
2     chr6:26183536-26184616  HIST1H2BG           FOS_+_+_(51g)   
3     chr6:26157000-26158989  HIST1H2BG           FOS_+_+_(51g)   
4    chr12:56222674-56224579      IL23A           FOS_+_+_(51g)   
..                       ...        ...                     ...   
17   chr12:53320964-53321403      PFDN5  HLX_extended_+_-_(22g)   
18  chr9:128922548-128922747   C9orf114  HLX_extended_+_-_(22g)   
19   chr19:35758083-35758282      APLP1  HLX_extended_+_-_(22g)   
20  chr1:206612495-206612694      EIF2D  HLX_extended_+_-_(22g)   
21   chr12:53553168-53553367     ATP5G2  HLX_extended_+_-_(22g)   

    R2G_importance  R2G_importance_x_abs_rho  R2G_importance_x_rho   R2G_rho  \
0         0.053994                  0.038567              0.038567  0.714286   
1         0.071888                  0.061619              0.061619  0.857143   
2         0.051016                  0.036440              0.036440  0.714286   
3         0.079733                  0.066445              0.066445  0.833333   
4         0.029942                  0.007172              0.007172  0.239525   
..             ...                       ...                   ...       ...   
17        0.039886                  0.034393             -0.034393 -0.862291   
18        0.467408                  0.242526             -0.242526 -0.518875   
19        0.154985                  0.093371             -0.093371 -0.602453   
20        0.171288                  0.118364             -0.118364 -0.691023   
21        0.056114                  0.043011             -0.043011 -0.766481   

     Region_signature_name is_extended  
0            FOS_+_+_(54r)       False  
1            FOS_+_+_(54r)       False  
2            FOS_+_+_(54r)       False  
3            FOS_+_+_(54r)       False  
4            FOS_+_+_(54r)       False  
..                     ...         ...  
17  HLX_extended_+_-_(19r)        True  
18  HLX_extended_+_-_(19r)        True  
19  HLX_extended_+_-_(19r)        True  
20  HLX_extended_+_-_(19r)        True  
21  HLX_extended_+_-_(19r)        True  

[179 rows x 9 columns], 'TF2G':      TF       Gene     Gene_signature_name   Region_signature_name  \
0   FOS  HIST1H2BG           FOS_+_+_(51g)           FOS_+_+_(54r)   
4   FOS      IL23A           FOS_+_+_(51g)           FOS_+_+_(54r)   
5   FOS      TLDC1           FOS_+_+_(51g)           FOS_+_+_(54r)   
6   FOS     DDX39A           FOS_+_+_(51g)           FOS_+_+_(54r)   
7   FOS       RELT           FOS_+_+_(51g)           FOS_+_+_(54r)   
..  ...        ...                     ...                     ...   
17  HLX      PFDN5  HLX_extended_+_-_(22g)  HLX_extended_+_-_(19r)   
18  HLX   C9orf114  HLX_extended_+_-_(22g)  HLX_extended_+_-_(19r)   
19  HLX      APLP1  HLX_extended_+_-_(22g)  HLX_extended_+_-_(19r)   
20  HLX      EIF2D  HLX_extended_+_-_(22g)  HLX_extended_+_-_(19r)   
21  HLX     ATP5G2  HLX_extended_+_-_(22g)  HLX_extended_+_-_(19r)   

    TF2G_importance  TF2G_importance_x_abs_rho  TF2G_importance_x_rho  \
0          0.190168                   0.094830               0.094830   
4          0.231820                   0.220519               0.220519   
5          0.128713                   0.086370               0.086370   
6          0.117175                   0.102582               0.102582   
7          0.251031                   0.138683               0.138683   
..              ...                        ...                    ...   
17         1.843144                   1.524329               1.524329   
18         1.464763                   0.779063               0.779063   
19         2.150820                   1.730371               1.730371   
20         1.586134                   1.260928               1.260928   
21         2.171248                   1.763424               1.763424   

    TF2G_regulation  TF2G_rho is_extended  
0                 1  0.498667       False  
4                 1  0.951249       False  
5                 1  0.671022       False  
6                 1  0.875466       False  
7                 1  0.552452       False  
..              ...       ...         ...  
17                1  0.827026        True  
18                1  0.531870        True  
19                1  0.804517        True  
20                1  0.794970        True  
21                1  0.812171        True  

[153 rows x 10 columns]}, 'Node': {'TF':      Node_type    TF  EPIgroup_Log2FC_D6-7  EPIgroup_Log2FC_D8-10  \
FOS         TF   FOS             -0.380419               0.063163   
ETV4        TF  ETV4              0.201970              -0.068840   
KLF9        TF  KLF9              0.871484               0.196870   
HLX         TF   HLX            -23.282949              -0.893224   

      EPIgroup_Log2FC_D12-14  
FOS                 0.281029  
ETV4               -0.113940  
KLF9               -1.669792  
HLX                 2.486625  , 'Gene':          Node_type      Gene  EPIgroup_Log2FC_D6-7  EPIgroup_Log2FC_D8-10  \
NOL7          Gene      NOL7             -0.149382               0.023373   
EML4          Gene      EML4             -0.085609              -0.031799   
ACTL6A        Gene    ACTL6A              0.008228              -0.021898   
PGS1          Gene      PGS1             -0.144941               0.173828   
B2M           Gene       B2M             -0.326089               0.021158   
...            ...       ...                   ...                    ...   
SELENBP1      Gene  SELENBP1              0.159115              -1.198785   
FHOD1         Gene     FHOD1              0.193900              -0.441795   
PLK5          Gene      PLK5             -5.516206              -0.196060   
RNF44         Gene     RNF44              0.130341              -0.086522   
GALNT3        Gene    GALNT3             -1.241592               0.215353   

          EPIgroup_Log2FC_D12-14  
NOL7                    0.114164  
EML4                    0.125841  
ACTL6A                  0.020931  
PGS1                   -0.089724  
B2M                     0.282418  
...                          ...  
SELENBP1                1.218523  
FHOD1                   0.369816  
PLK5                    1.915554  
RNF44                  -0.016342  
GALNT3                  0.719924  

[94 rows x 5 columns], 'Region':                        Node_type                  Region  \
chr6:26272546-26273911    Region  chr6:26272546-26273911   
chr11:568031-568810       Region     chr11:568031-568810   
chr1:2130745-2131227      Region    chr1:2130745-2131227   
chr16:4386393-4386977     Region   chr16:4386393-4386977   
chr16:3013724-3014414     Region   chr16:3013724-3014414   
...                          ...                     ...   
chr6:26157000-26158989    Region  chr6:26157000-26158989   
chr16:284583-285481       Region     chr16:284583-285481   
chr1:26224481-26225264    Region  chr1:26224481-26225264   
chr5:36241389-36242569    Region  chr5:36241389-36242569   
chr19:4246771-4247560     Region   chr19:4246771-4247560   

                        EPIgroup_Log2FC_D6-7  EPIgroup_Log2FC_D8-10  \
chr6:26272546-26273911             -0.136283              -0.308142   
chr11:568031-568810                -0.116367              -0.031428   
chr1:2130745-2131227                1.401683              -0.320824   
chr16:4386393-4386977              -2.389537               2.973340   
chr16:3013724-3014414              -0.192917              -0.201774   
...                                      ...                    ...   
chr6:26157000-26158989             -0.163891              -0.145684   
chr16:284583-285481                -0.154932              -0.104034   
chr1:26224481-26225264              1.327607              -0.097275   
chr5:36241389-36242569             -0.043003               0.027540   
chr19:4246771-4247560              -0.151104              -0.276715   

                        EPIgroup_Log2FC_D12-14  
chr6:26272546-26273911                0.512445  
chr11:568031-568810                   0.153362  
chr1:2130745-2131227                 -1.666928  
chr16:4386393-4386977                -2.528963  
chr16:3013724-3014414                 0.432142  
...                                        ...  
chr6:26157000-26158989                0.342736  
chr16:284583-285481                   0.279709  
chr1:26224481-26225264               -2.152462  
chr5:36241389-36242569                0.006050  
chr19:4246771-4247560                 0.488511  

[97 rows x 5 columns]}}
SeppeDeWinter commented 2 months ago

Hi @wangdlpbr

For the parameters: color_edge_by and color_node_by I think you have to put HLX instead of HLX_extended.

Does that work?

Best,

Seppe