Open wangdlpbr opened 3 months ago
Hi @wangdlpbr
It looks like no annotations are matching between your scATAC-seq and scRNA-seq data for EPIgroup
.
Could you show:
print(set(adata.obs["EPIgroup"]))
print(set(cistopic_obj.cell_data["EPIgroup"]))
All the best,
Seppe
Hi @wangdlpbr
It looks like no annotations are matching between your scATAC-seq and scRNA-seq data for
EPIgroup
. Could you show:print(set(adata.obs["EPIgroup"])) print(set(cistopic_obj.cell_data["EPIgroup"]))
All the best,
Seppe
Thanks for your help, I have solved this problem, but I have encountered a new one
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
sys.stderr = open(os.devnull, "w") # silence stderr
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['EPIgroup'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = True,
calculate_DEGs_DARs = True,
export_to_loom_file = True,
export_to_UCSC_file = True,
n_cpu = 12,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
sys.stderr = sys.__stderr__ # unsilence stderr
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e)
2024-03-16 18:20:26,470 SCENIC+_wrapper INFO /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-03-16 18:20:26,472 SCENIC+_wrapper INFO Getting search space
2024-03-16 18:20:38,399 R2G INFO Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-03-16 18:24:05,517 R2G INFO Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-03-16 18:24:17,354 R2G INFO Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:19,335 R2G INFO Extending search space to:
150000 bp downstream of the end of the gene.
150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:30,233 R2G INFO Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-03-16 18:24:30,688 R2G INFO Calculating distances from region to gene
2024-03-16 18:24:52,409 R2G INFO Imploding multiple entries per region and gene
2024-03-16 18:25:38,300 R2G INFO Done!
2024-03-16 18:25:38,443 SCENIC+_wrapper INFO Inferring region to gene relationships
2024-03-16 18:25:38,489 R2G INFO Calculating region to gene importances, using GBM method
2024-03-16 18:31:22,346 R2G INFO Took 343.85478925704956 seconds
2024-03-16 18:31:22,348 R2G INFO Calculating region to gene correlation, using SR method
2024-03-16 18:36:52,587 R2G INFO Took 330.238156080246 seconds
2024-03-16 18:37:02,521 R2G INFO Done!
2024-03-16 18:37:02,671 SCENIC+_wrapper INFO Inferring TF to gene relationships
2024-03-16 18:37:07,723 TF2G INFO Calculating TF to gene correlation, using GBM method
2024-03-16 19:01:57,943 TF2G INFO Took 1490.2188758850098 seconds
2024-03-16 19:01:57,952 TF2G INFO Adding correlation coefficients to adjacencies.
2024-03-16 19:02:52,181 TF2G INFO Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-03-16 19:03:11,854 TF2G INFO Adding importance x rho scores to adjacencies.
2024-03-16 19:03:11,908 TF2G INFO Took 73.95558142662048 seconds
2024-03-16 19:03:12,575 SCENIC+_wrapper INFO Build eGRN
2024-03-16 19:03:12,577 GSEA INFO Thresholding region to gene relationships
2024-03-16 19:07:31,683 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2024-03-16 19:07:39,397 GSEA INFO Running GSEA...
2024-03-16 19:16:24,294 GSEA INFO Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-03-16 19:16:24,342 GSEA INFO Merging eRegulons
2024-03-16 19:16:24,344 GSEA INFO Storing eRegulons in .uns[eRegulons].
2024-03-16 19:16:26,445 SCENIC+_wrapper INFO Formatting eGRNs
2024-03-16 19:16:27,712 SCENIC+_wrapper INFO Converting eGRNs to signatures
2024-03-16 19:16:27,716 SCENIC+_wrapper INFO Calculating eGRNs AUC
2024-03-16 19:16:27,717 SCENIC+_wrapper INFO Calculating region ranking
2024-03-16 19:16:28,347 SCENIC+_wrapper INFO Calculating eGRNs region based AUC
2024-03-16 19:16:31,637 SCENIC+_wrapper INFO Calculating gene ranking
2024-03-16 19:16:31,885 SCENIC+_wrapper INFO Calculating eGRNs gene based AUC
2024-03-16 19:16:35,442 SCENIC+_wrapper INFO Calculating TF-eGRNs AUC correlation
2024-03-16 19:16:39,680 SCENIC+_wrapper INFO Binarizing eGRNs AUC
2024-03-16 19:16:44,424 SCENIC+_wrapper INFO Making eGRNs AUC UMAP
2024-03-16 19:16:52,163 SCENIC+_wrapper INFO Making eGRNs AUC tSNE
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:221, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
216 import fitsne
217 embedding = fitsne.FItSNE(
218 np.ascontiguousarray(
219 data_mat.to_numpy()),
220 rand_seed=random_state,
--> 221 perplexity=perplexity, **kwargs)
222 except BaseException:
NameError: name 'perplexity' is not defined
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[37], line 24
21 except Exception as e:
22 #in case of failure, still save the object
23 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24 raise(e)
Cell In[37], line 4
2 try:
3 sys.stderr = open(os.devnull, "w") # silence stderr
----> 4 run_scenicplus(
5 scplus_obj = scplus_obj,
6 variable = ['EPIgroup'],
7 species = 'hsapiens',
8 assembly = 'hg38',
9 tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
10 save_path = os.path.join(work_dir, 'scenicplus'),
11 biomart_host = biomart_host,
12 upstream = [1000, 150000],
13 downstream = [1000, 150000],
14 calculate_TF_eGRN_correlation = True,
15 calculate_DEGs_DARs = True,
16 export_to_loom_file = True,
17 export_to_UCSC_file = True,
18 n_cpu = 12,
19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
20 sys.stderr = sys.__stderr__ # unsilence stderr
21 except Exception as e:
22 #in case of failure, still save the object
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:294, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
292 if 'eRegulons_tSNE' not in scplus_obj.dr_cell.keys():
293 log.info('Making eGRNs AUC tSNE')
--> 294 run_eRegulons_tsne(scplus_obj,
295 scale=True, signature_keys=['Gene_based', 'Region_based'])
297 if 'RSS' not in scplus_obj.uns.keys():
298 log.info('Calculating eRSS')
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:223, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
217 embedding = fitsne.FItSNE(
218 np.ascontiguousarray(
219 data_mat.to_numpy()),
220 rand_seed=random_state,
221 perplexity=perplexity, **kwargs)
222 except BaseException:
--> 223 embedding = sklearn.manifold.TSNE(
224 n_components=2, random_state=random_state).fit_transform(
225 data_mat.to_numpy(), **kwargs)
226 dr = pd.DataFrame(
227 embedding,
228 index=data_names,
229 columns=[
230 'tSNE_1',
231 'tSNE_2'])
232 if not hasattr(scplus_obj, 'dr_cell'):
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
155 @wraps(f)
156 def wrapped(self, X, *args, **kwargs):
--> 157 data_to_wrap = f(self, X, *args, **kwargs)
158 if isinstance(data_to_wrap, tuple):
159 # only wrap the first output for cross decomposition
160 return_tuple = (
161 _wrap_data_with_container(method, data_to_wrap[0], X, self),
162 *data_to_wrap[1:],
163 )
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1145 estimator._validate_params()
1147 with config_context(
1148 skip_parameter_validation=(
1149 prefer_skip_nested_validation or global_skip_validation
1150 )
1151 ):
-> 1152 return fit_method(estimator, *args, **kwargs)
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:1110, in TSNE.fit_transform(self, X, y)
1085 @_fit_context(
1086 # TSNE.metric is not validated yet
1087 prefer_skip_nested_validation=False
1088 )
1089 def fit_transform(self, X, y=None):
1090 """Fit X into an embedded space and return that transformed output.
1091
1092 Parameters
(...)
1108 Embedding of the training data in low-dimensional space.
1109 """
-> 1110 self._check_params_vs_input(X)
1111 embedding = self._fit(X)
1112 self.embedding_ = embedding
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:821, in TSNE._check_params_vs_input(self, X)
819 def _check_params_vs_input(self, X):
820 if self.perplexity >= X.shape[0]:
--> 821 raise ValueError("perplexity must be less than n_samples")
ValueError: perplexity must be less than n_samples
Hi @wangdlpbr
How may cells do you have in your scplus_obj
and how many regulons (scplus_obj.uns["eRegulon_AUC"]["gene_based"]
).
Also, if you reach this part of the code the most essential steps of the workflow have been finished, so do make sure to save your scplus_obj at this point (no need to rerun everything).
All the best,
Seppe
Hi @SeppeDeWinter
I didn't save the previous scplus_obj, so I regenerated it, but I found that it produced errors that weren't there before.
In addition, aftercreate_SCENICPLUS_object
: Keeping 185 cells for RNA and 63 for ATAC, after run_scenicplus
: SCENIC+ object with n_cells x n_genes = 26 x 37263 and n_cells x n_regions = 26 x 84551
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
sys.stderr = open(os.devnull, "w") # silence stderr
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['EPIgroup'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = True,
calculate_DEGs_DARs = True,
export_to_loom_file = True,
export_to_UCSC_file = True,
n_cpu = 12,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
sys.stderr = sys.__stderr__ # unsilence stderr
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e)
2024-04-02 20:01:03,271 SCENIC+_wrapper INFO /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-04-02 20:01:03,273 SCENIC+_wrapper INFO Merging cistromes
2024-04-02 20:01:11,086 SCENIC+_wrapper INFO Getting search space
2024-04-02 20:01:22,764 R2G INFO Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-04-02 20:02:58,815 R2G INFO Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-04-02 20:03:05,520 R2G INFO Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:06,960 R2G INFO Extending search space to:
150000 bp downstream of the end of the gene.
150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:14,718 R2G INFO Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-02 20:03:16,272 R2G INFO Calculating distances from region to gene
2024-04-02 20:03:48,253 R2G INFO Imploding multiple entries per region and gene
2024-04-02 20:04:36,303 R2G INFO Done!
2024-04-02 20:04:36,576 SCENIC+_wrapper INFO Inferring region to gene relationships
2024-04-02 20:04:36,662 R2G INFO Calculating region to gene importances, using GBM method
2024-04-02 20:04:40,708 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-02 20:11:54,058 R2G INFO Took 437.39494943618774 seconds
2024-04-02 20:11:54,062 R2G INFO Calculating region to gene correlation, using SR method
2024-04-02 20:11:57,615 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-02 20:18:53,593 R2G INFO Took 419.5288677215576 seconds
2024-04-02 20:19:04,882 R2G INFO Done!
2024-04-02 20:19:05,111 SCENIC+_wrapper INFO Inferring TF to gene relationships
2024-04-02 20:19:09,787 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-02 20:19:10,442 TF2G INFO Calculating TF to gene correlation, using GBM method
2024-04-02 20:41:26,231 TF2G INFO Took 1335.7876572608948 seconds
2024-04-02 20:41:26,239 TF2G INFO Adding correlation coefficients to adjacencies.
2024-04-02 20:42:25,640 TF2G INFO Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-04-02 20:42:42,332 TF2G INFO Adding importance x rho scores to adjacencies.
2024-04-02 20:42:42,408 TF2G INFO Took 76.16890239715576 seconds
2024-04-02 20:42:42,981 SCENIC+_wrapper INFO Build eGRN
2024-04-02 20:42:42,983 GSEA INFO Thresholding region to gene relationships
2024-04-02 20:42:46,792 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-02 20:47:34,349 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2024-04-02 20:47:39,759 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-02 20:47:40,321 GSEA INFO Running GSEA...
2024-04-02 20:56:48,613 GSEA INFO Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-04-02 20:56:48,676 GSEA INFO Merging eRegulons
2024-04-02 20:56:48,678 GSEA INFO Storing eRegulons in .uns[eRegulons].
2024-04-02 20:56:50,079 SCENIC+_wrapper INFO Formatting eGRNs
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[11], line 24
21 except Exception as e:
22 #in case of failure, still save the object
23 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24 raise(e)
Cell In[11], line 4
2 try:
3 sys.stderr = open(os.devnull, "w") # silence stderr
----> 4 run_scenicplus(
5 scplus_obj = scplus_obj,
6 variable = ['EPIgroup'],
7 species = 'hsapiens',
8 assembly = 'hg38',
9 tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
10 save_path = os.path.join(work_dir, 'scenicplus'),
11 biomart_host = biomart_host,
12 upstream = [1000, 150000],
13 downstream = [1000, 150000],
14 calculate_TF_eGRN_correlation = True,
15 calculate_DEGs_DARs = True,
16 export_to_loom_file = True,
17 export_to_UCSC_file = True,
18 n_cpu = 12,
19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
20 sys.stderr = sys.__stderr__ # unsilence stderr
21 except Exception as e:
22 #in case of failure, still save the object
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:195, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
193 if 'eRegulon_metadata' not in scplus_obj.uns.keys():
194 log.info('Formatting eGRNs')
--> 195 format_egrns(scplus_obj,
196 eregulons_key = 'eRegulons',
197 TF2G_key = 'TF2G_adj',
198 key_added = 'eRegulon_metadata')
201 if 'eRegulon_signatures' not in scplus_obj.uns.keys():
202 log.info('Converting eGRNs to signatures')
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/utils.py:764, in format_egrns(scplus_obj, eregulons_key, TF2G_key, key_added)
761 tf2g_data = scplus_obj.uns[TF2G_key].copy()
762 tf2g_data.columns = ['TF', 'Gene', 'TF2G_importance', 'TF2G_regulation',
763 'TF2G_rho', 'TF2G_importance_x_abs_rho', 'TF2G_importance_x_rho']
--> 764 egrn_metadata = pd.concat([pd.merge(r2g_data[x], tf2g_data[tf2g_data.TF == r2g_data[x].TF[0]], on=[
765 'TF', 'Gene']) for x in range(len(egrn_list))])
766 scplus_obj.uns[key_added] = egrn_metadata
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/util/_decorators.py:317, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
311 if len(args) > num_allow_args:
312 warnings.warn(
313 msg.format(arguments=arguments),
314 FutureWarning,
315 stacklevel=find_stack_level(inspect.currentframe()),
316 )
--> 317 return func(*args, **kwargs)
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/concat.py:369, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
147 @deprecate_nonkeyword_arguments(version=None, allowed_args=["objs"])
148 def concat(
149 objs: Iterable[NDFrame] | Mapping[HashableT, NDFrame],
(...)
158 copy: bool = True,
159 ) -> DataFrame | Series:
160 """
161 Concatenate pandas objects along a particular axis.
162
(...)
367 1 3 4
368 """
--> 369 op = _Concatenator(
370 objs,
371 axis=axis,
372 ignore_index=ignore_index,
373 join=join,
374 keys=keys,
375 levels=levels,
376 names=names,
377 verify_integrity=verify_integrity,
378 copy=copy,
379 sort=sort,
380 )
382 return op.get_result()
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/reshape/concat.py:426, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
423 objs = list(objs)
425 if len(objs) == 0:
--> 426 raise ValueError("No objects to concatenate")
428 if keys is None:
429 objs = list(com.not_none(*objs))
ValueError: No objects to concatenate
Hi @wangdlpbr
You are using only 26 cells for your analysis. That's really not a lot.
The error you are getting indicates that no single eRegulon was found, this is probably due to the low number of cells.
All the best,
Seppe
Hi @SeppeDeWinter I set nr_cells_per_metacells = 15,rerun it.
from scenicplus.scenicplus_class import create_SCENICPLUS_object
import numpy as np
scplus_obj = create_SCENICPLUS_object(
GEX_anndata = adata,
cisTopic_obj = cistopic_obj,
menr = menr,
multi_ome_mode = False,
key_to_group_by = 'EPIgroup',
nr_cells_per_metacells = 15)
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
sys.stderr = open(os.devnull, "w") # silence stderr
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['EPIgroup'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = False,
calculate_DEGs_DARs = False,
export_to_loom_file = False,
export_to_UCSC_file = False,
n_cpu = 12,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
sys.stderr = sys.__stderr__ # unsilence stderr
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e)
2024-04-08 11:29:12,626 SCENIC+_wrapper INFO /data4/litianqing/wangda/sc_rna_atac_zx/scenicplus/EPIgroup/scenicplus folder already exists.
2024-04-08 11:29:12,628 SCENIC+_wrapper INFO Merging cistromes
2024-04-08 11:29:23,065 SCENIC+_wrapper INFO Getting search space
2024-04-08 11:29:38,683 R2G INFO Downloading gene annotation from biomart dataset: hsapiens_gene_ensembl
2024-04-08 11:30:43,736 R2G INFO Downloading chromosome sizes from: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
2024-04-08 11:30:50,006 R2G INFO Extending promoter annotation to 10 bp upstream and 10 downstream
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:30:51,757 R2G INFO Extending search space to:
150000 bp downstream of the end of the gene.
150000 bp upstream of the start of the gene.
Warning! Start and End columns now have different dtypes: int32 and int64
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:31:01,101 R2G INFO Intersecting with regions.
Warning! Start and End columns now have different dtypes: int32 and int64
2024-04-08 11:31:01,765 R2G INFO Calculating distances from region to gene
2024-04-08 11:31:37,938 R2G INFO Imploding multiple entries per region and gene
2024-04-08 11:32:28,255 R2G INFO Done!
2024-04-08 11:32:28,530 SCENIC+_wrapper INFO Inferring region to gene relationships
2024-04-08 11:32:28,620 R2G INFO Calculating region to gene importances, using GBM method
2024-04-08 11:32:36,019 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 11:41:38,517 R2G INFO Took 549.8947191238403 seconds
2024-04-08 11:41:38,520 R2G INFO Calculating region to gene correlation, using SR method
2024-04-08 11:41:46,979 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 11:49:43,417 R2G INFO Took 484.89580821990967 seconds
2024-04-08 11:49:55,571 R2G INFO Done!
2024-04-08 11:49:55,850 SCENIC+_wrapper INFO Inferring TF to gene relationships
2024-04-08 11:50:03,211 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 11:50:03,996 TF2G INFO Calculating TF to gene correlation, using GBM method
2024-04-08 12:33:37,502 TF2G INFO Took 2613.5039694309235 seconds
2024-04-08 12:33:37,548 TF2G INFO Adding correlation coefficients to adjacencies.
2024-04-08 12:35:43,750 TF2G INFO Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2024-04-08 12:36:06,075 TF2G INFO Adding importance x rho scores to adjacencies.
2024-04-08 12:36:06,130 TF2G INFO Took 148.58271837234497 seconds
2024-04-08 12:36:06,854 SCENIC+_wrapper INFO Build eGRN
2024-04-08 12:36:06,856 GSEA INFO Thresholding region to gene relationships
2024-04-08 12:36:27,895 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 12:44:47,463 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2024-04-08 12:45:03,056 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 12:45:04,384 GSEA INFO Running GSEA...
2024-04-08 13:00:14,826 GSEA INFO Subsetting on adjusted pvalue: 1, minimal NES: 0 and minimal leading edge genes 10
2024-04-08 13:00:14,924 GSEA INFO Merging eRegulons
2024-04-08 13:00:14,927 GSEA INFO Storing eRegulons in .uns[eRegulons].
2024-04-08 13:00:17,745 SCENIC+_wrapper INFO Formatting eGRNs
2024-04-08 13:00:22,540 SCENIC+_wrapper INFO Converting eGRNs to signatures
2024-04-08 13:00:22,548 SCENIC+_wrapper INFO Calculating eGRNs AUC
2024-04-08 13:00:22,549 SCENIC+_wrapper INFO Calculating region ranking
2024-04-08 13:00:22,998 SCENIC+_wrapper INFO Calculating eGRNs region based AUC
2024-04-08 13:00:32,582 SCENIC+_wrapper INFO Calculating gene ranking
2024-04-08 13:00:32,792 SCENIC+_wrapper INFO Calculating eGRNs gene based AUC
2024-04-08 13:00:43,472 SCENIC+_wrapper INFO Binarizing eGRNs AUC
2024-04-08 13:00:58,752 SCENIC+_wrapper INFO Making eGRNs AUC UMAP
2024-04-08 13:01:06,418 SCENIC+_wrapper INFO Making eGRNs AUC tSNE
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:221, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
216 import fitsne
217 embedding = fitsne.FItSNE(
218 np.ascontiguousarray(
219 data_mat.to_numpy()),
220 rand_seed=random_state,
--> 221 perplexity=perplexity, **kwargs)
222 except BaseException:
NameError: name 'perplexity' is not defined
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Cell In[35], line 24
21 except Exception as e:
22 #in case of failure, still save the object
23 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
---> 24 raise(e)
Cell In[35], line 4
2 try:
3 sys.stderr = open(os.devnull, "w") # silence stderr
----> 4 run_scenicplus(
5 scplus_obj = scplus_obj,
6 variable = ['EPIgroup'],
7 species = 'hsapiens',
8 assembly = 'hg38',
9 tf_file = '/data4/litianqing/wangda/ref/scenicplus_database/utoronto_human_tfs_v_1.01.txt',
10 save_path = os.path.join(work_dir, 'scenicplus'),
11 biomart_host = biomart_host,
12 upstream = [1000, 150000],
13 downstream = [1000, 150000],
14 calculate_TF_eGRN_correlation = False,
15 calculate_DEGs_DARs = False,
16 export_to_loom_file = False,
17 export_to_UCSC_file = False,
18 n_cpu = 12,
19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
20 sys.stderr = sys.__stderr__ # unsilence stderr
21 except Exception as e:
22 #in case of failure, still save the object
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:294, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
292 if 'eRegulons_tSNE' not in scplus_obj.dr_cell.keys():
293 log.info('Making eGRNs AUC tSNE')
--> 294 run_eRegulons_tsne(scplus_obj,
295 scale=True, signature_keys=['Gene_based', 'Region_based'])
297 if 'RSS' not in scplus_obj.uns.keys():
298 log.info('Calculating eRSS')
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/dimensionality_reduction.py:223, in run_eRegulons_tsne(scplus_obj, scale, auc_key, signature_keys, reduction_name, random_state, selected_regulons, selected_cells, **kwargs)
217 embedding = fitsne.FItSNE(
218 np.ascontiguousarray(
219 data_mat.to_numpy()),
220 rand_seed=random_state,
221 perplexity=perplexity, **kwargs)
222 except BaseException:
--> 223 embedding = sklearn.manifold.TSNE(
224 n_components=2, random_state=random_state).fit_transform(
225 data_mat.to_numpy(), **kwargs)
226 dr = pd.DataFrame(
227 embedding,
228 index=data_names,
229 columns=[
230 'tSNE_1',
231 'tSNE_2'])
232 if not hasattr(scplus_obj, 'dr_cell'):
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/utils/_set_output.py:157, in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
155 @wraps(f)
156 def wrapped(self, X, *args, **kwargs):
--> 157 data_to_wrap = f(self, X, *args, **kwargs)
158 if isinstance(data_to_wrap, tuple):
159 # only wrap the first output for cross decomposition
160 return_tuple = (
161 _wrap_data_with_container(method, data_to_wrap[0], X, self),
162 *data_to_wrap[1:],
163 )
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/base.py:1152, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1145 estimator._validate_params()
1147 with config_context(
1148 skip_parameter_validation=(
1149 prefer_skip_nested_validation or global_skip_validation
1150 )
1151 ):
-> 1152 return fit_method(estimator, *args, **kwargs)
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:1110, in TSNE.fit_transform(self, X, y)
1085 @_fit_context(
1086 # TSNE.metric is not validated yet
1087 prefer_skip_nested_validation=False
1088 )
1089 def fit_transform(self, X, y=None):
1090 """Fit X into an embedded space and return that transformed output.
1091
1092 Parameters
(...)
1108 Embedding of the training data in low-dimensional space.
1109 """
-> 1110 self._check_params_vs_input(X)
1111 embedding = self._fit(X)
1112 self.embedding_ = embedding
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/sklearn/manifold/_t_sne.py:821, in TSNE._check_params_vs_input(self, X)
819 def _check_params_vs_input(self, X):
820 if self.perplexity >= X.shape[0]:
--> 821 raise ValueError("perplexity must be less than n_samples")
ValueError: perplexity must be less than n_samples
2024-04-08 15:36:30,589 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2024-04-08 15:41:48,555 INFO worker.py:1664 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
It can format eGRNs,though rarely
len(scplus_obj.uns['eRegulons'])
6
I want to skip the filtering process and run heatmap_dotplot directly.I want to know how to run it correctly
from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
scplus_obj = scplus_obj,
size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
color_matrix = scplus_obj.to_df('EXP'), #specify what to plot as colors, TF expression in this case
scale_size_matrix = True,
scale_color_matrix = True,
group_variable = 'EPIgroup',
subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
index_order = ['D6-7', 'D8-10', 'D12-14'],
figsize = (10, 5),
orientation = 'horizontal')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[107], line 4
1 from scenicplus.plotting.dotplot import heatmap_dotplot
2 heatmap_dotplot(
3 scplus_obj = scplus_obj,
----> 4 size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
5 color_matrix = scplus_obj.to_df('EXP'), #specify what to plot as colors, TF expression in this case
6 scale_size_matrix = True,
7 scale_color_matrix = True,
8 group_variable = 'EPIgroup',
9 subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
10 index_order = ['D6-7', 'D8-10', 'D12-14'],
11 figsize = (10, 5),
12 orientation = 'horizontal')
TypeError: list indices must be integers or slices, not str
Thanks
Hi @wangdlpbr
Can you please show the entire error message from the heatmap_dotplot
call?
All the best,
Seppe
Hi @SeppeDeWinter I think the format of scplus_obj.uns['eRegulons'] is incorrect, causing an error
scplus_obj.uns['eRegulons']
[eRegulon for TF FOS in context frozenset({'positive r2g', 'Cistromes_Unfiltered', 'Top 10 region-to-gene links per gene', 'Top 15 region-to-gene links per gene', 'positive tf2g'}).
This eRegulon has 54 target regions and 51 target genes.,
eRegulon for TF ETV4 in context frozenset({'negative r2g', 'Top 15 region-to-gene links per gene', 'positive tf2g', 'Cistromes_Unfiltered'}).
This eRegulon has 11 target regions and 10 target genes.,
eRegulon for TF KLF9 in context frozenset({'negative r2g', 'Top 10 region-to-gene links per gene', 'Cistromes_Unfiltered', 'negative tf2g'}).
This eRegulon has 15 target regions and 14 target genes.,
eRegulon for TF FOS in context frozenset({'Top 10 region-to-gene links per gene', 'positive r2g', 'Cistromes_Unfiltered', 'positive tf2g'}).
This eRegulon has 47 target regions and 46 target genes.,
eRegulon for TF ETV4 in context frozenset({'negative r2g', 'Top 15 region-to-gene links per gene', 'Cistromes_Unfiltered', 'positive tf2g'}).
This eRegulon has 11 target regions and 10 target genes.,
eRegulon for TF HLX in context frozenset({'negative r2g', 'BASC binarized', 'positive tf2g', 'Cistromes_Unfiltered'}).
This eRegulon has 19 target regions and 22 target genes.]
from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
scplus_obj = scplus_obj,
size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
color_matrix = scplus_obj.to_df('EXP'), #specify what to plot as colors, TF expression in this case
scale_size_matrix = True,
scale_color_matrix = True,
group_variable = 'EPIgroup',
subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
index_order = ['D6-7', 'D8-10', 'D12-14'],
figsize = (10, 5),
orientation = 'horizontal')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[109], line 4
1 from scenicplus.plotting.dotplot import heatmap_dotplot
2 heatmap_dotplot(
3 scplus_obj = scplus_obj,
----> 4 size_matrix = scplus_obj.uns['eRegulons']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
5 color_matrix = scplus_obj.to_df('EXP'), #specify what to plot as colors, TF expression in this case
6 scale_size_matrix = True,
7 scale_color_matrix = True,
8 group_variable = 'EPIgroup',
9 subset_eRegulons = scplus_obj.uns['eRegulons']['Gene_based'],
10 index_order = ['D6-7', 'D8-10', 'D12-14'],
11 figsize = (10, 5),
12 orientation = 'horizontal')
TypeError: list indices must be integers or slices, not str
Thanks
Hi
Ah yes, the size_matrix
should be a matrix. This field in the object is in fact formatted correctly, you should use the AUC matrix instead (i.e., scplus_obj.uns['eRegulon_AUC']['Region_based']
)
Best,
Seppe
Hi @SeppeDeWinter Thanks for your help, I successfully ran heatmap_dotplot, and then I wanted to draw the network, but I encountered an error.
from scenicplus.plotting.dotplot import heatmap_dotplot
heatmap_dotplot(
scplus_obj = scplus_obj,
size_matrix = scplus_obj.uns['eRegulon_AUC']['Region_based'], #specify what to plot as dot sizes, target region enrichment in this case
color_matrix = scplus_obj.to_df('EXP'), #specify what to plot as colors, TF expression in this case
scale_size_matrix = True,
scale_color_matrix = True,
group_variable = 'EPIgroup',
subset_eRegulons = scplus_obj.uns['eRegulon_AUC']['Gene_based'],
index_order = ['D6-7', 'D8-10', 'D12-14'],
figsize = (10, 5),
orientation = 'horizontal')
I skipped find_highly_variable_features
,and get nx_tables
from scenicplus.networks import create_nx_tables, create_nx_graph, plot_networkx, export_to_cytoscape
nx_tables = create_nx_tables(
scplus_obj = scplus_obj,
eRegulon_metadata_key ='eRegulon_metadata',
subset_eRegulons = ['ETV4', 'ETV4_extended', 'FOS','FOS_extended','HLX_extended','KLF9'],
add_differential_gene_expression = True,
add_differential_region_accessibility = True,
differential_variable = ['EPIgroup'])
Then I tried to draw the network
G, pos, edge_tables, node_tables = create_nx_graph(nx_tables,
use_edge_tables = ['TF2R','R2G'],
color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
'R2G': {'variable' : 'R2G_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
transparency_edge_by = {'R2G': {'variable' : 'R2G_importance', 'min_alpha': 0.1, 'v_min': 0}},
width_edge_by = {'R2G': {'variable' : 'R2G_importance', 'max_size' : 1.5, 'min_size' : 1}},
color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
'Gene': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'bwr'},
'Region': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'viridis'}},
transparency_node_by = {'Region': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1},
'Gene': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1}},
size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
layout='kamada_kawai_layout',
scale_position_by=250)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[39], line 1
----> 1 G, pos, edge_tables, node_tables = create_nx_graph(nx_tables,
2 use_edge_tables = ['TF2R','R2G'],
3 color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
4 'R2G': {'variable' : 'R2G_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
5 transparency_edge_by = {'R2G': {'variable' : 'R2G_importance', 'min_alpha': 0.1, 'v_min': 0}},
6 width_edge_by = {'R2G': {'variable' : 'R2G_importance', 'max_size' : 1.5, 'min_size' : 1}},
7 color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'ETV4': 'Orange', 'ETV4_extended': 'Purple', 'FOS': 'Red','FOS_extended': '#88A0DC', 'HLX_extended': '#295384', 'KLF9': '#FFF179'}},
8 'Gene': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'bwr'},
9 'Region': {'variable': 'EPIgroup_Log2FC_D6-7', 'continuous_color' : 'viridis'}},
10 transparency_node_by = {'Region': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1},
11 'Gene': {'variable' : 'EPIgroup_Log2FC_D6-7', 'min_alpha': 0.1}},
12 size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
13 'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
14 'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
15 shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
16 'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
17 'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
18 label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
19 'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
20 'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
21 layout='kamada_kawai_layout',
22 scale_position_by=250)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:434, in create_nx_graph(nx_tables, use_edge_tables, color_edge_by, transparency_edge_by, width_edge_by, color_node_by, transparency_node_by, size_node_by, shape_node_by, label_size_by, label_color_by, layout, lc_dist_genes, lc_dist_TF, scale_position_by)
431 use_node_tables = sorted(list(set(use_node_tables)), reverse=True)
433 # Create graph
--> 434 edge_tables = pd.concat([_format_nx_table_internal(
435 nx_tables, 'Edge', x, color_edge_by, transparency_edge_by, width_edge_by, {}) for x in use_edge_tables])
436 edge_tables.dropna(axis = 0, how = 'any', inplace = True)
437 G = nx.from_pandas_edgelist(edge_tables, edge_attr=True)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:434, in <listcomp>(.0)
431 use_node_tables = sorted(list(set(use_node_tables)), reverse=True)
433 # Create graph
--> 434 edge_tables = pd.concat([_format_nx_table_internal(
435 nx_tables, 'Edge', x, color_edge_by, transparency_edge_by, width_edge_by, {}) for x in use_edge_tables])
436 edge_tables.dropna(axis = 0, how = 'any', inplace = True)
437 G = nx.from_pandas_edgelist(edge_tables, edge_attr=True)
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:165, in _format_nx_table_internal(nx_tables, table_type, table_id, color_by, transparency_by, size_by, shape_by, label_size_by, label_color_by)
163 else:
164 color_dict = color_by[table_id]['category_color']
--> 165 color = color_var.apply(
166 lambda x: to_rgba(color_dict[x])).to_numpy()
167 elif 'continuous_color' in color_by[table_id].keys():
168 if color_by[table_id]['continuous_color'] is None:
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
4664 def apply(
4665 self,
4666 func: AggFuncType,
(...)
4669 **kwargs,
4670 ) -> DataFrame | Series:
4671 """
4672 Invoke function on values of Series.
4673
(...)
4772 dtype: float64
4773 """
-> 4774 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/apply.py:1100, in SeriesApply.apply(self)
1097 return self.apply_str()
1099 # self.f is Callable
-> 1100 return self.apply_standard()
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/apply.py:1151, in SeriesApply.apply_standard(self)
1149 else:
1150 values = obj.astype(object)._values
-> 1151 mapped = lib.map_infer(
1152 values,
1153 f,
1154 convert=self.convert_dtype,
1155 )
1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
1158 # GH#43986 Need to do list(mapped) in order to get treated as nested
1159 # See also GH#25959 regarding EA support
1160 return obj._constructor_expanddim(list(mapped), index=obj.index)
File /data4/litianqing/wangda/anaconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/_libs/lib.pyx:2919, in pandas._libs.lib.map_infer()
File /data4/litianqing/wangda/software/scenicplus/scenicplus/src/scenicplus/networks.py:166, in _format_nx_table_internal.<locals>.<lambda>(x)
163 else:
164 color_dict = color_by[table_id]['category_color']
165 color = color_var.apply(
--> 166 lambda x: to_rgba(color_dict[x])).to_numpy()
167 elif 'continuous_color' in color_by[table_id].keys():
168 if color_by[table_id]['continuous_color'] is None:
KeyError: 'HLX'
Hi
Can you show the output of
print(nx_tables)
Best,
S
Hi @SeppeDeWinter print(nx_tables)
{'Edge': {'TF2R': TF Region Gene_signature_name \
0 FOS chr6:26216495-26218608 FOS_+_+_(51g)
1 FOS chr6:26103532-26105144 FOS_+_+_(51g)
2 FOS chr6:26183536-26184616 FOS_+_+_(51g)
3 FOS chr6:26157000-26158989 FOS_+_+_(51g)
4 FOS chr12:56222674-56224579 FOS_+_+_(51g)
.. ... ... ...
15 HLX chr9:128922548-128922747 HLX_extended_+_-_(22g)
16 HLX chr9:34518352-34518624 HLX_extended_+_-_(22g)
19 HLX chr19:35758083-35758282 HLX_extended_+_-_(22g)
20 HLX chr1:206612495-206612694 HLX_extended_+_-_(22g)
21 HLX chr12:53553168-53553367 HLX_extended_+_-_(22g)
Region_signature_name is_extended
0 FOS_+_+_(54r) False
1 FOS_+_+_(54r) False
2 FOS_+_+_(54r) False
3 FOS_+_+_(54r) False
4 FOS_+_+_(54r) False
.. ... ...
15 HLX_extended_+_-_(19r) True
16 HLX_extended_+_-_(19r) True
19 HLX_extended_+_-_(19r) True
20 HLX_extended_+_-_(19r) True
21 HLX_extended_+_-_(19r) True
[157 rows x 5 columns], 'R2G': Region Gene Gene_signature_name \
0 chr6:26216495-26218608 HIST1H2BG FOS_+_+_(51g)
1 chr6:26103532-26105144 HIST1H2BG FOS_+_+_(51g)
2 chr6:26183536-26184616 HIST1H2BG FOS_+_+_(51g)
3 chr6:26157000-26158989 HIST1H2BG FOS_+_+_(51g)
4 chr12:56222674-56224579 IL23A FOS_+_+_(51g)
.. ... ... ...
17 chr12:53320964-53321403 PFDN5 HLX_extended_+_-_(22g)
18 chr9:128922548-128922747 C9orf114 HLX_extended_+_-_(22g)
19 chr19:35758083-35758282 APLP1 HLX_extended_+_-_(22g)
20 chr1:206612495-206612694 EIF2D HLX_extended_+_-_(22g)
21 chr12:53553168-53553367 ATP5G2 HLX_extended_+_-_(22g)
R2G_importance R2G_importance_x_abs_rho R2G_importance_x_rho R2G_rho \
0 0.053994 0.038567 0.038567 0.714286
1 0.071888 0.061619 0.061619 0.857143
2 0.051016 0.036440 0.036440 0.714286
3 0.079733 0.066445 0.066445 0.833333
4 0.029942 0.007172 0.007172 0.239525
.. ... ... ... ...
17 0.039886 0.034393 -0.034393 -0.862291
18 0.467408 0.242526 -0.242526 -0.518875
19 0.154985 0.093371 -0.093371 -0.602453
20 0.171288 0.118364 -0.118364 -0.691023
21 0.056114 0.043011 -0.043011 -0.766481
Region_signature_name is_extended
0 FOS_+_+_(54r) False
1 FOS_+_+_(54r) False
2 FOS_+_+_(54r) False
3 FOS_+_+_(54r) False
4 FOS_+_+_(54r) False
.. ... ...
17 HLX_extended_+_-_(19r) True
18 HLX_extended_+_-_(19r) True
19 HLX_extended_+_-_(19r) True
20 HLX_extended_+_-_(19r) True
21 HLX_extended_+_-_(19r) True
[179 rows x 9 columns], 'TF2G': TF Gene Gene_signature_name Region_signature_name \
0 FOS HIST1H2BG FOS_+_+_(51g) FOS_+_+_(54r)
4 FOS IL23A FOS_+_+_(51g) FOS_+_+_(54r)
5 FOS TLDC1 FOS_+_+_(51g) FOS_+_+_(54r)
6 FOS DDX39A FOS_+_+_(51g) FOS_+_+_(54r)
7 FOS RELT FOS_+_+_(51g) FOS_+_+_(54r)
.. ... ... ... ...
17 HLX PFDN5 HLX_extended_+_-_(22g) HLX_extended_+_-_(19r)
18 HLX C9orf114 HLX_extended_+_-_(22g) HLX_extended_+_-_(19r)
19 HLX APLP1 HLX_extended_+_-_(22g) HLX_extended_+_-_(19r)
20 HLX EIF2D HLX_extended_+_-_(22g) HLX_extended_+_-_(19r)
21 HLX ATP5G2 HLX_extended_+_-_(22g) HLX_extended_+_-_(19r)
TF2G_importance TF2G_importance_x_abs_rho TF2G_importance_x_rho \
0 0.190168 0.094830 0.094830
4 0.231820 0.220519 0.220519
5 0.128713 0.086370 0.086370
6 0.117175 0.102582 0.102582
7 0.251031 0.138683 0.138683
.. ... ... ...
17 1.843144 1.524329 1.524329
18 1.464763 0.779063 0.779063
19 2.150820 1.730371 1.730371
20 1.586134 1.260928 1.260928
21 2.171248 1.763424 1.763424
TF2G_regulation TF2G_rho is_extended
0 1 0.498667 False
4 1 0.951249 False
5 1 0.671022 False
6 1 0.875466 False
7 1 0.552452 False
.. ... ... ...
17 1 0.827026 True
18 1 0.531870 True
19 1 0.804517 True
20 1 0.794970 True
21 1 0.812171 True
[153 rows x 10 columns]}, 'Node': {'TF': Node_type TF EPIgroup_Log2FC_D6-7 EPIgroup_Log2FC_D8-10 \
FOS TF FOS -0.380419 0.063163
ETV4 TF ETV4 0.201970 -0.068840
KLF9 TF KLF9 0.871484 0.196870
HLX TF HLX -23.282949 -0.893224
EPIgroup_Log2FC_D12-14
FOS 0.281029
ETV4 -0.113940
KLF9 -1.669792
HLX 2.486625 , 'Gene': Node_type Gene EPIgroup_Log2FC_D6-7 EPIgroup_Log2FC_D8-10 \
NOL7 Gene NOL7 -0.149382 0.023373
EML4 Gene EML4 -0.085609 -0.031799
ACTL6A Gene ACTL6A 0.008228 -0.021898
PGS1 Gene PGS1 -0.144941 0.173828
B2M Gene B2M -0.326089 0.021158
... ... ... ... ...
SELENBP1 Gene SELENBP1 0.159115 -1.198785
FHOD1 Gene FHOD1 0.193900 -0.441795
PLK5 Gene PLK5 -5.516206 -0.196060
RNF44 Gene RNF44 0.130341 -0.086522
GALNT3 Gene GALNT3 -1.241592 0.215353
EPIgroup_Log2FC_D12-14
NOL7 0.114164
EML4 0.125841
ACTL6A 0.020931
PGS1 -0.089724
B2M 0.282418
... ...
SELENBP1 1.218523
FHOD1 0.369816
PLK5 1.915554
RNF44 -0.016342
GALNT3 0.719924
[94 rows x 5 columns], 'Region': Node_type Region \
chr6:26272546-26273911 Region chr6:26272546-26273911
chr11:568031-568810 Region chr11:568031-568810
chr1:2130745-2131227 Region chr1:2130745-2131227
chr16:4386393-4386977 Region chr16:4386393-4386977
chr16:3013724-3014414 Region chr16:3013724-3014414
... ... ...
chr6:26157000-26158989 Region chr6:26157000-26158989
chr16:284583-285481 Region chr16:284583-285481
chr1:26224481-26225264 Region chr1:26224481-26225264
chr5:36241389-36242569 Region chr5:36241389-36242569
chr19:4246771-4247560 Region chr19:4246771-4247560
EPIgroup_Log2FC_D6-7 EPIgroup_Log2FC_D8-10 \
chr6:26272546-26273911 -0.136283 -0.308142
chr11:568031-568810 -0.116367 -0.031428
chr1:2130745-2131227 1.401683 -0.320824
chr16:4386393-4386977 -2.389537 2.973340
chr16:3013724-3014414 -0.192917 -0.201774
... ... ...
chr6:26157000-26158989 -0.163891 -0.145684
chr16:284583-285481 -0.154932 -0.104034
chr1:26224481-26225264 1.327607 -0.097275
chr5:36241389-36242569 -0.043003 0.027540
chr19:4246771-4247560 -0.151104 -0.276715
EPIgroup_Log2FC_D12-14
chr6:26272546-26273911 0.512445
chr11:568031-568810 0.153362
chr1:2130745-2131227 -1.666928
chr16:4386393-4386977 -2.528963
chr16:3013724-3014414 0.432142
... ...
chr6:26157000-26158989 0.342736
chr16:284583-285481 0.279709
chr1:26224481-26225264 -2.152462
chr5:36241389-36242569 0.006050
chr19:4246771-4247560 0.488511
[97 rows x 5 columns]}}
Hi @wangdlpbr
For the parameters: color_edge_by
and color_node_by
I think you have to put HLX
instead of HLX_extended
.
Does that work?
Best,
Seppe
Hello, when I run create_SCENICPLUS_object, I get the following error, how should I solve this problem
code: