aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

heatmap_dotplot() in the development branch #239

Closed pchiang5 closed 8 months ago

pchiang5 commented 8 months ago

Describe the bug

I used the development branch of scenicplus to generate mdata_r, whose structure is shown below. However, heatmap_dotplot() raises an error whatever value I pass to the sort_data_by parameter ('size_val', 'col_val', ...). What is the correct string to use for plotting? Thank you.

In [5]: mdata_r
Out[5]:
MuData object with n_obs × n_vars = 5525 × 374671 backed at '/mnt/c/Users/pc/Downloads/scenicplus/tests/test_cli/scplusmdata.h5mu'
  uns: 'direct_e_regulon_metadata', 'extended_e_regulon_metadata'
  6 modalities
    scRNA_counts: 5525 x 33866
      obs: 'batch', 'species', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', '_scvi_batch', '_scvi_labels', 'leiden', 'celltype'
      var: 'species', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
      obsm: 'X_scVI', 'X_umap', '_scvi_extra_continuous_covs'
    scATAC_counts: 5525 x 340585
      obs: 'cisTopic_nr_acc', 'cisTopic_log_nr_acc', 'Dupl_nr_frag', 'barcode', 'Log_unique_nr_frag', 'cisTopic_nr_frag', 'Total_nr_frag_in_regions', 'cisTopic_log_nr_frag', 'FRIP', 'Total_nr_frag', 'Dupl_rate', 'Unique_nr_frag', 'TSS_enrichment', 'Unique_nr_frag_in_regions', 'Log_total_nr_frag', 'batch', 'species', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', '_scvi_batch', '_scvi_labels', 'leiden', 'celltype', 'sample_id'
      var: 'Chromosome', 'Start', 'End', 'Width', 'cisTopic_nr_frag', 'cisTopic_log_nr_frag', 'cisTopic_nr_acc', 'cisTopic_log_nr_acc'
    direct_gene_based_AUC: 5525 x 69
    direct_region_based_AUC: 5525 x 69
    extended_gene_based_AUC: 5525 x 41
    extended_region_based_AUC: 5525 x 41

To Reproduce

heatmap_dotplot(
    scplus_mudata = mdata_r,
    size_modality = 'direct_region_based_AUC',  # specify what to plot as dot sizes, target region enrichment in this case
    color_modality = 'scRNA_counts',  # specify what to plot as colors, TF expression in this case
    group_variable = 'scRNA_counts:celltype',
    eRegulon_metadata_key = 'direct_e_regulon_metadata',
    size_feature_key = 'Region_signature_name',
    color_feature_key = 'Gene',
    feature_name_key = 'eRegulon_name',
    sort_data_by = "color_variable")

Error output

In [4]: heatmap_dotplot(
   ...:     scplus_mudata = mdata_r,
   ...:     size_modality = 'direct_region_based_AUC', #specify what to plot as dot sizes, target region enrichment in this case
   ...:     color_modality = 'scRNA_counts', #specify what to plot as colors, TF expression in this case
   ...:     group_variable= 'scRNA_counts:celltype',
   ...:     eRegulon_metadata_key = 'direct_e_regulon_metadata',
   ...:     size_feature_key = 'Region_signature_name',
   ...:     color_feature_key = 'Gene',
   ...:     feature_name_key = 'eRegulon_name',
   ...:     sort_data_by= "color_variable" )
   ...:
/home/pc/miniconda3/envs/scenicplusdev/lib/python3.8/site-packages/scenicplus/plotting/dotplot.py:42: FutureWarning: The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
Empty DataFrame
Columns: [scRNA_counts, scRNA_counts:celltype, direct_region_based_AUC, eRegulon_name]
Index: []

KeyError                                  Traceback (most recent call last)
Cell In[4], line 1
----> 1 heatmap_dotplot(
      2     scplus_mudata = mdata_r,
      3     size_modality = 'direct_region_based_AUC', #specify what to plot as dot sizes, target region enrichment in this case
      4     color_modality = 'scRNA_counts', #specify what to plot as colors, TF expression in this case
      5     group_variable= 'scRNA_counts:celltype',
      6     eRegulon_metadata_key = 'direct_e_regulon_metadata',
      7     size_feature_key = 'Region_signature_name',
      8     color_feature_key = 'Gene',
      9     feature_name_key = 'eRegulon_name',
     10     sort_data_by= "color_variable" )

File /home/pc/miniconda3/envs/scenicplusdev/lib/python3.8/site-packages/scenicplus/plotting/dotplot.py:117, in heatmap_dotplot(scplus_mudata, size_modality, color_modality, group_variable, eRegulon_metadata_key, size_feature_key, color_feature_key, feature_name_key, sort_data_by, subset_feature_names, scale_size_matrix, scale_color_matrix, group_variable_order, save, figsize, split_repressor_activator, orientation)
    115 else:
    116     plotting_df[group_variable] = pd.Categorical(plotting_df[group_variable], categories=group_variable_order)
--> 117 tmp = plotting_df[[group_variable, feature_name_key, sort_data_by]] \
    118     .pivot_table(index=group_variable, columns=feature_name_key) \
    119     .fillna(0)[sort_data_by]
    120 if group_variable_order is not None:
    121     tmp = tmp.loc[group_variable_order]

File /home/pc/miniconda3/envs/scenicplusdev/lib/python3.8/site-packages/pandas/core/frame.py:3811, in DataFrame.__getitem__(self, key)
   3809 if is_iterator(key):
   3810     key = list(key)
-> 3811 indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3813 # take() does not accept boolean indexers
   3814 if getattr(indexer, "dtype", None) == bool:

File /home/pc/miniconda3/envs/scenicplusdev/lib/python3.8/site-packages/pandas/core/indexes/base.py:6108, in Index._get_indexer_strict(self, key, axis_name)
   6105 else:
   6106     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6108 self._raise_if_missing(keyarr, indexer, axis_name)
   6110 keyarr = self.take(indexer)
   6111 if isinstance(key, Index):
   6112     # GH 42790 - Preserve name from an Index

File /home/pc/miniconda3/envs/scenicplusdev/lib/python3.8/site-packages/pandas/core/indexes/base.py:6171, in Index._raise_if_missing(self, key, indexer, axis_name)
   6168     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   6170 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6171 raise KeyError(f"{not_found} not in index")

KeyError: "['color_variable'] not in index"
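
Reading the traceback alone, sort_data_by is used to select a column of plotting_df, and the printed (empty) table has the columns scRNA_counts, scRNA_counts:celltype, direct_region_based_AUC and eRegulon_name. The sketch below rebuilds that selection on a toy table to show why 'color_variable' fails at this step while a name that matches a column does not. The values are made up and pivot_for_sorting is a hypothetical helper, not part of scenicplus; the sketch also says nothing about why the real table was empty.

```python
import pandas as pd

# Toy stand-in for the internal plotting table. The column names are copied
# from the "Empty DataFrame" line in the error output; the values are invented.
plotting_df = pd.DataFrame({
    "scRNA_counts": [0.2, 1.5, 0.7, 0.1],
    "scRNA_counts:celltype": ["A", "A", "B", "B"],
    "direct_region_based_AUC": [0.05, 0.10, 0.30, 0.20],
    "eRegulon_name": ["TF1", "TF2", "TF1", "TF2"],
})

group_variable = "scRNA_counts:celltype"
feature_name_key = "eRegulon_name"


def pivot_for_sorting(df, sort_data_by):
    """Hypothetical helper mirroring the selection and pivot seen in the traceback."""
    return (
        df[[group_variable, feature_name_key, sort_data_by]]
        .pivot_table(index=group_variable, columns=feature_name_key)
        .fillna(0)[sort_data_by]
    )


# A name that is an actual column of the table passes the selection step.
sorted_input = pivot_for_sorting(plotting_df, "scRNA_counts")

# A name that is not a column raises the same kind of KeyError as in the report.
try:
    pivot_for_sorting(plotting_df, "color_variable")
    missing_column_error = None
except KeyError as err:
    missing_column_error = str(err)
```

Under this reading, the string passed as sort_data_by would need to match one of the table's column names, for example the value given for color_modality or size_modality. That is an inference from the traceback, not something the thread confirms.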

Expected behavior

The heatmap is displayed as in the tutorial.


pchiang5 commented 8 months ago

The error was caused by an incompatibility with the sparse format of the RNA matrix.
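
The thread does not say how the matrix was converted, so the following is only a minimal sketch of one common way to check for a SciPy sparse matrix and turn it into a dense NumPy array. The helper name to_dense is hypothetical, and the toy matrix stands in for the RNA counts; whether the development branch expects exactly this is an assumption.

```python
import numpy as np
from scipy import sparse


def to_dense(matrix):
    """Hypothetical helper: return a dense ndarray, converting only if the input is SciPy-sparse."""
    if sparse.issparse(matrix):
        return matrix.toarray()
    return np.asarray(matrix)


# Toy count matrix standing in for the RNA modality (2 cells x 3 genes).
counts = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]]))
dense_counts = to_dense(counts)
```

Note that a dense copy of a large count matrix can need far more memory than the sparse original, so this is something to try on the real data with care.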