decarlin closed this issue 2 years ago
OK, I figured this out. scplus_obj.X_EXP was a <class 'scipy.sparse._csr.csr_matrix'>, whereas calculate_TFs_to_genes_relationships() was expecting a dense ndarray. I solved this with:
scplus_obj.X_EXP = scplus_obj.X_EXP.toarray()
You may want to support sparse matrices for the RNA-seq data.
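A slightly more defensive form of that workaround, as a minimal sketch: it densifies only when the matrix really is a SciPy sparse matrix. The ensure_dense helper and the toy matrix are hypothetical and not part of scenicplus; on a real object the helper would be applied to scplus_obj.X_EXP. Keep in mind that densifying a large expression matrix can take a lot of memory.

```python
import numpy as np
from scipy import sparse


def ensure_dense(matrix):
    """Return a dense ndarray, converting only if the input is a SciPy sparse matrix."""
    if sparse.issparse(matrix):
        return matrix.toarray()
    return np.asarray(matrix)


# Toy stand-in for scplus_obj.X_EXP; the values are illustrative only.
x_exp = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]]))
dense = ensure_dense(x_exp)
print(type(dense).__name__, dense.shape)
```

Calling ensure_dense on an array that is already dense returns it unchanged in content, so the guard can be run unconditionally before the TF-to-gene step.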
Hi Decarlin
You are right. At the moment, sparse matrices aren't fully supported. I'll mark this issue as an enhancement.
Best,
S
Hi @decarlin !
It should work with sparse matrices as well; we have this conversion step: https://github.com/aertslab/scenicplus/blob/main/src/scenicplus/TF_to_gene.py (lines 288-297). Anyway, happy you solved it :)! We will also open an issue on arboreto so that it can accept sparse matrices directly.
Cheers!
C
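For reference, pandas can wrap a SciPy sparse matrix without densifying it, via pd.DataFrame.sparse.from_spmatrix. The sketch below uses toy data with hypothetical cell and gene names. Whether downstream code such as arboreto accepts a sparse-backed DataFrame is not established in this thread, so this is an illustration of the pandas call only, not of what scenicplus does.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy expression matrix (2 cells x 3 genes); all names are hypothetical.
counts = sparse.csr_matrix(np.array([[0, 1, 0], [2, 0, 3]]))
cells = ["cell_1", "cell_2"]
genes = ["gene_a", "gene_b", "gene_c"]

# Build a DataFrame whose columns are sparse arrays; the data are not densified.
ex_matrix = pd.DataFrame.sparse.from_spmatrix(counts, index=cells, columns=genes)
print(ex_matrix.shape)
```

If a dense frame is needed later, ex_matrix.sparse.to_dense() converts it back.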
Describe the bug
After initialization, calculate_TFs_to_genes_relationships() throws this error:
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/ubuntu/scenicplus/src/scenicplus/TF_to_gene.py", line 333, in calculate_TFs_to_genes_relationships
    ex_matrix = pd.DataFrame(
  File "/home/ubuntu/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/frame.py", line 737, in __init__
    mgr = ndarray_to_mgr(
  File "/home/ubuntu/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 351, in ndarray_to_mgr
    _check_values_indices_shape_match(values, index, columns)
  File "/home/ubuntu/miniconda3/envs/scenicplus/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 422, in _check_values_indices_shape_match
    raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (25945, 1), indices imply (25945, 18892)
To Reproduce
Here's the code, starting from a successfully created cisTopic object:
import itertools
import pickle
import anndata
import pandas as pd

# Note: atac_metadata and tmp_dir are used below but are not defined in this snippet.

# Load the previously created cisTopic object.
with open('/data/scenic_demo/output/cistopic_obj.pkl', 'rb') as f:
    cistopic_obj = pickle.load(f)

import scanpy as sc

# Read the RNA counts from a .mtx file and attach cell and gene names.
adata = sc.read_mtx('/data/CAREHF/multiome_rna_counts.mtx').T
adata.obs = atac_metadata
cell_data_raw = pd.read_csv('/data/CAREHF/multiome_samples.txt')
adata.obs_names = cell_data_raw['x']

gene_names = pd.read_csv('/data/CAREHF/multiome_rna_features.txt')
adata.var_names = gene_names['x']

adata = adata.T

import dill

# Load the motif enrichment results.
menr = dill.load(open('/data/scenic_demo/carehf/motifs/menr.pkl', 'rb'))

from scenicplus.scenicplus_class import create_SCENICPLUS_object
import numpy as np

scplus_obj = create_SCENICPLUS_object(
    GEX_anndata = adata,
    cisTopic_obj = cistopic_obj,
    menr = menr,
    key_to_group_by = 'predicted.celltype_fromRNA',
    multi_ome_mode = True,
    bc_transform_func = lambda x: x+'___cisTopic'
)

from scenicplus.preprocessing.filtering import *

filter_genes(scplus_obj, min_pct = 0.5)
filter_regions(scplus_obj, min_pct = 0.5)

from scenicplus.cistromes import *
merge_cistromes(scplus_obj)

from scenicplus.enhancer_to_gene import get_search_space, calculate_regions_to_genes_relationships, GBM_KWARGS

get_search_space(scplus_obj,
                 biomart_host = 'http://www.ensembl.org',
                 species = 'hsapiens',
                 assembly = 'hg38',
                 upstream = [1000, 150000],
                 downstream = [1000, 150000])

calculate_regions_to_genes_relationships(scplus_obj,
                                         ray_n_cpu = 20,
                                         _temp_dir = tmp_dir,
                                         importance_scoring_method = 'GBM',
                                         importance_scoring_kwargs = GBM_KWARGS)

# Save the object before running TF-to-gene inference.
with open('/data/scenic_demo/carehf/scplus_obj.pkl', 'wb') as f:
    pickle.dump(scplus_obj, f)

from scenicplus.TF_to_gene import *
tf_file = '/data/scenic_demo/allTFs_hg38.txt'

# This is the call that raises the ValueError.
calculate_TFs_to_genes_relationships(scplus_obj,
                                     tf_file = tf_file,
                                     ray_n_cpu = 20,
                                     method = 'GBM',
                                     _temp_dir = tmp_dir,
                                     key = 'TF2G_adj')
Version (please complete the following information):
Additional context
Perhaps this is related to creating the scenic object from a .mtx file rather than from an existing AnnData object? However, prior to TF-to-gene inference, the scenic object looks fine:
Anyway, thanks for the work, excited to see the results...