Closed tu4n-p closed 1 year ago
Hi @tu4n-p
Could you show the output of following commands:
scplus_obj.to_df('EXP')
scplus_obj.to_df('ACC')
scplus_obj.to_df('EXP').sum(0)
scplus_obj.to_df('EXP').sum(1)
scplus_obj.to_df('ACC').sum(0)
scplus_obj.to_df('ACC').sum(1)
Best,
Seppe
Here they are, as for rna-seq, I used scaled data. Thank you so much for looking into this, T
Sorry for the late reply.
I think the reason for these NaNs is because your gene expression data contains negative values (it's probably already normalized), inside the function to calculate differentially expressed genes it will be log-normalised, probably resulting in the NaN values.
We expect the gene expression to be raw counts.
It will probably not affect the GRN inference that much though.
You can try again using raw counts or you can skip this step (it's optional), by setting calculate_DEGs_DARs
to False
.
Hope this helps.
Best,
Seppe
Got it. Thank you!
Hi there, I'm using scenicplus for separate scATAC seq and scRNAseq from my lab following the melanoma tutorial. Things went fine for the most part but scenicplus wasn't able to generate DEGs for DEG-DAR with the following message. I went back to the code and figured out that the a lot of values of RNA expression level from scenicplus pseudocell generation became NaN after normalization and log1p transformation and that led to the NaN mean expression which led to aforementioned problem. It'd be great to hear your thoughts on how I should go about this. Should I normalize data in a different way or should I skip the normallization part? Thank you, This is the command I ran and the error message in brief:
This is the indepth error message: 2022-11-21 00:24:30,030 SCENIC+_wrapper INFO /scratch/JEK/scenicplus2 folder already exists. 2022-11-21 00:24:30,030 SCENIC+_wrapper INFO Calculating TF-eGRNs AUC correlation 2022-11-21 00:24:35,229 SCENIC+_wrapper INFO Calculating DEGs/DARs 2022-11-21 00:24:35,230 SCENIC+ INFO Calculating DEGs for variable refined_annotation
ValueError Traceback (most recent call last) Input In [12], in <cell line: 2>() 21 except Exception as e: 22 #in case of failure, still save the object 23 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus2/scplus_obj.pkl'), 'wb'), protocol=-1) ---> 24 raise(e)
Input In [12], in <cell line: 2>() 2 try: 3 sys.stderr = open(os.devnull, "w") # silence stderr ----> 4 run_scenicplus( 5 scplus_obj = scplus_obj, 6 variable = ['refined_annotation'], 7 species = 'dmelanogaster', 8 assembly = 'dm6', 9 tf_file = '/gpfs/home/tpham43/14_16/allTFs_dmel-lite.txt', 10 save_path = os.path.join(work_dir, 'scenicplus2'), 11 biomart_host = biomart_host, 12 upstream = [100, 1500], 13 downstream = [100, 1500], 14 calculate_TF_eGRN_correlation = True, 15 calculate_DEGs_DARs = True, 16 export_to_loom_file = True, 17 export_to_UCSC_file = True, 18 n_cpu = 12, 19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
20 sys.stderr = sys.stderr # unsilence stderr 21 except Exception as e: 22 #in case of failure, still save the object
File /gpfs/home/tpham43/scenicplus/src/scenicplus/wrappers/run_scenicplus.py:305, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, **kwargs) 303 log.info('Calculating DEGs/DARs') 304 for var in variable: --> 305 get_differential_features(scplus_obj, var, use_hvg = True, contrast_type = ['DEGs', 'DARs']) 307 if export_to_loom_file is True: 308 log.info('Exporting to loom file')
File /gpfs/home/tpham43/scenicplus/src/scenicplus/diff_features.py:89, in get_differential_features(scplus_obj, variable, use_hvg, contrast_type, adjpval_thr, log2fc_thr, min_cells) 87 sc.pp.log1p(adata) 88 if use_hvg: ---> 89 sc.pp.highly_variable_genes( 90 adata, min_mean=0.0125, max_mean=3, min_disp=min_disp, max_disp=np.inf) 91 var_features = adata.var.highly_variable[adata.var.highly_variable].index.tolist( 92 ) 93 adata = adata[:, var_features]
File ~/.local/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py:432, in highly_variable_genes(adata, layer, n_top_genes, min_disp, max_disp, min_mean, max_mean, span, n_bins, flavor, subset, inplace, batch_key, check_values) 420 return _highly_variable_genes_seurat_v3( 421 adata, 422 layer=layer, (...) 428 inplace=inplace, 429 ) 431 if batch_key is None: --> 432 df = _highly_variable_genes_single_batch( 433 adata, 434 layer=layer, 435 min_disp=min_disp, 436 max_disp=max_disp, 437 min_mean=min_mean, 438 max_mean=max_mean, 439 n_top_genes=n_top_genes, 440 n_bins=n_bins, 441 flavor=flavor, 442 ) 443 else: 444 sanitize_anndata(adata)
File ~/.local/lib/python3.8/site-packages/scanpy/preprocessing/_highly_variable_genes.py:216, in _highly_variable_genes_single_batch(adata, layer, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor) 213 df['dispersions'] = dispersion 214 if flavor == 'seurat': 215 #print(df['means']) --> 216 df['mean_bin'] = pd.cut(df['means'], bins=n_bins) 217 disp_grouped = df.groupby('mean_bin')['dispersions'] 218 disp_mean_bin = disp_grouped.mean()
File ~/.local/lib/python3.8/site-packages/pandas/core/reshape/tile.py:293, in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered) 290 if (np.diff(bins.astype("float64")) < 0).any(): 291 raise ValueError("bins must increase monotonically.") --> 293 fac, bins = _bins_to_cuts( 294 x, 295 bins, 296 right=right, 297 labels=labels, 298 precision=precision, 299 include_lowest=include_lowest, 300 dtype=dtype, 301 duplicates=duplicates, 302 ordered=ordered, 303 ) 305 return _postprocess_for_cut(fac, bins, retbins, dtype, original)
File ~/.local/lib/python3.8/site-packages/pandas/core/reshape/tile.py:420, in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered) 418 if len(unique_bins) < len(bins) and len(bins) != 2: 419 if duplicates == "raise": --> 420 raise ValueError( 421 f"Bin edges must be unique: {repr(bins)}.\n" 422 f"You can drop duplicate edges by setting the 'duplicates' kwarg" 423 ) 424 else: 425 bins = unique_bins
ValueError: Bin edges must be unique: array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]). You can drop duplicate edges by setting the 'duplicates' kwarg
Version (please complete the following information):
Additional context Add any other context about the problem here.