aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
163 stars 27 forks source link

Calculate path matrices for differentiation arrows (branched tracjectory) #312

Open bhhlee opened 4 months ago

bhhlee commented 4 months ago

Hi scenicplus team. Thanks for building such a great package! I've been working on generating on a branched differentation tracjetory analysis, but I can't manage to get the plot_potential() graph to work.

I was following the branched differentation trajectory tutorial:

import scanpy as sc
selected_features = {}
selected_features['TF'] = list(set([x.split('_')[0] for x in adata.var.index.tolist() if 'g)' in x]))
selected_features['Region'] = [x for x in adata.var.index.tolist() if 'r)' in x]
selected_features['Gene'] = [x for x in adata.var.index.tolist() if 'g)' in x]
## Calculate
from scenicplus.differentiation_potential import *
paths_cascade = {'TF':{}, 'Region':{}, 'Gene':{}}
for x in paths_cascade.keys():
    for ipath, (descr, path) in enumerate(paths):
        mat= get_path_matrix(
                adata,
                dpt_var = 'distance',
                path_var = 'GEX_celltype',
                path = path,
                features = selected_features[x],
                split_groups = True)
        paths_cascade[x][descr] = mat

But when I try to plot a particular TF that I know is in the dataset and in the selected_features['TF'] list, the plot won't work.

fig, ax = plt.subplots(figsize =(12, 4))
plt.subplot(1, 2, 1)
df_1 = plot_potential(adata, paths_cascade, 'CD4', 'SREBF2', window=1, gam_smooth=True, return_data = True)
plt.subplot(1, 2, 2)
df_2 = plot_potential(adata, paths_cascade, 'CD8', 'SREBF2', window=1, gam_smooth=True, return_data = True)
plt.tight_layout()
plt.savefig(outDir+'curve_plot.pdf')

image

From my understanding during generation of the paths_cascade I should get a dictionary of distance values per TF/Region/Gene per path, but for some reason the paths_cascade only has one distance column in paths_cascade['TF']. image image

Any idea why this would be? Or maybe I'm doing something else wrong?

Thanks! Brian

Umaarasu commented 3 months ago

@bhhlee Hi, have you sorted this issue? @SeppeDeWinter ..can you please help with this issue?

Code: fig, ax = plt.subplots(figsize =(12, 4)) plt.subplot(1, 2, 1) df_1 = plot_potential(adata, pathscascade, 'Adipo', 'ABL1+_+', window=1, gam_smooth=True, return_data = True) plt.tight_layout() plt.savefig(outDir+'curve_plots/lz.pdf')

I have this error and also as similar to your issue the TF column has only one column in TF /tmp/ipykernel_7750/1746384550.py:5: MatplotlibDeprecationWarning: Auto-removal of overlapping axes is deprecated since 3.6 and will be removed two minor releases later; explicitly call ax.remove() as needed. plt.subplot(1, 2, 1)


IndexError Traceback (most recent call last) Cell In[40], line 6 4 fig, ax = plt.subplots(figsize =(12, 4)) 5 plt.subplot(1, 2, 1) ----> 6 df_1 = plot_potential(adata, paths_cascade, 'Adipo', 'lz', window=1, gam_smooth=True, return_data = True) 7 plt.tight_layout() 8 plt.savefig(outDir+'curve_plots/lz.pdf')

File ~/scenicplus/src/scenicplus/differentiation_potential.py:182, in plot_potential(adata, paths_cascades, path, tf, window, show_plot, return_data, gam_smooth, dpt_var, use_ranked_dpt) 179 if len(region_regulon_name) > 1: 180 region_regulon_name = [x for x in region_regulon_name if 'extended' not in x] --> 182 gene = gene_data[gene_regulon_name[0]] 183 region = region_data[region_regulonname[0]] 184 if '' in tf:

IndexError: list index out of range

bhhlee commented 3 months ago

Hi @SeppeDeWinter and @Umaarasu

I found out the issue. When you concat the adata and TF expression you make adata_all, but you never assign it back to adata. You can just continue with adata_all throughout the rest of the tutorial or you can just add adata = adata_all

import pandas as pd
import anndata
import sklearn
adata_gene = anndata.AnnData(X=scplus_obj.to_df('EXP').copy())
sc.pp.normalize_total(adata_gene, target_sum=1e4)
sc.pp.log1p(adata_gene)
import scanpy as sc, anndata as ad, numpy as np, pandas as pd
adata_all = ad.concat([adata, adata_gene], axis=1, merge='unique', uns_merge='unique').copy()
adata = adata_all

Turns out that this is in the unbranched trajectory tutorial, but the adata = adata_all is just missing from the branched trajectory tutorial.

Hope this helps.

Best, Brian

Umaarasu commented 3 months ago

Hi Brian @bhhlee ....thanks..I had infact performed that step but somewhere down the pipeline which caused it to be ineffective. I have now rectified that. I would also like to ask a couple of more question in the upcoming steps.

1) df = cell_forces(adata, paths_cascade, plot_type='tf_to_gene', window=1, gam_smooth=True, tf_traj_thr=0.7, tf_expr_thr=0.2, selected_eGRNs=selected_eGRNs, n_cpu=20, _temp_dir='/scratch/leuven/313/vsc31305/ray_spill') I am getting a empty df. There is a key error which says some indexes are missing. Did you have any issues here?

2) How did you go about this step?

We will keep only high quality regulons

import pandas as pd subset = pd.read_csv('/staging/leuven/stg_00002/lcb/cbravo/Multiomics_pipeline/analysis/EAD_3/atac/scenicplus_v2/HQ_regulons_onlyeye.tsv').iloc[:,0].tolist() subset = [x.split('')[0] for x in subset] s = [x.split('_(')[0] for x in selectedfeatures['Gene'] if '++' in x and x.split('')[0] in subset and x.split('(')[0] in df.columns and adata.var['Score'].loc[x.split('_')[0]] > 30]......Did you have a tsv file with high regulons? If so where did you get that?