aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Problem with GRN plot #393

Open SteveTur opened 6 months ago

SteveTur commented 6 months ago

Hello,

It seems that the function to plot the GRN on python doesn't work anymore since the recent updates:

from scenicplus.networks import create_nx_tables, create_nx_graph, plot_networkx, export_to_cytoscape

G, pos, edge_tables, node_tables = create_nx_graph(
    nx_tables,
    use_edge_tables = ['TF2R','R2G'],
    color_edge_by = {
        'TF2R': {'variable' : 'TF', 'category_color' : category_color},
        'R2G': {'variable' : 'R2G_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}
    },
    transparency_edge_by = {
        'R2G': {'variable' : 'R2G_importance', 'min_alpha': 0.1, 'v_min': 0}
    },
    width_edge_by = {
        'R2G': {'variable' : 'R2G_importance', 'max_size' : 1.5, 'min_size' : 1}
    },
    color_node_by = {
        'TF': {'variable': 'TF', 'category_color' : category_color},
        'Gene': {'variable': 'GEX_celltype_Log2FC_6', 'continuous_color' : 'bwr'},
        'Region': {'variable': 'GEX_celltype_Log2FC_6', 'continuous_color' : 'viridis'}
    },
    transparency_node_by = {
        'Region': {'variable' : 'GEX_celltype_Log2FC_6', 'min_alpha': 0.1},
        'Gene': {'variable' : 'GEX_celltype_Log2FC_6', 'min_alpha': 0.1}
    },
    size_node_by = {
        'TF': {'variable': 'fixed_size', 'fixed_size': 30},
        'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
        'Region': {'variable': 'fixed_size', 'fixed_size': 10}
    },
    shape_node_by = {
        'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
        'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
        'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}
    },
    label_size_by = {
        'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
        'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
        'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}
    },
    layout='kamada_kawai_layout',
    scale_position_by=250
)

plt.figure(figsize=(50,50))
plot_networkx(G, pos)

The output is always blank for some reason. We never had this problem before the recent changes. What should we do?

Thank you for your help,

Best,

Steven

my0916 commented 6 months ago

Hi,

I'm also suffering from a similar problem... https://github.com/aertslab/scenicplus/discussions/378#discussioncomment-9314238

I also tried to export to Cytoscape with the export_to_cytoscape function, but the exported cys file cannot be loaded with the message 'Cannot find the cysession.xml file'

Thank you for your help.

Best, Masa

yiyelinfeng commented 6 months ago

I have the same problem, please help!

SeppeDeWinter commented 6 months ago

Hi @SteveTur, @my0916 and @yiyelinfeng

I am aware of the issue. I have added it to my to do list and will try to fix it whenever I have some time.

All the best,

Seppe

yiyelinfeng commented 6 months ago

Hi @SeppeDeWinter,

My only issue is that the .cys file cannot be loaded in Cytoscape. It fails with the message 'Cannot find the cysession.xml file' when I go to File -> Import -> Network from File... after importing the SCENIC+ network layout. I want to inspect and edit the network graph in Cytoscape, but I can't load it.

thanks,

Best, Lin

DmitriiSeverinov commented 5 months ago

Hi @SeppeDeWinter , @yiyelinfeng @my0916 , @SteveTur ,

I had a similar issue with blank output. What I found is that it works if I comment out the following line in the create_nx_tables function: subset_eRegulons = [x + '_[^a-zA-Z0-9]' for x in subset_eRegulons]
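A small sketch of why that line might produce empty tables. This assumes the newer eRegulon signature names put a word such as "direct" right after the TF name, so the character following "TF_" is alphanumeric and the `_[^a-zA-Z0-9]` suffix can no longer match. The names below are made up for illustration; check the actual values in your own 'Region_signature_name' column.

```python
import pandas as pd

# Hypothetical signature names; the exact format in your data may differ.
old_style = pd.Series(["atf3_+_+_(25r)", "bach2b_+_+_(40r)"])
new_style = pd.Series(["atf3_direct_+/+_(25r)", "bach2b_direct_+/+_(40r)"])

subset = ["atf3", "bach2b"]
# Pattern built by the line that was commented out.
strict = "|".join(x + "_[^a-zA-Z0-9]" for x in subset)
# Pattern used once that line is commented out.
loose = "|".join(subset)


def count_matches(names, pattern):
    """Count how many names contain the regular expression `pattern`."""
    return int(names.str.contains(pattern).sum())


print(count_matches(old_style, strict))  # strict pattern on old-style names
print(count_matches(new_style, strict))  # strict pattern on new-style names
print(count_matches(new_style, loose))   # loose pattern on new-style names
```

Note that the loose pattern is a plain substring match, so a TF name that is a prefix of another TF name (for example a hypothetical 'Ebf1' and 'Ebf10') would select both.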

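Here is the code I used:

import os
from typing import List, Optional

import anndata
import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
from pycisTopic.diff_features import find_highly_variable_features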

scplus_obj.uns['direct_e_regulon_metadata_filtered'] = scplus_mdata.uns['direct_e_regulon_metadata_filtered']

hvr = find_highly_variable_features(scplus_obj.to_df('ACC').loc[list(set(scplus_obj.uns['direct_e_regulon_metadata_filtered']['Region']))], n_top_features=3000, plot = False)
hvg = find_highly_variable_features(scplus_obj.to_df('EXP')[list(set(scplus_obj.uns['direct_e_regulon_metadata_filtered']['Gene']))].T, n_top_features=3000, plot = False)

from scenicplus.networks import create_nx_tables, create_nx_graph, plot_networkx, export_to_cytoscape

nx_tables = list()

def _format_df_nx(df, key, var):
    """
    A helper function to format differential test results
    """
    df.index = df['names']
    df = pd.DataFrame(df['logfoldchanges'])
    df.columns = [var+'_Log2FC_'+key]
    df.index.name = None
    return df

def _get_log2fc_nx(scplus_obj: 'SCENICPLUS',
                  variable,
                  features,
                  contrast: Optional[str] = 'gene'
                  ):
    """
    A helper function to derive log2fc changes
    """
    if contrast == 'gene':
        adata = anndata.AnnData(X=scplus_obj.X_EXP, obs=pd.DataFrame(
            index=scplus_obj.cell_names), var=pd.DataFrame(index=scplus_obj.gene_names))
    if contrast == 'region':
        adata = anndata.AnnData(X=scplus_obj.X_ACC.T, obs=pd.DataFrame(
            index=scplus_obj.cell_names), var=pd.DataFrame(index=scplus_obj.region_names))
    adata.obs = pd.DataFrame(scplus_obj.metadata_cell[variable])
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    adata = adata[:, features]
    sc.tl.rank_genes_groups(
        adata, variable, method='wilcoxon', corr_method='bonferroni')
    groups = adata.uns['rank_genes_groups']['names'].dtype.names
    diff_list = [_format_df_nx(sc.get.rank_genes_groups_df(
        adata, group=group), group, variable) for group in groups]
    return pd.concat(diff_list, axis=1)

def create_nx_tables(scplus_obj: 'SCENICPLUS',
                     eRegulon_metadata_key: str ='eRegulon_metadata',
                     subset_eRegulons: List = None,
                     subset_regions: List = None,
                     subset_genes: List = None,
                     add_differential_gene_expression: bool = False,
                     add_differential_region_accessibility: bool = False,
                     differential_variable: List =[]):
    """
    A function to format eRegulon data into tables for plotting eGRNs.

    Parameters
    ---------
    scplus_obj: SCENICPLUS
        A SCENICPLUS object with eRegulons
    eRegulon_metadata_key: str, optional
        Key where the eRegulon metadata dataframe is stored
    subset_eRegulons: list, optional
        List of eRegulons to subset
    subset_regions: list, optional
        List of regions to subset
    subset_genes: list, optional
        List of genes to subset
    add_differential_gene_expression: bool, optional
        Whether to calculate differential gene expression logFC for a given variable
    add_differential_region_accessibility: bool, optional
        Whether to calculate differential region accessibility logFC for a given variable
    differential_variable: list, optional
        Variable to calculate differential gene expression or region accessibility.

    Return
    ---------
    A dictionary with edge feature tables ('TF2G', 'TF2R', 'R2G') and node feature tables ('TF', 'Gene', 'Region')
    """
    er_metadata = scplus_obj.uns[eRegulon_metadata_key].copy()
    if subset_eRegulons is not None:
        # subset_eRegulons = [x + '_[^a-zA-Z0-9]' for x in subset_eRegulons]
        er_metadata = er_metadata[er_metadata['Region_signature_name'].str.contains(
            '|'.join(subset_eRegulons))]
    if subset_regions is not None:
        er_metadata = er_metadata[er_metadata['Region'].isin(subset_regions)]
    if subset_genes is not None:
        er_metadata = er_metadata[er_metadata['Gene'].isin(subset_genes)]
    nx_tables = {}
    nx_tables['Edge'] = {}
    nx_tables['Node'] = {}
    # Generate edge tables
    r2g_columns = [x for x in er_metadata.columns if 'R2G' in x]
    tf2g_columns = [x for x in er_metadata.columns if 'TF2G' in x]
    nx_tables['Edge']['TF2R'] = er_metadata[er_metadata.columns.difference(
        r2g_columns + tf2g_columns)].drop('Gene', axis=1).drop_duplicates()
    nx_tables['Edge']['TF2R'] = nx_tables['Edge']['TF2R'][['TF', 'Region'] +
                                                          nx_tables['Edge']['TF2R'].columns.difference(['TF', 'Region']).tolist()]
    nx_tables['Edge']['R2G'] = er_metadata[er_metadata.columns.difference(
        tf2g_columns)].drop('TF', axis=1).drop_duplicates()
    nx_tables['Edge']['R2G'] = nx_tables['Edge']['R2G'][['Region', 'Gene'] +
                                                        nx_tables['Edge']['R2G'].columns.difference(['Region', 'Gene']).tolist()]
    nx_tables['Edge']['TF2G'] = er_metadata[er_metadata.columns.difference(
        r2g_columns)].drop('Region', axis=1).drop_duplicates()
    nx_tables['Edge']['TF2G'] = nx_tables['Edge']['TF2G'][['TF', 'Gene'] +
                                                          nx_tables['Edge']['TF2G'].columns.difference(['TF', 'Gene']).tolist()]
    # Generate node tables
    tfs = list(set(er_metadata['TF']))
    nx_tables['Node']['TF'] = pd.DataFrame(
        'TF', index=tfs, columns=['Node_type'])
    nx_tables['Node']['TF']['TF'] = tfs
    genes = list(set(er_metadata['Gene']))
    genes = [x for x in genes if x not in tfs]
    nx_tables['Node']['Gene'] = pd.DataFrame(
        'Gene', index=genes, columns=['Node_type'])
    nx_tables['Node']['Gene']['Gene'] = genes
    regions = list(set(er_metadata['Region']))
    nx_tables['Node']['Region'] = pd.DataFrame(
        'Region', index=regions, columns=['Node_type'])
    nx_tables['Node']['Region']['Region'] = regions
    # Add gene logFC
    if add_differential_gene_expression is True:
        for var in differential_variable:
            nx_tables['Node']['TF'] = pd.concat([nx_tables['Node']['TF'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['TF'].index.tolist(), contrast='gene')], axis=1)
            nx_tables['Node']['Gene'] = pd.concat([nx_tables['Node']['Gene'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['Gene'].index.tolist(), contrast='gene')], axis=1)
    if add_differential_region_accessibility is True:
        for var in differential_variable:
            nx_tables['Node']['Region'] = pd.concat([nx_tables['Node']['Region'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['Region'].index.tolist(), contrast='region')], axis=1)
    return nx_tables

nx_table = create_nx_tables(
    scplus_obj = scplus_obj,
    eRegulon_metadata_key ='direct_e_regulon_metadata_filtered',
    subset_eRegulons = ['atf3', 'bach2b'],
    subset_regions = hvr,
    subset_genes = hvg,
    add_differential_gene_expression = True,
    add_differential_region_accessibility = True,
    differential_variable = ['ident']
)

G, pos, edge_tables, node_tables = create_nx_graph(nx_table, 
                   use_edge_tables = ['TF2R','R2G'],
                   color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'atf3': 'Red', 'bach2b': 'Blue'}},
                                    'R2G': {'variable' : 'importance_x_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
                   transparency_edge_by =  {'R2G': {'variable' : 'importance_R2G', 'min_alpha': 0.1, 'v_min': 0}},
                   width_edge_by = {'R2G': {'variable' : 'importance_R2G', 'max_size' :  1.5, 'min_size' : 1}},
                   color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'atf3': 'Red', 'bach2b': 'Blue'}},
                                    'Gene': {'variable': 'ident_Log2FC_RSNs', 'continuous_color' : 'bwr'},
                                    'Region': {'variable': 'ident_Log2FC_RSNs', 'continuous_color' : 'viridis'}},
                   transparency_node_by =  {'Region': {'variable' : 'ident_Log2FC_RSNs', 'min_alpha': 0.1},
                                    'Gene': {'variable' : 'ident_Log2FC_RSNs', 'min_alpha': 0.1}},
                   size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
                                    'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
                                    'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
                   shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
                   label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
                                    'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
                                    'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
                   layout='kamada_kawai_layout',
                   scale_position_by=250)

plt.figure(figsize=(20,20))
plot_networkx(G, pos)
plt.savefig('figures/TFs_target_genes/eRegulons.pdf')
export_to_cytoscape(G, pos, out_file = os.path.join('figures/network.cys'))

I hope it will work for you as well!

However, I didn't manage to solve the problem with import to cytoscape :(

Best, Dmitrii

yiyelinfeng commented 5 months ago

Just change the output extension to .cyjs, i.e. "export_to_cytoscape(G, pos, out_file = os.path.join('figures/network.cyjs'))", and it works.
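A possible explanation, offered as an assumption rather than something confirmed in this thread: Cytoscape treats a .cys file as a zipped session archive that should contain a cysession.xml entry (which matches the error message above), whereas .cyjs is read as Cytoscape.js-style JSON. If the exported file is actually JSON, only the extension was wrong. The helper below is hypothetical and uses only the standard library to check what kind of file you have on disk.

```python
import json
import zipfile


def guess_cytoscape_format(path):
    """Roughly classify a file as a zipped session, JSON, or unknown."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            has_session = any(
                name.endswith("cysession.xml") for name in archive.namelist()
            )
        return "cys session" if has_session else "zip without cysession.xml"
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "unknown"
    return "json (cyjs-like)"
```

Calling guess_cytoscape_format('figures/network.cys') on the file written by export_to_cytoscape should tell you whether renaming it to .cyjs is likely to help.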

degrainger commented 4 months ago

Hi @SeppeDeWinter ,

Are you able to provide any update on this please? I am facing the same issue of create_nx_tables giving empty dataframes. I am also running into errors with @DmitriiSeverinov's code :/ Thanks!

Best, Dave

SeppeDeWinter commented 3 months ago

Hi @degrainger

I did not find the time yet to update this function, I'm sorry!

Best,

Seppe

degrainger commented 3 months ago

Not to worry, we can see you're a busy guy! 😄 I troubleshooted my way through the above solution and got it working in the end. Thanks for all your hard work! -Dave

SteveTur commented 3 months ago

Hi @SeppeDeWinter,

First of all, Seppe, no worries and thank you so much for all of the wonderful tools and all your help!

Hi @degrainger,

Could you share with us how you managed to troubleshoot this? Which version of SCENIC are you using to make it work? Is it the alpha version?

Thank you for your help!

Best,

Steven

degrainger commented 3 months ago

Hi @SteveTur,

Sure, I can share what I did but it mostly came from @DmitriiSeverinov's answer above.

I ran SCENIC+ through the snakemake pipeline using version 1.0a1. I converted my mudata to scenicobject:

scplus_obj = mudata_to_scenicplus(
    mdata = scplus_mdata,
    path_to_cistarget_h5 = "ctx_results.hdf5",
    path_to_dem_h5 = "dem_results.hdf5"
)

Then in Dmitrii's answer, they have a slot called "direct_e_regulon_metadata_filtered" in their scenicplus object. I didn't have this, so I made it manually by filtering for the TFs I was interested in:

direct_md = scplus_obj.uns['eRegulon_metadata']
direct_filt = direct_md[direct_md['TF'].isin(['Etv2', 'Ebf1', 'Ebf2','Lmo2','Tal1','E2f6','Yy1'])]
scplus_obj.uns['direct_e_regulon_metadata_filtered'] = direct_filt

I then ran Dmitrii's code but with a couple changes:

  1. One line commented out (scplus_obj.uns['direct_e_regulon_metadata_filtered'] = scplus_mdata.uns['direct_e_regulon_metadata_filtered'])
  2. Export of network changed to .cyjs as in reply above.

Make sure all your TFs are defined in the color dictionaries and that every 'variable' you pass matches a column name in your own scplus_obj.uns metadata, as the names in mine differed from Dmitrii's.
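The two checks above can be sketched as follows. Both helpers are hypothetical (they are not part of scenicplus), and they assume that variables starting with 'fixed_' (fixed_size, fixed_shape, fixed_label_size) are settings rather than table columns. The toy tables stand in for the 'Edge' and 'Node' entries returned by create_nx_tables.

```python
import pandas as pd


def find_missing_variables(tables, by_spec):
    """Return (table_name, variable) pairs whose variable is not a column."""
    missing = []
    for table_name, spec in by_spec.items():
        variable = spec["variable"]
        if variable.startswith("fixed_"):
            continue  # assumed to be a fixed setting, not a column
        if variable not in tables[table_name].columns:
            missing.append((table_name, variable))
    return missing


def find_uncolored_categories(table, spec):
    """Return category values that have no entry in 'category_color'."""
    if "category_color" not in spec:
        return []
    values = set(table[spec["variable"]])
    return sorted(values - set(spec["category_color"]))


# Toy data: the table uses 'importance_R2G' but the spec asks for 'R2G_importance'.
edge_tables = {
    "R2G": pd.DataFrame(
        {"Region": ["chr1:1-100"], "Gene": ["geneA"], "importance_R2G": [0.5]}
    )
}
width_edge_by = {"R2G": {"variable": "R2G_importance", "max_size": 1.5, "min_size": 1}}
missing = find_missing_variables(edge_tables, width_edge_by)

# Toy data: 'Ebf1' is in the TF table but has no color assigned.
tf_table = pd.DataFrame({"TF": ["Etv2", "Ebf1"]}, index=["Etv2", "Ebf1"])
uncolored = find_uncolored_categories(
    tf_table, {"variable": "TF", "category_color": {"Etv2": "Purple"}}
)

print(missing)
print(uncolored)
```

With real data you would pass nx_table['Edge'] or nx_table['Node'] together with the matching dictionary (color_edge_by, width_edge_by, color_node_by and so on) before calling create_nx_graph.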

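import os
from typing import List, Optional

import anndata
import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
from pycisTopic.diff_features import find_highly_variable_features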

#scplus_obj.uns['direct_e_regulon_metadata_filtered'] = scplus_mdata.uns['direct_e_regulon_metadata_filtered']

hvr = find_highly_variable_features(scplus_obj.to_df('ACC').loc[list(set(scplus_obj.uns['direct_e_regulon_metadata_filtered']['Region']))], n_top_features=1000, plot = False)
hvg = find_highly_variable_features(scplus_obj.to_df('EXP')[list(set(scplus_obj.uns['direct_e_regulon_metadata_filtered']['Gene']))].T, n_top_features=1000, plot = False)

from scenicplus.networks import create_nx_tables, create_nx_graph, plot_networkx, export_to_cytoscape

nx_tables = list()

def _format_df_nx(df, key, var):
    """
    A helper function to format differential test results
    """
    df.index = df['names']
    df = pd.DataFrame(df['logfoldchanges'])
    df.columns = [var+'_Log2FC_'+key]
    df.index.name = None
    return df

def _get_log2fc_nx(scplus_obj: 'SCENICPLUS',
                  variable,
                  features,
                  contrast: Optional[str] = 'gene'
                  ):
    """
    A helper function to derive log2fc changes
    """
    if contrast == 'gene':
        adata = anndata.AnnData(X=scplus_obj.X_EXP, obs=pd.DataFrame(
            index=scplus_obj.cell_names), var=pd.DataFrame(index=scplus_obj.gene_names))
    if contrast == 'region':
        adata = anndata.AnnData(X=scplus_obj.X_ACC.T, obs=pd.DataFrame(
            index=scplus_obj.cell_names), var=pd.DataFrame(index=scplus_obj.region_names))
    adata.obs = pd.DataFrame(scplus_obj.metadata_cell[variable])
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    adata = adata[:, features]
    sc.tl.rank_genes_groups(
        adata, variable, method='wilcoxon', corr_method='bonferroni')
    groups = adata.uns['rank_genes_groups']['names'].dtype.names
    diff_list = [_format_df_nx(sc.get.rank_genes_groups_df(
        adata, group=group), group, variable) for group in groups]
    return pd.concat(diff_list, axis=1)

def create_nx_tables(scplus_obj: 'SCENICPLUS',
                     eRegulon_metadata_key: str ='eRegulon_metadata',
                     subset_eRegulons: List = None,
                     subset_regions: List = None,
                     subset_genes: List = None,
                     add_differential_gene_expression: bool = False,
                     add_differential_region_accessibility: bool = False,
                     differential_variable: List =[]):
    """
    A function to format eRegulon data into tables for plotting eGRNs.

    Parameters
    ---------
    scplus_obj: SCENICPLUS
        A SCENICPLUS object with eRegulons
    eRegulon_metadata_key: str, optional
        Key where the eRegulon metadata dataframe is stored
    subset_eRegulons: list, optional
        List of eRegulons to subset
    subset_regions: list, optional
        List of regions to subset
    subset_genes: list, optional
        List of genes to subset
    add_differential_gene_expression: bool, optional
        Whether to calculate differential gene expression logFC for a given variable
    add_differential_region_accessibility: bool, optional
        Whether to calculate differential region accessibility logFC for a given variable
    differential_variable: list, optional
        Variable to calculate differential gene expression or region accessibility.

    Return
    ---------
    A dictionary with edge feature tables ('TF2G', 'TF2R', 'R2G') and node feature tables ('TF', 'Gene', 'Region')
    """
    er_metadata = scplus_obj.uns[eRegulon_metadata_key].copy()
    if subset_eRegulons is not None:
        # subset_eRegulons = [x + '_[^a-zA-Z0-9]' for x in subset_eRegulons]
        er_metadata = er_metadata[er_metadata['Region_signature_name'].str.contains(
            '|'.join(subset_eRegulons))]
    if subset_regions is not None:
        er_metadata = er_metadata[er_metadata['Region'].isin(subset_regions)]
    if subset_genes is not None:
        er_metadata = er_metadata[er_metadata['Gene'].isin(subset_genes)]
    nx_tables = {}
    nx_tables['Edge'] = {}
    nx_tables['Node'] = {}
    # Generate edge tables
    r2g_columns = [x for x in er_metadata.columns if 'R2G' in x]
    tf2g_columns = [x for x in er_metadata.columns if 'TF2G' in x]
    nx_tables['Edge']['TF2R'] = er_metadata[er_metadata.columns.difference(
        r2g_columns + tf2g_columns)].drop('Gene', axis=1).drop_duplicates()
    nx_tables['Edge']['TF2R'] = nx_tables['Edge']['TF2R'][['TF', 'Region'] +
                                                          nx_tables['Edge']['TF2R'].columns.difference(['TF', 'Region']).tolist()]
    nx_tables['Edge']['R2G'] = er_metadata[er_metadata.columns.difference(
        tf2g_columns)].drop('TF', axis=1).drop_duplicates()
    nx_tables['Edge']['R2G'] = nx_tables['Edge']['R2G'][['Region', 'Gene'] +
                                                        nx_tables['Edge']['R2G'].columns.difference(['Region', 'Gene']).tolist()]
    nx_tables['Edge']['TF2G'] = er_metadata[er_metadata.columns.difference(
        r2g_columns)].drop('Region', axis=1).drop_duplicates()
    nx_tables['Edge']['TF2G'] = nx_tables['Edge']['TF2G'][['TF', 'Gene'] +
                                                          nx_tables['Edge']['TF2G'].columns.difference(['TF', 'Gene']).tolist()]
    # Generate node tables
    tfs = list(set(er_metadata['TF']))
    nx_tables['Node']['TF'] = pd.DataFrame(
        'TF', index=tfs, columns=['Node_type'])
    nx_tables['Node']['TF']['TF'] = tfs
    genes = list(set(er_metadata['Gene']))
    genes = [x for x in genes if x not in tfs]
    nx_tables['Node']['Gene'] = pd.DataFrame(
        'Gene', index=genes, columns=['Node_type'])
    nx_tables['Node']['Gene']['Gene'] = genes
    regions = list(set(er_metadata['Region']))
    nx_tables['Node']['Region'] = pd.DataFrame(
        'Region', index=regions, columns=['Node_type'])
    nx_tables['Node']['Region']['Region'] = regions
    # Add gene logFC
    if add_differential_gene_expression is True:
        for var in differential_variable:
            nx_tables['Node']['TF'] = pd.concat([nx_tables['Node']['TF'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['TF'].index.tolist(), contrast='gene')], axis=1)
            nx_tables['Node']['Gene'] = pd.concat([nx_tables['Node']['Gene'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['Gene'].index.tolist(), contrast='gene')], axis=1)
    if add_differential_region_accessibility is True:
        for var in differential_variable:
            nx_tables['Node']['Region'] = pd.concat([nx_tables['Node']['Region'], _get_log2fc_nx(
                scplus_obj, var, nx_tables['Node']['Region'].index.tolist(), contrast='region')], axis=1)
    return nx_tables

nx_table = create_nx_tables(
    scplus_obj = scplus_obj,
    eRegulon_metadata_key ='direct_e_regulon_metadata_filtered',
    subset_eRegulons = ['Etv2', 'Ebf1', 'Ebf2','Lmo2','Tal1','E2f6','Yy1'],
    subset_regions = hvr,
    subset_genes = hvg,
    add_differential_gene_expression = True,
    add_differential_region_accessibility = True,
    differential_variable = ['celltypeNew']
)

G, pos, edge_tables, node_tables = create_nx_graph(nx_table, 
                   use_edge_tables = ['TF2R','R2G'],
                   color_edge_by = {'TF2R': {'variable' : 'TF', 'category_color' : {'Etv2': 'Purple', 'Ebf1': 'Red', 'Ebf2' : 'Orange','Lmo2' : 'Green','Tal1' :'Blue','E2f6':'Cyan','Yy1':'Grey'}},
                                    'R2G': {'variable' : 'importance_x_rho', 'continuous_color' : 'viridis', 'v_min': -1, 'v_max': 1}},
                   transparency_edge_by =  {'R2G': {'variable' : 'importance_R2G', 'min_alpha': 0.1, 'v_min': 0}},
                   width_edge_by = {'R2G': {'variable' : 'importance_R2G', 'max_size' :  1.5, 'min_size' : 1}},
                   color_node_by = {'TF': {'variable': 'TF', 'category_color' : {'Etv2': 'Purple', 'Ebf1': 'Red', 'Ebf2' : 'Orange','Lmo2' : 'Green','Tal1' :'Blue','E2f6':'Cyan','Yy1':'Grey'}},
                                    'Gene': {'variable': 'celltypeNew_Log2FC_early_angioblast', 'continuous_color' : 'bwr'},
                                    'Region': {'variable': 'celltypeNew_Log2FC_early_angioblast', 'continuous_color' : 'viridis'}},
                   transparency_node_by =  {'Region': {'variable' : 'celltypeNew_Log2FC_early_angioblast', 'min_alpha': 0.1},
                                    'Gene': {'variable' : 'celltypeNew_Log2FC_early_angioblast', 'min_alpha': 0.1}},
                   size_node_by = {'TF': {'variable': 'fixed_size', 'fixed_size': 30},
                                    'Gene': {'variable': 'fixed_size', 'fixed_size': 15},
                                    'Region': {'variable': 'fixed_size', 'fixed_size': 10}},
                   shape_node_by = {'TF': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Gene': {'variable': 'fixed_shape', 'fixed_shape': 'ellipse'},
                                    'Region': {'variable': 'fixed_shape', 'fixed_shape': 'diamond'}},
                   label_size_by = {'TF': {'variable': 'fixed_label_size', 'fixed_label_size': 20.0},
                                    'Gene': {'variable': 'fixed_label_size', 'fixed_label_size': 10.0},
                                    'Region': {'variable': 'fixed_label_size', 'fixed_label_size': 0.0}},
                   layout='kamada_kawai_layout',
                   scale_position_by=250)

plt.figure(figsize=(20,20))
plot_networkx(G, pos)
plt.savefig('Test2.pdf')
export_to_cytoscape(G, pos, out_file = os.path.join('Test2_network.cyjs'))

Let me know how you get on or if you get errors - I may have encountered them in my troubleshooting. Good luck! -Dave

SteveTur commented 3 months ago

Hi Dave,

Thank you so much for this update! I will let you know if it works for me. :)

Best,

Steven

SteveTur commented 3 months ago

Hi @SeppeDeWinter, @degrainger, @DmitriiSeverinov,

I managed to make it work too! However, I am still confused about some parameters in this function. How exactly do you choose the HVRs (highly variable regions) and HVGs (highly variable genes)? The shape and number of connections change a lot depending on those selections.

Here's how I've been approaching it: I look at the specificity score (SS) of each cell type and the correlation heatmap. Then, for each cell type, I plot the network of the TFs involved in that cell type according to the SS and the heatmap. However, it's not clear whether you can specify the cell type when plotting the network, or how exactly the HVR/HVG parameters work. For example, if I have a gene-based eRegulon with eRegulon+/+(120g), does the HVG parameter influence those 120 genes? Is it the same for regions?

Another point is when you plot the network of your chosen TF, you don't specify whether it is extended or direct. Does it consider both if you have both for each TF?

If you can share how you deal with these issues, it would be very helpful!

Best regards, Steven

degrainger commented 3 months ago

Hey @SteveTur,

Congrats, glad you got it working. I also considered similar things when performing my analysis. Firstly, I used the highly variable genes / regions to show the most variable genes are regulated by TFs known to be important in my field of study. My data are currently limited to clusters of cells along a single differentiation path and so I wanted to make a figure demonstrating known TFs regulate many of the most variable features along this process.
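On how the HVG/HVR lists interact with an eRegulon's targets: in the create_nx_tables code posted earlier in this thread, subset_regions and subset_genes are applied as row filters (isin) on the eRegulon metadata, so any target gene or region outside those lists drops out of the plotted network. A toy sketch with made-up names:

```python
import pandas as pd

# Made-up eRegulon metadata: one TF with four region-gene links.
er_metadata = pd.DataFrame(
    {
        "TF": ["Tal1"] * 4,
        "Region": ["r1", "r2", "r3", "r4"],
        "Gene": ["g1", "g2", "g3", "g4"],
    }
)

hvr = ["r1", "r2", "r3"]  # pretend highly variable regions
hvg = ["g1", "g3", "g9"]  # pretend highly variable genes

# Same two filters as in create_nx_tables.
subset = er_metadata[er_metadata["Region"].isin(hvr)]
subset = subset[subset["Gene"].isin(hvg)]

kept_genes = sorted(subset["Gene"])
print(kept_genes)
```

Only rows whose region is in hvr and whose gene is in hvg survive, which is why the network changes so much with n_top_features.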

However, when wanting to look at potentially novel regulators, and their target regions/genes, I thought it would be better to select regions and genes differently. Instead, I selected the top 30-50 genes and regions as ranked by triplet score for each regulon. I thought this would give a better representation of the GRN.
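One way this kind of per-regulon selection could be sketched with pandas. This is not Dave's code: the column names 'eRegulon_name' and 'triplet_score' and the assumption that a higher score is better are placeholders, so adapt them to the columns in your own eRegulon metadata (if your table stores a rank where lower is better, pass higher_is_better=False).

```python
import pandas as pd


def top_triplets_per_regulon(
    er_metadata,
    n_top=30,
    regulon_col="eRegulon_name",  # assumed column name
    score_col="triplet_score",    # assumed column name
    higher_is_better=True,
):
    """Keep the n_top best-scoring rows for each regulon."""
    ordered = er_metadata.sort_values(score_col, ascending=not higher_is_better)
    return ordered.groupby(regulon_col, sort=False).head(n_top)


# Toy table with two regulons and three targets each.
toy = pd.DataFrame(
    {
        "eRegulon_name": ["A", "A", "A", "B", "B", "B"],
        "Gene": ["g1", "g2", "g3", "g4", "g5", "g6"],
        "Region": ["r1", "r2", "r3", "r4", "r5", "r6"],
        "triplet_score": [0.9, 0.1, 0.5, 0.2, 0.8, 0.7],
    }
)

top = top_triplets_per_regulon(toy, n_top=2)
subset_genes = sorted(set(top["Gene"]))
subset_regions = sorted(set(top["Region"]))
print(subset_genes, subset_regions)
```

The resulting lists could then be passed as subset_genes and subset_regions to create_nx_tables in place of the hvg and hvr lists.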

Let me know what you think about that, happy to share the code if you're interested.

Dave

SteveTur commented 3 months ago

Hi @degrainger,

Thank you for this advice, it seems to be a good start. I am gonna try this first to see how the results look!

I will be happy to see the code you wrote to do this analysis!

Thank you so much for your help :)

Best,

Steven

ccruizm commented 2 months ago

Hello @degrainger, your solution (alongside what @DmitriiSeverinov mentioned) also worked for me! Could you please share the code you mentioned here? I would also like to have a better idea of how to prioritize the TFs when building the GRNs. Thanks in advance!

Firstly, I used the highly variable genes / regions to show the most variable genes are regulated by TFs known to be important in my field of study. My data are currently limited to clusters of cells along a single differentiation path and so I wanted to make a figure demonstrating known TFs regulate many of the most variable features along this process. However, when wanting to look at potentially novel regulators, and their target regions/genes, I thought it would be better to select regions and genes differently. Instead, I selected the top 30-50 genes and regions as ranked by triplet score for each regulon. I thought this would give a better representation of the GRN.