BayraktarLab / cell2location

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
https://cell2location.readthedocs.io/en/latest/
Apache License 2.0
307 stars 57 forks source link

Saving for QC and results plots. #191

Open bednarsky opened 2 years ago

bednarsky commented 2 years ago

Some functions (e.g., cell2location.utils.filtering.filter_genes or RegressionModel.plot_QC) generate plots, but have no way (that I could think of) for saving the plots from a script. Workarounds don't always work, for example the code below only saves one of the two plots generated when interactively running the code.

fig, ax = plt.subplots()
mod.plot_QC()
# only saves first plot unfortunately
fig.savefig(QC_dir / 'QC_plots_signature_model.png', dpi=300)

Since sometimes running cell2loc interactively is not practical, it would be great to have improved plot saving capabilities! Thank you for your great work!

vitkl commented 2 years ago

Thanks for using cell2location @bednarsky! We indeed also often use cell2location non-interactively and in general, save a range of plots. Is mod.plot_QC() and cell2location.utils.filtering.filter_genes the only methods that have this issue?

@yozhikoff what do you think is the best interface for saving plots? File path argument to the function or just making sure that mod.plot_QC() creates only one plot?

vitkl commented 2 years ago

Cell2location plot_QC() should be save-able:

if True:
    mod.plot_QC()
    plt.savefig(f"{scvi_run_name}/reconstruction_accuracy_histogram.png",
                bbox_inches='tight')
    plt.close()
vitkl commented 2 years ago

We also often look at these additional QC metrics:

if True:
    fig_dir = f"{scvi_run_name}/spatial/"
    import os
    if not os.path.exists(fig_dir):
        os.mkdir(fig_dir)
    adata_vis.obs['total_cell_abundance'] = adata_vis.uns['mod']['post_sample_means']['w_sf'].sum(1).flatten()
    fig = plot_spatial_per_cell_type(adata_vis, cell_type='total_cell_abundance', prefix='');
    fig.savefig(f"{fig_dir}total_cell_abundance.png", bbox_inches='tight')
    fig.clear()
    plt.close(fig)

    adata_vis.obs['detection_y_s'] = adata_vis.uns['mod']['post_sample_q05']['detection_y_s']
    fig = plot_spatial_per_cell_type(adata_vis, cell_type='detection_y_s', prefix='');
    fig.savefig(f"{fig_dir}detection_y_s.png", bbox_inches='tight')
    fig.clear()
    plt.close(fig)

    fig = plot_spatial_per_cell_type(adata_vis, cell_type='total_counts', prefix='');
    fig.savefig(f"{fig_dir}total_RNA_counts.png", bbox_inches='tight')
    fig.clear()
    plt.close(fig)

which requires a custom plotter that visualizes the same obs column for all visium sections:

def plot_spatial_per_cell_type(adata, 
                                   cell_type='total_counts',
                                   samples=adata.obs['sample'].cat.categories,
                                  ncol=3, prefix='', figsize=(24, 6), vmax_quantile=0.992):
    n_samples = len(samples)
    nrow = int(np.ceil(n_samples / ncol))
    fig, axs = plt.subplots(nrow, ncol, figsize=figsize)
    if nrow == 1:
        axs = axs.reshape((1, ncol))

    col_name = f'{prefix}{cell_type}'
    vmax = np.quantile(adata.obs[col_name].values, vmax_quantile)
    adata.obs[cell_type] = adata.obs[col_name].copy()

    from itertools import chain
    ind = list(chain(*[[(i, j) for i in range(nrow)] for j in range(ncol)]))

    for i, s in enumerate(samples):
        sp_data_s = select_slide(adata, s, s_col='sample')
        sc.pl.spatial(sp_data_s, cmap='magma',
                      color=cell_type, 
                      size=1.3, img_key='hires', alpha_img=1,
                      vmin=0, vmax=vmax, ax=axs[ind[i][0],ind[i][1]], show=False
                                            )
        axs[ind[i][0],ind[i][1]].title.set_text(cell_type+'\n'+s)

    if len(samples) < len(axs.flatten()):
        for i in range(len(samples), len(axs.flatten())):
            axs[ind[i][0],ind[i][1]].axis('off')

    fig.tight_layout(pad=0.5)

    return fig

plot_spatial_per_cell_type(adata, cell_type='total_counts');
vitkl commented 2 years ago

Also total cell abundance histograms per sample:

if True:
    # look at the total cell abundance for each experiment
    with mpl.rc_context({'axes.facecolor': 'white', 'font.size': 10, 
                         'figure.figsize': [5, 15]}):
        for i, n in enumerate(adata_vis.obs['sample'].unique()):
            plt.subplot(len(adata_vis.obs['sample'].unique()), 1, i+1)
            total_cell_abundance = adata_vis.uns['mod']['post_sample_means']['w_sf'].sum(1).flatten()
            total_cell_abundance_ = total_cell_abundance[adata_vis.obs['sample'] == n]
            plt.hist(total_cell_abundance_, bins=100);
            plt.title(n)
            plt.xlabel(r'Total cell abundance ($sum_f w_sf$) for each experiment')
            plt.xlim(0, total_cell_abundance.max())
        plt.savefig(f"{scvi_run_name}/total_cell_abundance.png",
                    bbox_inches='tight')
        plt.close()
vitkl commented 2 years ago

Finally, this is how you can save plots for A) one cell type in all samples and B) all cell types in one sample:

if True:
    # add 5% quantile, representing confident cell abundance, 'at least this amount is present', 
    # to adata.obs with nice names for plotting
    adata_vis.obs[adata_vis.uns['mod']['factor_names']] = adata_vis.obsm['q05_cell_abundance_w_sf']

    from re import sub
    import os
    for c in adata_vis.uns['mod']['factor_names']:
        fig = plot_spatial_per_cell_type(adata_vis, cell_type=c)
        fig_dir = f"{scvi_run_name}/spatial/"
        if not os.path.exists(fig_dir):
            os.mkdir(fig_dir)
        fig_dir = f"{scvi_run_name}/spatial/per_cell_type/"
        if not os.path.exists(fig_dir):
            os.mkdir(fig_dir)
        fig.savefig(f"{fig_dir}W_cell_abundance_q05_{sub('/', '_', c)}.png",
                    bbox_inches='tight')
        fig.clear()
        plt.close(fig)

    f"{scvi_run_name}/spatial/per_cell_type/"

    with mpl.rc_context({"axes.facecolor": "black"}):
        clust_names = adata_vis.uns['mod']['factor_names']

        for s in adata_vis.obs['sample'].unique():

            s_ind = adata_vis.obs['sample'] == s
            s_keys = list(adata_vis.uns['spatial'].keys())
            s_spatial = np.array(s_keys)[[s in i for i in s_keys]][0]

            fig = sc.pl.spatial(adata_vis[s_ind, :], cmap='magma',
                                color=clust_names, ncols=5, library_id=s_spatial,
                                size=1.3, img_key='hires', alpha_img=1,
                                vmin=0, vmax='p99.2',
                                return_fig=True, show=False)

            fig_dir = f"{scvi_run_name}/spatial/"
            if not os.path.exists(fig_dir):
                os.mkdir(fig_dir)
            fig_dir = f"{scvi_run_name}/spatial/per_sample/"
            if not os.path.exists(fig_dir):
                os.mkdir(fig_dir)

            plt.savefig(f"{fig_dir}W_cell_abundance_q05_{s}.png",
                        bbox_inches='tight')
            plt.close()
vitkl commented 2 years ago

Also adding the history plots for completeness:

# plot ELBO loss history during training, removing first 100 epochs from the plot
mod.plot_history(5000)
plt.savefig(f"{scvi_run_name}/training_ELBO_history_minus5k.png",
                   bbox_inches='tight')
plt.close()
mod.plot_history(0)
plt.savefig(f"{scvi_run_name}/training_ELBO_history_all.png",
                   bbox_inches='tight')
plt.close()
yozhikoff commented 2 years ago

@yozhikoff what do you think is the best interface for saving plots? File path argument to the function or just making sure that mod.plot_QC() creates only one plot?

@vitkl I would just return a list of Figure objects, so that the user could do whatever is needed - save, modify, add something else.

Quentin-Bayard commented 1 year ago

hi ! up :)

sciencepeak commented 11 months ago

@yozhikoff what do you think is the best interface for saving plots? File path argument to the function or just making sure that mod.plot_QC() creates only one plot?

@vitkl I would just return a list of Figure objects, so that the user could do whatever is needed - save, modify, add something else.

I am using the mod.plot_QC(), which produces two graphs simultaneously. I wish I can save those two graphs to the files without interactively manually "save plot as". I wish to run the script once and get all plots saved to files.

meredithramba commented 1 month ago

Cell2location plot_QC() should be save-able:

if True:
    mod.plot_QC()
    plt.savefig(f"{scvi_run_name}/reconstruction_accuracy_histogram.png",
                bbox_inches='tight')
    plt.close()

When I run this, I end up with a blank png, which I believe is because the function mod.plot_QC() has plt.show(), resulting in a blank figure being saved (running plt.savefig after plt.show). Is there a way to save figures with the current functions?

sciencepeak commented 1 month ago

Cell2location plot_QC() should be save-able:

if True:
    mod.plot_QC()
    plt.savefig(f"{scvi_run_name}/reconstruction_accuracy_histogram.png",
                bbox_inches='tight')
    plt.close()

When I run this, I end up with a blank png, which I believe is because the function mod.plot_QC() has plt.show(), resulting in a blank figure being saved (running plt.savefig after plt.show). Is there a way to save figures with the current functions?

Hi,

Inspired by your comments. I found that when I specify show=False in the sc.pl.spatial() function, I can draw the graph to the hard drive. Without the show=False, I just got a blank file on the hard drive, which means I have to manually interactively export or save the graph to the hard drive.

Showing the figures by default is a problem for programming style coding that handles many samples running in the back end of the computer. I still can not figure out the mod.plot_QC(), which lacks the parameter of show=False and draws two figures at the same time.

vitkl commented 1 month ago

Could you make a PR suggesting specific changes?

On Thu, 22 Aug 2024 at 19:06, Science Peak @.***> wrote:

Cell2location plot_QC() should be save-able:

if True: mod.plot_QC() plt.savefig(f"{scvi_run_name}/reconstruction_accuracy_histogram.png", bbox_inches='tight') plt.close()

When I run this, I end up with a blank png, which I believe is because the function mod.plot_QC() has plt.show(), resulting in a blank figure being saved (running plt.savefig after plt.show). Is there a way to save figures with the current functions?

Hi,

Inspired by your comments. I found that when I specify show=False in the sc.pl.spatial() function, I can draw the graph to the hard drive. Without the show=False, I just got a blank file on the hard drive, which means I have to manually interactively export or save the graph to the hard drive.

Showing the figures by default is a problem for programming style coding that handles many samples running in the back end of the computer. I still can not figure out the mod.plot_QC(), which lacks the parameter of show=False and draws two figures at the same time.

— Reply to this email directly, view it on GitHub https://github.com/BayraktarLab/cell2location/issues/191#issuecomment-2305351712, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFMFTV4FCYZVIJCZG24XJG3ZSYSDNAVCNFSM6AAAAABMSYIRB2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBVGM2TCNZRGI . You are receiving this because you were mentioned.Message ID: @.***>

sciencepeak commented 1 month ago

Thanks for using cell2location @bednarsky! We indeed also often use cell2location non-interactively and in general, save a range of plots. Is mod.plot_QC() and cell2location.utils.filtering.filter_genes the only methods that have this issue?

@yozhikoff what do you think is the best interface for saving plots? File path argument to the function or just making sure that mod.plot_QC() creates only one plot?

One function creates only one plot would be a better solution. If two plots are needed, perhaps it is good to split the function into two functions with similar names.

Adding an parameter show=False to the function would be a better solution. If you could point out where the source code and documentation of mod.plot_QC() is, I would like to take a look, though my Python skill on cell2location might not be good enough to submit a PR.

If the two "better" solutions are realized at the same time, it would be a perfect solution for needs of programmers, who can not use interactive notebook to handle many samples.