MannLabs / alphapeptstats

Python Package for the downstream analysis of mass-spectrometry-based proteomics data
https://alphapeptstats.readthedocs.io/en/latest/
Apache License 2.0
58 stars 14 forks source link

Cut offs for Volcano Plots greatly shifted from significance #270

Open steph-robinson opened 8 months ago

steph-robinson commented 8 months ago

Cut-offs_VolcanoPlots

Inserting image up top show the discrepancy I'm seeing. I'm employing the stats from this package by applying select formulas to log2(LFQ) by repeatedly calling the perform_ttest_analysis function in the following function.

def alphastats_ttest(data, s0, fdr): 
    dfs = [df.reset_index() for df in data]
    stat_dfs = []
    t_limits = []
    for df in dfs: 
        grp1colnames = df.columns[df.columns.str.contains('WT')].to_list()
        grp2colnames = df.columns[df.columns.str.contains('Test')].to_list()

        stats, tmax = perform_ttest_analysis(
                                            df,  
                                            grp2colnames, 
                                            grp1colnames, 
                                            s0=s0, #Refer to Tusher et al. 2001 for s0 definition. 
                                            n_perm=2,
                                            fdr=fdr, #5% FDR 
                                            id_col="Uniprot",
                                            plot_fdr_line=True,
                                            parallelize=True
        )

        stat_dfs.append(stats)
        t_limits.append(tmax)

    return stat_dfs, t_limits

I then calculate df's from the list of returned t_limits, before plotting and getting the volcanoes attached here. Seems that there is a disconnect between how the ttest cut-off is being calculated in the get_MaxS vs the usual perform_ttest_analysis?

Here's the last portion of my code before plotting.

cut_offs = []

for i, df in enumerate(stats):
    n_x, n_y = len(df), len(df)
    s0 = 1.5

    cut_off = get_fdr_line(t_limits[i],
                        s0, n_x, n_y, plot=False,
                        fc_s=np.arange(0, 10, 0.05),
                        s_s=np.arange(0.005, 10, 0.05))

    cut_off['-logp'] = -np.log(cut_off['pvals']) #transform into -log10 space for plotting
    cut_offs.append(cut_off)