Vitek-Lab / MSstats

R package - MSstats

Question regarding pvalues vs. adj.pvalues #95

Closed: liorlobel closed this issue 3 months ago

liorlobel commented 1 year ago

Dear Vitek Lab,

Thanks for developing MSstats.

I have a question regarding adj.pvalues. It might be my own confusion: the raw pvalues look nicely distributed (see my self-generated volcano plot below), but when I plot the adj.pvalues I get strange results; see both my self-generated and the MSstats-generated volcano plots.

What is also confusing is that when I sort the adj.pvalues I see many zeros, while the corresponding pvalues are ordinary non-zero values.

I will greatly appreciate any advice,

Best,

Lior

[Image: volcano plot of p-values (self-generated)]

[Image: volcano plot of adj.pvalues (self-generated)]

[Image: MSstats groupComparisonPlots volcano plot]

liorlobel commented 1 year ago

When I adjust the pvalues myself using the p.adjust function, I get non-zero values, and the points that sat on the top axis disappear.
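For reference, a minimal sketch of this kind of manual re-adjustment, assuming the groupComparison output is stored in a data frame named comparison_result with the usual pvalue and adj.pvalue columns (the object name is a placeholder):

```r
# Re-adjust the raw p-values with BH as a cross-check against the
# adj.pvalue column reported by MSstats (column names as in the
# groupComparison output; `comparison_result` is a placeholder name).
comparison_result$adj_pvalue_manual <- p.adjust(comparison_result$pvalue,
                                                method = "BH")

# Compare the two adjustments side by side
head(comparison_result[, c("Protein", "pvalue", "adj.pvalue", "adj_pvalue_manual")])
```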

mstaniak commented 1 year ago

Hi, thanks for reporting this. Could you provide a subset of the data that reproduces the problem? (The data can be anonymized by changing protein, peptide, and condition labels if needed; only the structure matters.)

liorlobel commented 1 year ago

Thanks! Will try

> On 3 Apr 2023, at 19:42, Mateusz Staniak wrote:
>
> Hi,
>
> this update should fix the problem: https://github.com/Vitek-Lab/MSstatsConvert/tree/hotfix-diann-na. It will make it to the Bioconductor version soon.


mstaniak commented 1 year ago

Hi, sorry, I accidentally sent that message to the wrong issue. Could you kindly share a small data sample that reproduces your issue?

liorlobel commented 1 year ago

Thanks for the reply, and sorry for the late response. Please see the attached file. Best,

Lior

Attachment: MSstats_to_github.csv

luizalmeida93 commented 1 year ago

Hi Dr. Staniak,

I have a similar issue with some of my analyses too. It has happened with both label-free (MSstats) and TMT (MSstatsTMT) data. I am attaching one of these analyses as an example.

This is a label-free study, with proteomics search performed by MaxQuant. Here are the parameters I used for the conversion to MSstats format:

```r
MaxQtoMSstatsFormat(
  evidence = evidence,
  annotation = annot,
  proteinGroups = proteinGroups,
  removeProtein_with1Peptide = TRUE,
  proteinID = "Proteins"
)
```

The data is here: DataForGitHub.csv

Take a look at the histogram below showing the distribution of pvalues and adj.pvalues for group1 vs group2:

[Image: histograms of pvalues and adj.pvalues for group1 vs group2]
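For reference, histograms like these can be drawn with a short sketch along the following lines, assuming the groupComparison output is stored in a data frame comparison_result and the contrast is labeled "group1 vs group2" (both names are illustrative):

```r
# Side-by-side histograms of raw and BH-adjusted p-values for one contrast
# (object and label names are placeholders, not taken from the attached data).
res <- subset(comparison_result, Label == "group1 vs group2")

par(mfrow = c(1, 2))
hist(res$pvalue, breaks = 20, main = "p-values", xlab = "pvalue")
hist(res$adj.pvalue, breaks = 20, main = "adjusted p-values", xlab = "adj.pvalue")
par(mfrow = c(1, 1))
```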

Please, let me know if you need any additional data, and thank you for your help.

mstaniak commented 1 year ago

Hi, thanks for providing the example data. I will get back to you as soon as possible.

mstaniak commented 1 year ago

Hi @liorlobel, which correction method did you use when comparing the values adjusted by MSstats with those from p.adjust? Are the cases with a non-zero vs. zero value the ones with the "oneConditionMissing" message in the "issue" column of the groupComparison output?
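For what it's worth, one way to check this, assuming the result is stored in a data frame comparison_result with the adj.pvalue and issue columns of the groupComparison output (the object name is a placeholder):

```r
# Cross-tabulate zero adjusted p-values against the issue column to see
# whether the zeros coincide with "oneConditionMissing" cases.
table(zero_adj = comparison_result$adj.pvalue == 0,
      issue    = comparison_result$issue,
      useNA    = "ifany")
```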

@luizalmeida93 the correction is fine. It's just that BH produces many repeated p-values. If you look at adj. pvalues for all contrasts together, you will see more variability and some significant results.
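To see this concretely, a small self-contained toy example (the p-values below are made up, not taken from the attached data) showing how BH collapses several distinct raw p-values onto identical adjusted values:

```r
# BH adjustment takes a cumulative minimum of p * n / rank, so runs of
# nearby raw p-values often collapse onto the same adjusted value.
p <- c(0.001, 0.008, 0.009, 0.010, 0.020, 0.021, 0.040, 0.30, 0.50, 0.90)
round(p.adjust(p, method = "BH"), 3)
# 0.010 0.025 0.025 0.025 0.035 0.035 0.057 0.375 0.556 0.900
# -> 0.008, 0.009, 0.010 all map to 0.025; 0.020 and 0.021 both map to 0.035
```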

liorlobel commented 1 year ago

Hi,

I used the default MSstats option. For the p.adjust I used BH FDR.

Thanks!


luizalmeida93 commented 10 months ago

> @luizalmeida93 the correction is fine. It's just that BH produces many repeated p-values. If you look at adj. pvalues for all contrasts together, you will see more variability and some significant results.

@mstaniak sorry for reviving this topic, but I am troubleshooting a few things and fell into this rabbit hole again. What did you mean by "look at adj. pvalues for all contrasts together"? I am assuming you ran a different contrast matrix than just two groups with a "1" and a "-1". Can you provide an example?

mstaniak commented 8 months ago

Sorry for the delayed answer. If I remember correctly, yes. BH correction involves a cumulative minimum, so in some "unlucky" cases it can produce a noticeable number of identical adjusted p-values. In your example the correction is OK; what you see looks like an artifact of this particular data set and comparison. If you look at the p-values from all contrasts in a pairwise comparison matrix, you'll see more variability.
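As an illustration of what a full pairwise comparison matrix could look like (condition names are hypothetical and must match the Condition labels in the processed data; the groupComparison call is shown commented out, with processed_data as a placeholder):

```r
# Hypothetical pairwise contrast matrix for three conditions; each row is
# one contrast, and the column names must match the Condition labels.
contrast_matrix <- rbind(
  "group1 vs group2" = c(1, -1,  0),
  "group1 vs group3" = c(1,  0, -1),
  "group2 vs group3" = c(0,  1, -1)
)
colnames(contrast_matrix) <- c("group1", "group2", "group3")

# test_result <- groupComparison(contrast.matrix = contrast_matrix,
#                                data = processed_data)
```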

mstaniak commented 8 months ago

Looking at this again, I think the BioReplicate column in @liorlobel's data is wrong: if this is a group comparison design, the BioReplicate labels should be nested within Conditions.
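For clarity, a small sketch of what nested BioReplicate labels would look like in the annotation for a group comparison (case-control) design, with made-up run and subject labels:

```r
# Hypothetical annotation for a group-comparison design: each biological
# replicate appears in exactly one Condition, so BioReplicate IDs are not
# reused across Conditions (reusing them would imply a paired design).
annotation <- data.frame(
  Run          = c("run1", "run2", "run3", "run4"),
  Condition    = c("Control", "Control", "Treated", "Treated"),
  BioReplicate = c(1, 2, 3, 4)   # not c(1, 2, 1, 2)
)
```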

mstaniak commented 3 months ago

Closing this issue; please re-open if needed.