P values question - Githubissues

Dear Alan, I am very new to rhythm analysis and am going to try BooteJTK on my data. I have a question about the example file example/TestInput4_OTHERTEXT_boot10-rep2_GammaP.txt. Why are the empirical p-values (column "empP") higher than the BH-corrected p-values (column "GammaBH")? Best regards, Alex

Hi Alex,

Thanks for the question, and welcome to the field.

The BH-corrected p-values are lower than the empP because the p-values that come from comparing (via Kendall's Tau) different waveforms against the same signal are correlated.

This means that the Benjamini-Hochberg process, which is only supposed to be used on independent p-values, is inappropriate to use in this case.

See https://www.biorxiv.org/content/10.1101/118547v1.full for more detail on this.

This is a subtle, but very important, point, so let me know if you need me to go into more detail.

Best, Alan

Alan L. Hutchison, MD, PhD PGY-1, Internal Medicine University of Chicago Medicine he/him/his

Dear Alan, Thank you for your reply. Should I use the empirical p-values (with a <=0.05 cutoff) instead of GammaBH? Best regards, Alex

Hi,

I totally misunderstood your question. Sorry, I thought you were asking a different question. The question I was answering was why methods like RAIN and MetaCycle are inaccurate and shouldn't be used.

GammaBH is the Benjamini-Hochberg adjustment for multiple hypothesis testing (MHT) applied to the GammaP values, which are the computed p-values put out by the BooteJTK algorithm. The empP values should match the GammaP values, but you should be able to achieve lower values with GammaP. The short answer is that you should probably use GammaBH for your analysis, for the reasons below.

MHT is the problem in null-hypothesis significance testing that occurs when you do multiple tests of the same hypothesis. When there is no signal, the p-values for multiple tests should be roughly evenly distributed from 0 to 1. This means that if you do 20 tests, on average one of those p-values will be <=0.05. If your p-value threshold is 0.05, you may therefore find an apparently significant result even in data where there is no signal. There are many ways to correct for this.
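To make the "20 tests" arithmetic above concrete, here is a minimal Python sketch. It is not part of BooteJTK; the function names are made up for illustration, and it assumes the tests are independent and that p-values are uniform on [0, 1] when there is no signal.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of p-values <= alpha when there is no signal."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance that at least one of n_tests independent null p-values is <= alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate_null_hits(n_tests, alpha, n_trials, seed=0):
    """Average count of p-values <= alpha over n_trials simulated null experiments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        # Each null p-value is drawn uniformly from [0, 1).
        total += sum(1 for _ in range(n_tests) if rng.random() <= alpha)
    return total / n_trials


if __name__ == "__main__":
    print(expected_false_positives(20, 0.05))   # expected count of spurious hits
    print(prob_at_least_one(20, 0.05))          # chance of one or more spurious hits
    print(simulate_null_hits(20, 0.05, 20000))  # simulated average, for comparison
```

Under these assumptions, 20 tests at a 0.05 threshold give one spurious hit on average, and the chance of seeing at least one is well above one half.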

For starters, the p-value (e.g. 5%) tells you that "there is a 5% chance that if there were no signal in this data I would get a result this large or larger".

The Bonferroni correction will give you values that, for a given threshold alpha (e.g. 5%), tell you "there is a 5% chance that at least one of the values below 5% is a false positive".

The Bonferroni correction is very conservative and throws out a lot of your data. The Benjamini-Hochberg adjustment will give you values that, for a given threshold alpha (e.g. 5%), tell you "of the data with values below 5%, about 5% of them are expected to be false positives". This is GammaBH.
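For readers who want to see the two corrections side by side, here is a minimal Python sketch of the textbook Bonferroni and Benjamini-Hochberg adjustments. It is an illustration only, not BooteJTK's own code; the function names and the example p-values are made up.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.20]  # made-up p-values, not from the example file
    print(bonferroni(raw))
    print(benjamini_hochberg(raw))
```

Note that both adjustments can only raise a p-value, never lower it, relative to the raw values they are computed from. In the thread's terms, GammaBH is the adjustment of GammaP, so it is GammaP (not empP) that GammaBH can never fall below.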

I would refer you to Hutchison et al. 2015 (which goes into the most detail), Hutchison et al. 2018, and Hutchison et al. 2017 to really dig into these issues. They will (and should) come up a lot. Carefully noticing which value is being used will help you determine how much to believe your (and others') results.

https://journals.plos.org/ploscompbiol/article/comments?id=10.1371/journal.pcbi.1004094
https://www.ncbi.nlm.nih.gov/pubmed/30101659
https://www.biorxiv.org/content/10.1101/118547v1.full

Best, Alan


Dear Alan, Thank you very much for your detailed explanation. Best regards, Alex

alanlhutchison / BooteJTK

P values question #3