Closed ozgunbabur closed 2 years ago
Ozgun,
I may have missed the original emails. Can you please send a link to some info about enrichment tables? I'm not sure how to read them.
Thanks!
On 2022-02-25 11:27, Özgün Babur wrote:
Write a script that will generate a p-value for enrichment (or for the opposite: deficiency), for a given set of values in a 2x2 table, using the hypergeometric test.
Step 1: Find a library that will run the hypergeometric test on a given 2x2 table to generate a p-value.
Step 2: Implement a script that will use that library to calculate a p-value for the enrichment in a given 2x2 table.
Here is an example. Let's say we are given the below 2x2 table:

| | - | + |
|---|---|---|
| - | 15 | 10 |
| + | 3 | 5 |
And we want to check if the number 5 indicates an enrichment (i.e. 5 is too high assuming an independent distribution of these two features). Then the enrichment test will apply the hypergeometric test on this distribution and more imbalanced distributions. These are
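| | - | + |
|---|---|---|
| - | 15 | 10 |
| + | 3 | 5 |

| | - | + |
|---|---|---|
| - | 16 | 9 |
| + | 2 | 6 |

| | - | + |
|---|---|---|
| - | 17 | 8 |
| + | 1 | 7 |

| | - | + |
|---|---|---|
| - | 18 | 7 |
| + | 0 | 8 |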
The sum of the hypergeometric probabilities of these tables gives us the one-tailed p-value for enrichment. Here we found the probability of the +/+ count being 5 or more by chance.
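The summation over the observed table and the more imbalanced ones can be sketched in Python with only the standard library. This is a minimal illustration under assumptions, not the script the issue asks for: the names `table_probability` and `enrichment_pvalue` are hypothetical, and the table is assumed to be passed as four counts in the order -/-, -/+, +/-, +/+. A library such as SciPy (`scipy.stats.hypergeom`) could replace the hand-written probability.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table, given its row and column totals.

    Assumed layout (hypothetical argument order): a = -/-, b = -/+, c = +/-, d = +/+.
    """
    n = a + b + c + d
    # Choose which b of the "row -" items and which d of the "row +" items
    # fall in the "+" column, out of all ways to pick b + d items from n.
    return comb(a + b, b) * comb(c + d, d) / comb(n, b + d)


def enrichment_pvalue(a, b, c, d):
    """One-tailed p-value: probability that the +/+ count is d or larger."""
    p = 0.0
    while True:
        p += table_probability(a, b, c, d)
        if b == 0 or c == 0:
            break  # no more imbalanced table exists with the same totals
        # Shift one count onto the diagonal to get the next, more imbalanced table.
        a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return p


if __name__ == "__main__":
    # The example table from the text: sums the four tables with +/+ = 5, 6, 7, 8.
    print(enrichment_pvalue(15, 10, 3, 5))
```

For the example table this visits exactly the four tables listed above and prints a value of roughly 0.24, so a count of 5 would not be called an enrichment at the usual thresholds.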
To test for deficiency instead (to see if 5 is a significantly low number), we need to find the probability of the +/+ count being 5 or fewer by chance. In that case, the distributions we need to include are:
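| | - | + |
|---|---|---|
| - | 15 | 10 |
| + | 3 | 5 |

| | - | + |
|---|---|---|
| - | 14 | 11 |
| + | 4 | 4 |

| | - | + |
|---|---|---|
| - | 13 | 12 |
| + | 5 | 3 |

| | - | + |
|---|---|---|
| - | 12 | 13 |
| + | 6 | 2 |

| | - | + |
|---|---|---|
| - | 11 | 14 |
| + | 7 | 1 |

| | - | + |
|---|---|---|
| - | 10 | 15 |
| + | 8 | 0 |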
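A minimal Python sketch of the deficiency tail follows. It is an illustration under assumptions rather than a definitive implementation: the name `deficiency_pvalue` is hypothetical, the four counts are assumed to arrive in the order -/-, -/+, +/-, +/+, and instead of shifting the table cell by cell it sums over every feasible +/+ count from its minimum up to the observed value, which covers the same set of tables.

```python
from math import comb


def deficiency_pvalue(a, b, c, d):
    """One-tailed p-value: probability that the +/+ count is d or smaller.

    Assumed layout (hypothetical argument order): a = -/-, b = -/+, c = +/-, d = +/+.
    """
    n = a + b + c + d
    row_plus = c + d  # total of the "+" row
    col_plus = b + d  # total of the "+" column
    # With fixed totals, the +/+ count cannot go below this value.
    lowest = max(0, row_plus + col_plus - n)
    # Number of ways to fill the "+" row with k items from the "+" column
    # and the remaining row_plus - k items from the "-" column.
    favourable = sum(
        comb(col_plus, k) * comb(n - col_plus, row_plus - k)
        for k in range(lowest, d + 1)
    )
    return favourable / comb(n, row_plus)


if __name__ == "__main__":
    # The example table from the text: sums the six tables with +/+ = 5, 4, ..., 0.
    print(deficiency_pvalue(15, 10, 3, 5))
```

For the example table this prints a value of roughly 0.936, so 5 is not a significantly low count either.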
Here is the link to the Google Docs presentation describing those tables: https://docs.google.com/presentation/d/1ycjs3BdIcXiQz7uCx9KUWYRa9cgc0Ur-CP0R94qRy30/edit?usp=sharing
And please write tests to make sure your code works as expected. Here are some tables for testing.
Assume the letters a, b, c, and d represent the counts as displayed below.
| | - | + |
|---|---|---|
| - | a | b |
| + | c | d |
Then if a = 4, b = 6, c = 6, d = 4, the enrichment p-value should be 0.9105522960012121 and the deficiency p-value should be 0.3281408993483296.
If a = 20, b = 30, c = 25, d = 30, the enrichment p-value should be 0.7766662300662146 and the deficiency p-value should be 0.357198690677301.
If a = 15, b = 8, c = 20, d = 42, the enrichment p-value should be 0.006445865568610187 and the deficiency p-value should be 0.9985899806396821.
If a = 21, b = 20, c = 34, d = 13, the enrichment p-value should be 0.9883420938210076 and the deficiency p-value should be 0.0341612031176084.
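One way these four tables could be turned into automated checks is sketched below. It is only an illustration: `tail_pvalues`, `REFERENCE_TABLES` and `run_checks` are hypothetical names, the p-values are computed here with the standard library rather than with whatever library the script under test uses, and the comparison tolerance is an assumption.

```python
from math import comb, isclose


def tail_pvalues(a, b, c, d):
    """Return (enrichment, deficiency) p-values for the +/+ count d.

    Layout follows the a/b/c/d table above: a = -/-, b = -/+, c = +/-, d = +/+.
    """
    n = a + b + c + d
    row_plus, col_plus = c + d, b + d
    lowest = max(0, row_plus + col_plus - n)  # smallest feasible +/+ count
    highest = min(row_plus, col_plus)         # largest feasible +/+ count
    total = comb(n, row_plus)

    def ways(k):
        # Number of tables' worth of arrangements with +/+ count equal to k.
        return comb(col_plus, k) * comb(n - col_plus, row_plus - k)

    enrichment = sum(ways(k) for k in range(d, highest + 1)) / total
    deficiency = sum(ways(k) for k in range(lowest, d + 1)) / total
    return enrichment, deficiency


# (a, b, c, d, expected enrichment, expected deficiency), copied from the thread.
REFERENCE_TABLES = [
    (4, 6, 6, 4, 0.9105522960012121, 0.3281408993483296),
    (20, 30, 25, 30, 0.7766662300662146, 0.357198690677301),
    (15, 8, 20, 42, 0.006445865568610187, 0.9985899806396821),
    (21, 20, 34, 13, 0.9883420938210076, 0.0341612031176084),
]


def run_checks(rel_tol=1e-6):
    """Raise AssertionError on the first table whose p-values do not match."""
    for a, b, c, d, want_enrichment, want_deficiency in REFERENCE_TABLES:
        enrichment, deficiency = tail_pvalues(a, b, c, d)
        assert isclose(enrichment, want_enrichment, rel_tol=rel_tol), (a, b, c, d)
        assert isclose(deficiency, want_deficiency, rel_tol=rel_tol), (a, b, c, d)


if __name__ == "__main__":
    run_checks()
    print("all reference tables pass")
```

Note that the two p-values of a table add up to more than 1, because the observed table is counted in both tails.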
I've implemented the test but have not yet discovered how to fix the p_value calculation method...
Fixed it! It was a small error in the code. Now, it passes 3 of 4 tests!
Almost there! There are a few problems.
I've applied your changes, and now the code passes only 3/4 of the enrichment tests and none of the deficiency ones, 3/4 of which it passed before.
There is a bug in the deficiency pval implementation. Please look at line 19.
I feel silly. The code seemed fine to me, so I sought errors in what I was trying to do, but then just ran the code and saw the return statement lacked a closing parenthesis.