chr1swallace / coloc

Repo for the R package coloc

How to construct a dataset for coloc analysis? #115

Closed: mocksu closed this issue 1 year ago.

mocksu commented 1 year ago

Sorry, this is a very basic question: how can I read a local file into a "dataset" that coloc can use? Suppose I have a tab-delimited text file with columns "b" for beta, "SNP" for SNP ID, and "pos" for position. I am also not sure what "sdY" and "type" mean in coloc.

I read http://chr1swallace.github.io/coloc/articles/a02_data.html and http://127.0.0.1:19674/library/coloc/doc/a06_SuSiE.html but couldn't answer these questions.

Thanks a lot for any help!

chr1swallace commented 1 year ago

For the first part, I think you need a basic introduction to R: search for a tutorial on "how to read my data into R".

sdY is the standard deviation of your trait, and type is either "cc" or "quant", depending on whether your data are case-control or quantitative. See ?check_dataset for these and the other parts of a coloc dataset.
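For readers arriving with the same question: a coloc dataset is an R named list rather than a data frame. The Python sketch below mirrors one as a dict so the fields are easy to see. The field names (beta, varbeta, snp, position, type, sdY, MAF, N) follow coloc's conventions, but `validate_dataset` is a hypothetical, simplified stand-in for a few of the rules in `?check_dataset`, not a port of it, and the numbers are toy values.

```python
from typing import Any


def validate_dataset(d: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical, simplified stand-in for a few of the rules documented in
    coloc's R function check_dataset(); it is not a port of that function.
    """
    problems = []
    if d.get("type") not in ("cc", "quant"):
        problems.append('type must be "cc" (case-control) or "quant" (quantitative)')
    required = ("beta", "varbeta", "snp")
    missing = [k for k in required if k not in d]
    problems.extend(f"missing {k}" for k in missing)
    if not missing:
        n = len(d["snp"])
        if len(d["beta"]) != n or len(d["varbeta"]) != n:
            problems.append("beta, varbeta and snp must have the same length")
        if any(v <= 0 for v in d["varbeta"]):
            problems.append("varbeta must be positive (it is the squared standard error)")
    # A quantitative trait also needs its scale: sdY itself, or MAF and N
    # from which sdY can be estimated.
    if d.get("type") == "quant" and "sdY" not in d and not ("MAF" in d and "N" in d):
        problems.append('type "quant" needs sdY, or both MAF and N')
    return problems


# Toy quantitative-trait dataset with three SNPs (illustrative values only).
D1 = {
    "beta": [0.12, -0.05, 0.30],
    "varbeta": [0.0004, 0.0009, 0.0016],  # se**2, not se
    "snp": ["rs1", "rs2", "rs3"],
    "position": [1000, 2000, 3000],
    "type": "quant",
    "sdY": 1.0,
}
print(validate_dataset(D1))
```

For a case-control trait the same dict would use type "cc" instead, and sdY would not apply.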

mocksu commented 1 year ago

Thanks, Chris. I do know basic R. I guess that because you are so familiar with the coloc package, you think the questions I asked were trivial. They are indeed trivial once you know what the terms mean. I figured things out by digging into your R code.

IMHO, it would be very helpful for coloc beginners to know the structure of the data variables used, and each function's parameters and return value.

chr1swallace commented 1 year ago

I wasn't trying to suggest anything about your R skills. But there are many clearly written tutorials on how to read data from a local file, which I understood was your question.

Could you perhaps explain more specifically what is missing from the docs about the input and return values of each function that would help? I have tried to give them in the help() for each function.

https://chr1swallace.github.io


From: mocksu @.> Sent: Thursday, March 9, 2023 10:37:20 PM To: chr1swallace/coloc @.> Cc: Chris Wallace @.>; State change @.> Subject: Re: [chr1swallace/coloc] How to construct a dataset for coloc analysis? (Issue #115)

mocksu commented 1 year ago

Hi Chris,

Coloc is a nice tool, and I really appreciate the great work.

For many (maybe most) users, it may be easy to get started, and for me it's not very hard either. It's just that when I went through the tutorial, the datasets used (i.e. coloc_test_data) were already provided, and I didn't see how to construct them myself. First, I expected D1, D2, ... to be data frames, but they were lists (actually lists of lists). Lists are not strongly typed, and lists of lists are even harder to decode. I would suggest an auxiliary function such as read_data_file(filename, beta_field = "beta_field_name_in_the_local_file_maybe_with_default", N_field = "samplesize_field_name_in_the_local_file_maybe_with_default", ...) that returns a data list to be used by coloc.abf.
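The suggested read_data_file() does not exist in coloc; the sketch below is a hypothetical Python version of the idea, assuming a tab-delimited file with a header row and a standard-error column (called se here, which the original file description does not mention). It squares the standard error because coloc expects varbeta, the variance of beta. It takes an open file handle rather than a filename so it can be exercised with an in-memory string.

```python
import csv
import io
from typing import Any, TextIO


def read_data_file(
    handle: TextIO,
    beta_field: str = "beta",
    se_field: str = "se",
    snp_field: str = "snp",
    pos_field: str = "position",
    **extra: Any,
) -> dict[str, Any]:
    """Read tab-delimited summary statistics into a coloc-style dict.

    Hypothetical helper (coloc does not provide it). The column names are
    parameters because every summary-statistics file uses its own headers.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    beta: list[float] = []
    varbeta: list[float] = []
    snp: list[str] = []
    position: list[int] = []
    for row in reader:
        beta.append(float(row[beta_field]))
        se = float(row[se_field])
        varbeta.append(se * se)  # coloc wants the variance of beta, not its SE
        snp.append(row[snp_field])
        position.append(int(row[pos_field]))
    dataset: dict[str, Any] = {
        "beta": beta,
        "varbeta": varbeta,
        "snp": snp,
        "position": position,
    }
    dataset.update(extra)  # e.g. type="quant", sdY=1.0, N=5000
    return dataset


# Columns named as in the original question ("b", "SNP", "pos"), plus an
# assumed "se" column; an in-memory string stands in for the file.
text = "SNP\tpos\tb\tse\nrs1\t1000\t0.12\t0.02\nrs2\t2000\t-0.05\t0.03\n"
D1 = read_data_file(io.StringIO(text), beta_field="b", snp_field="SNP",
                    pos_field="pos", type="quant", sdY=1.0)
print(D1)
```

For a real file, wrap the call in `with open(path, newline="") as f:` and pass `f`. Extra keyword arguments such as type and sdY are copied into the result unchanged, so the same helper covers case-control data (type="cc").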

As for other variables/parameters, please allow me to give an example. I am still trying to figure out what "LD" is. At first, "linkage disequilibrium" came to mind. But when I searched for "LD" on the GitHub site, the documentation given was "@param LD named matrix of r". Well, that does say something about LD, but it doesn't mention linkage disequilibrium. Or maybe it's a matrix of r values for linkage disequilibrium? What exactly is this "named matrix of r", and how do I prepare or obtain it? What does each row and each column of the matrix represent? IMHO, the more detail, the better.
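As I understand coloc's documentation (check ?check_dataset to confirm), LD is a square matrix of signed linkage-disequilibrium correlations r (not r squared) between the SNPs in the dataset, with row and column names equal to the SNP IDs in snp; the entry in row i, column j is the correlation between SNP i and SNP j. The Python sketch below computes such a matrix from genotype dosages as a nested dict keyed by SNP ID. The function name ld_matrix and the genotypes are hypothetical toy inputs.

```python
import math


def ld_matrix(dosages: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Signed Pearson correlation r for every pair of SNPs.

    dosages maps SNP ID -> genotype dosages (0, 1, 2) for the same
    individuals in the same order. The result is indexed by SNP ID on both
    axes, like the row and column names of a named R matrix.
    """
    centred: dict[str, list[float]] = {}
    norms: dict[str, float] = {}
    for snp, g in dosages.items():
        mean = sum(g) / len(g)
        centred[snp] = [x - mean for x in g]
        norms[snp] = math.sqrt(sum(x * x for x in centred[snp]))
        if norms[snp] == 0:
            raise ValueError(f"{snp} is monomorphic; r is undefined")
    ld: dict[str, dict[str, float]] = {}
    for a in dosages:
        ld[a] = {}
        for b in dosages:
            cross = sum(x * y for x, y in zip(centred[a], centred[b]))
            ld[a][b] = cross / (norms[a] * norms[b])
    return ld


# Toy genotypes for five individuals (illustrative values only).
genotypes = {
    "rs1": [0, 1, 2, 1, 0],
    "rs2": [0, 1, 2, 1, 0],  # same as rs1: perfect positive LD
    "rs3": [2, 1, 0, 1, 2],  # allele coding flipped relative to rs1: perfect negative LD
}
LD = ld_matrix(genotypes)
print(LD["rs1"]["rs3"])
```

The rs3 example shows why the sign matters: flipping which allele is counted flips the sign of r, so the dosages used for LD should count the same alleles that the betas refer to.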

Thanks so much!

chr1swallace commented 1 year ago

Thanks, that detail is what I need. I'm busy next week, but I'll try updating the docs. Would you mind if I run them past you?

From: mocksu @.> Sent: Friday, March 10, 2023 3:03 PM To: chr1swallace/coloc @.> Cc: Chris Wallace @.>; State change @.> Subject: Re: [chr1swallace/coloc] How to construct a dataset for coloc analysis? (Issue #115)

mocksu commented 1 year ago

Sure, that's very kind of you. I'd be happy to help in any way I can. Thanks.