CBIIT / LDlinkR

Query more than 50 SNPs? #20

Closed: vermaa1 closed this issue 1 year ago

vermaa1 commented 2 years ago

Hi,

As I understand from the documentation, only 50 SNPs can currently be queried in a single API request. I have a list of ~15K independent SNPs identified from ~1800 GWASs, and I would like to use the LDtrait function to identify known vs. novel loci. Do you have suggestions on how to query more than 50 SNPs programmatically?

Thanks!

timyers commented 2 years ago

We suggest looping through the 15K list in blocks of 50 SNPs, or even one SNP at a time if you prefer. However, because LDlinkR must ping the LDlink API server before submitting a query to make sure it is working, using the maximum block size of 50 SNPs will be fastest. Below is a simple example with a list of six query SNPs submitted in blocks of three, for a total of two API requests. It could easily be scaled up to 15K SNPs in blocks of 50. Unfortunately, with a list this long, it will still take some time to run. I hope this helps.

    # Query a list of SNPs in blocks of three at a time
    query_snps <- c("rs114", "rs496202", "rs345", "rs456", "rs334", "rs3")

    # Initialize variables
    query_block_length <- 3  # maximum is 50
    query_block_start <- 1
    query_block_end <- query_block_length
    # Round up so that a final, partial block is still queried
    num_of_blocks <- ceiling(length(query_snps) / query_block_length)

    # Initialize empty data frame
    df_final <- data.frame()

    for (i in seq_len(num_of_blocks)) {
      # Clamp the block end so the last block does not index past the list
      block_end <- min(query_block_end, length(query_snps))
      df_traits <- LDlinkR::LDtrait(
        snps = query_snps[query_block_start:block_end],
        pop = c("YRI", "CEU"),
        token = "your_token_here"
      )
      df_final <- rbind(df_final, df_traits)

      # Increment block start and end by query_block_length
      query_block_start <- query_block_start + query_block_length
      query_block_end <- query_block_end + query_block_length
    }
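For a list as long as 15K SNPs, it may help to separate the chunking from the query itself, so that one failed request does not discard the blocks already retrieved. The following is only a sketch of that idea, not part of LDlinkR: `chunk_snps()` and `query_in_blocks()` are hypothetical helper names, and `query_fun` is assumed to be any function that takes a character vector of SNPs and returns a data frame.

```r
# Split a character vector of rsIDs into blocks of at most `block_size`.
# (Hypothetical helper, not an LDlinkR function.)
chunk_snps <- function(snps, block_size = 50) {
  split(snps, ceiling(seq_along(snps) / block_size))
}

# Run `query_fun` on each block and stack the results row-wise.
# A block whose query raises an error is reported with a warning and skipped.
# (Hypothetical helper, not an LDlinkR function.)
query_in_blocks <- function(snps, query_fun, block_size = 50) {
  blocks <- chunk_snps(snps, block_size)
  results <- lapply(blocks, function(block) {
    tryCatch(
      query_fun(block),
      error = function(e) {
        warning("Block starting at ", block[1], " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop failed blocks, then bind the remaining data frames together
  results <- Filter(Negate(is.null), results)
  do.call(rbind, unname(results))
}
```

Assuming a valid token, it could be called along these lines (left commented out because it needs network access):

```r
# df_final <- query_in_blocks(
#   query_snps,
#   function(block) {
#     LDlinkR::LDtrait(snps = block, pop = c("YRI", "CEU"), token = "your_token_here")
#   }
# )
```

Passing the query as a function also makes it possible to try the loop on a stub before spending API requests on the full list.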