Closed vermaa1 closed 1 year ago
We suggest looping through the 15K list in blocks of up to 50 SNPs. You could even submit 1 SNP at a time, but because LDlinkR must ping the LDlink API server before submitting a query to make sure it is working, using the maximum block size of 50 SNPs will be fastest. Below is a simple example with a list of six query SNPs submitted in blocks of 3 SNPs, for a total of two API requests. This can easily be scaled up to 15K SNPs in blocks of 50. Unfortunately, with a list this long, it will still take some time to run. I hope it helps.
query_snps <- c("rs114", "rs496202", "rs345", "rs456", "rs334", "rs3")
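query_block_length <- 3 # maximum is 50
query_block_start <- 1
query_block_end <- min(query_block_length, length(query_snps))
# Round up so that a final, shorter block is still queried
num_of_blocks <- ceiling(length(query_snps) / query_block_length)
df_final <- data.frame()
for (i in seq_len(num_of_blocks)) {
  df_traits <- LDlinkR::LDtrait(snps = query_snps[query_block_start:query_block_end],
                                pop = c("YRI", "CEU"),
                                token = "your_token_here"
  )
  # Append this block's results to the running table
  df_final <- rbind(df_final, df_traits)
  # Advance to the next block, never indexing past the end of the list
  query_block_start <- query_block_start + query_block_length
  query_block_end <- min(query_block_end + query_block_length, length(query_snps))
}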
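For a list as long as 15K SNPs, it may help to separate the chunking from the querying, so that one failed request does not stop the whole run. The following is only a minimal sketch of that idea, not part of LDlinkR: `chunk_snps` and `query_in_blocks` are hypothetical helper names, and `query_fun` is assumed to be any function that takes a character vector of SNPs and returns a data frame.

```r
# Hypothetical helper: split a character vector into blocks of at most
# `block_size` elements (the last block may be shorter).
chunk_snps <- function(snps, block_size = 50) {
  stopifnot(block_size >= 1, block_size <= 50)
  if (length(snps) == 0) return(list())
  unname(split(snps, ceiling(seq_along(snps) / block_size)))
}

# Hypothetical helper: run `query_fun` on each block and row-bind the results.
# A block that raises an error is reported with a warning and skipped.
query_in_blocks <- function(snps, query_fun, block_size = 50) {
  blocks <- chunk_snps(snps, block_size)
  results <- lapply(seq_along(blocks), function(i) {
    tryCatch(
      query_fun(blocks[[i]]),
      error = function(e) {
        warning("Block ", i, " failed: ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop skipped blocks before combining
  results <- Filter(Negate(is.null), results)
  do.call(rbind, results)
}

# Example wiring (assumes LDlinkR is installed and the token is valid):
# df_final <- query_in_blocks(
#   query_snps,
#   function(block) LDlinkR::LDtrait(snps = block,
#                                    pop = c("YRI", "CEU"),
#                                    token = "your_token_here")
# )
```

Passing the query as a function keeps the requests sequential, one block at a time, and lets you try the chunking on a small list before submitting the full set.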
Hi,
As I understand from the documentation, currently only 50 SNPs can be queried in a single API request. I have a list of ~15K independent SNPs identified from ~1800 GWASs and I would like to use the LDtrait function to identify known vs novel loci. Do you have suggestions on how to query the >50 SNPs programmatically?
Thanks!