This is an R client for performing queries with AnnoQ API.
Install from github. Make sure you have installed devtools
.
install.packages("devtools")
Then
library(devtools)
install_github("USCbiostats/AnnoQR")
Query Variants with ANNOVAR_ensembl_Effect
Annotation
library(AnnoQR)
q = init_query_js_body()
ex = exists_filter("ANNOVAR_ensembl_Effect")
q = add_query_filter(q, ex)
variants = perform_search(q)
variants
Only retrieve ANNOVAR_ensembl_Effect
column
q = add_source(q, c("ANNOVAR_ensembl_Effect"))
variants = perform_search(q)
variants
Query variants field SnpEff_ensembl_Effect
marked as intergenic_region
q = init_query_js_body()
term_f = term_filter('SnpEff_ensembl_Effect' , 'intergenic_region')
q = add_query_filter(q, term_f)
variants = perform_search(q)
variants
Query variants field SnpEff_ensembl_Effect
marked as `intergenic_region with in chromosome 20
q = init_query_js_body()
term_f1 = term_filter('SnpEff_ensembl_Effect' , 'intergenic_region')
term_f2 = term_filter('chr' , '20')
q = add_query_filter(q, term_f1)
q = add_query_filter(q, term_f2)
#q = add_source(q, c('SnpEff_ensembl_Effect'))
variants = perform_search(q)
variants
Query variants with 1000 genome allel count 1000Gp3_AC
larger than 5
q = init_query_js_body()
range_f = range_filter(key='1000Gp3_AC' , gt=5)
q = add_query_filter(q, range_f)
variants = perform_search(q)
variants
Chromosome range query
variants = regionQuery(contig = '20', start=31710367, end=31820367)
variants
rsID query
variant = rsidQuery('rs193031179')
variant
keywordsQuery
keywordsQuery('protein_coding')
Our API leverages the powerful features of Elasticsearch, but it's important to be aware of certain behaviors related to query results:
perform_search(q)
perform_search(q)
by default yields only the first 10 matches. This is a standard constraint imposed by Elasticsearch on query results.For example, the following R code snippet demonstrates how to use perform_search(q)
effectively:
q = init_query_js_body()
ex = exists_filter("ANNOVAR_ensembl_Effect")
q = add_query_filter(q, ex)
variants = perform_search(q)
hits = variants$hits$hits
length(hits$`_index`)
Running this snippet typically results in:
10
This indicates the successful retrieval of the first 10 matches, aligning with Elasticsearch's default result limit.
perform_search_with_count(q)
perform_search_with_count(q)
. This is particularly useful for exhaustive queries where the entire dataset is necessary.Consider this code snippet:
q = init_query_js_body()
ex = exists_filter("ANNOVAR_ensembl_Effect")
q = add_query_filter(q, ex)
variants = perform_search_with_count(q)
This may result in an error if the result set is too large:
Error in perform_search_with_count(q) : Bad Request (HTTP 400)
The error is attributed to the attempt of perform_search_with_count(q)
to retrieve all matches, surpassing Elasticsearch's maximum limit.
perform_search_find_count(q)
To ascertain the size of your query's result set, you can use:
q = init_query_js_body()
ex = exists_filter("ANNOVAR_ensembl_Effect")
q = add_query_filter(q, ex)
variants = perform_search_find_count(q)
This will generate:
Debug: Query JSON:
{"query":{"bool":{"filter":[{"exists":{"field":"ANNOVAR_ensembl_Effect"}}]}},"size":40405505}
Response [http://annoq.org/api/annoq-test/_search]
Date:
Status: 400
Content-Type: application/json;charset=utf-8
Size: 1.48 kB
Here, the "size" parameter exceeds 40 million, explaining the error. Such a large result set is beyond the permissible range for a single Elasticsearch query.
perform_search_find_count(q)
to determine the total count.