biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

400 Client Error - POST method #91

Closed · brendanwee closed this issue 3 years ago

brendanwee commented 3 years ago

Hello,

We have an RNA-seq analysis pipeline hosted on AWS where we pump hundreds of RNA-seq samples through an alignment + gene counting pipeline. At the end of this pipeline we use MyGene to generate gene symbols for all the genes that have counts. This worked well during testing, but once deployed into production I started multiple runs, each containing hundreds of samples being run at the same time. This likely totaled millions of gene_ids being queried through mygene, resulting in the following error:

```
gene_symbol_queries = mg.querymany(stats_df["Geneid"], "ensembl.gene", fields="symbol", returnall=False, as_dataframe=True)

  File "/usr/local/lib/python3.6/site-packages/biothings_client/base.py", line 542, in _querymany
    for hits in self._repeated_query(query_fn, qterms, verbose=verbose):
  File "/usr/local/lib/python3.6/site-packages/biothings_client/base.py", line 223, in _repeated_query
    from_cache, query_result = query_fn(batch, **fn_kwargs)
  File "/usr/local/lib/python3.6/site-packages/biothings_client/base.py", line 541, in query_fn
    def query_fn(qterms): return self._querymany_inner(qterms, verbose=verbose, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/biothings_client/base.py", line 488, in _querymany_inner
    return self._post(_url, params=_kwargs, verbose=verbose)
  File "/usr/local/lib/python3.6/site-packages/biothings_client/base.py", line 176, in _post
    res.raise_for_status()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: search_phase_execution_exception for url: http://mygene.info/v3/query/
```

This error is raised for about 30% of all our jobs (1,108 succeeded, 605 failed). Is the high traffic causing this error? Do you have any advice on a way to get around this issue?
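For reference, the lookup step in the traceback above boils down to the call below. This is a minimal, self-contained sketch; the toy `stats_df` is a stand-in for the pipeline's real counts table:

```python
import mygene
import pandas as pd

# Stand-in for the pipeline's counts table with a "Geneid" column.
stats_df = pd.DataFrame({"Geneid": ["ENSG00000141510", "ENSG00000012048"]})

mg = mygene.MyGeneInfo()
# querymany batches the IDs into POST requests against
# http://mygene.info/v3/query/ (1,000 terms per request).
gene_symbol_queries = mg.querymany(
    stats_df["Geneid"],
    scopes="ensembl.gene",
    fields="symbol",
    returnall=False,
    as_dataframe=True,
)
print(gene_symbol_queries)
```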

newgene commented 3 years ago

@brendanwee thanks for your report. We did see increased errors in recent days and are currently investigating the cause.

namespacestd0 commented 3 years ago

@brendanwee may I ask what's the approximate number of gene_ids being queried per second?

namespacestd0 commented 3 years ago

@brendanwee meanwhile, we have made slight changes to the server cluster to better buffer request bursts; please consider giving it another try. I would recommend gradually increasing the number of parallel jobs to avoid any throttling effect.
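To illustrate the gradual ramp-up suggested above, here is one hypothetical way to stagger job launches; the wave sizes, pause, and the callable `jobs` are placeholders, not part of the mygene client:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_in_waves(jobs, start=10, step=10, max_workers=200, pause=60):
    """Launch jobs in progressively larger waves so the server cluster
    has time to scale up between bursts."""
    workers = start
    remaining = list(jobs)
    while remaining:
        batch, remaining = remaining[:workers], remaining[workers:]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda job: job(), batch))
        workers = min(workers + step, max_workers)
        time.sleep(pause)  # breathing room between waves
```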

brendanwee commented 3 years ago

@namespacestd0 I was using the Python client's MyGeneInfo.querymany() method in each job. There were probably around 100-200 jobs running at once. It seems that querymany batches queries into 1,000 at a time. Each job queries about 52,000 gene IDs and finishes querying in about 2 minutes, so roughly 430 IDs/s * 200 jobs ~= 86,000 ID lookups per second.

Sounds good, I will try running the jobs again and see how it turns out.
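A quick sanity check of the arithmetic above; note the 86,000/s figure counts gene IDs, not HTTP requests, so with 1,000-term batches the aggregate load is on the order of 87 POST requests per second:

```python
ids_per_job = 52_000
job_duration_s = 120      # "finishes querying in about 2 minutes"
concurrent_jobs = 200
batch_size = 1_000        # querymany's batch size, per the comment above

ids_per_s = ids_per_job / job_duration_s * concurrent_jobs   # ~86,700 IDs/s
posts_per_s = ids_per_s / batch_size                         # ~87 POSTs/s
print(f"~{ids_per_s:,.0f} IDs/s, ~{posts_per_s:.0f} POST requests/s")
```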

brendanwee commented 3 years ago

@namespacestd0 The exact same error occurred, and this time more frequently: ~45% of our jobs failed with this error. Did you apply some kind of fix? Can you confirm whether this is throttling implemented in your code or something else?

namespacestd0 commented 3 years ago

Thanks for the update. Yes, that level of sustained traffic is well beyond our server capacity, and beyond how quickly our scaling architecture can add capacity. I also saw a throttling effect on our server side earlier, so I assume you could not complete all jobs. We are definitely open to implementing a task queue system in the future, but for now I recommend slowing down the request rate.
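One way to act on this advice on the client side is to retry failed batches with exponential backoff, which both spreads the load and rides out transient errors. A sketch, with arbitrary retry counts and delays; it wraps, rather than replaces, the client's own batching:

```python
import time

import mygene
from requests.exceptions import HTTPError

mg = mygene.MyGeneInfo()

def querymany_with_backoff(ids, retries=5, base_delay=2.0, **kwargs):
    """Retry a querymany call with exponential backoff on HTTP errors."""
    for attempt in range(retries):
        try:
            return mg.querymany(ids, scopes="ensembl.gene",
                                fields="symbol", **kwargs)
        except HTTPError as exc:
            if attempt == retries - 1:
                raise
            delay = base_delay * 2 ** attempt
            print(f"querymany failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```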

brendanwee commented 3 years ago

Ok, thank you for your quick replies. I appreciate you looking into this

namespacestd0 commented 3 years ago

This error is now correctly identified as a 5xx error:
https://github.com/biothings/biothings.api/blob/32fad3510023d80700e552c80838c9ac775c3b00/biothings/web/pipeline/execute.py#L69
Additional capacity optimizations will follow.
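With the failure now surfaced as a 5xx, standard retry machinery can key on the status code. A sketch against the raw HTTP API using requests with urllib3's Retry (assuming urllib3 >= 1.26 for the allowed_methods parameter name; this is not built into the mygene client):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry POSTs on 5xx responses with exponential backoff.
retry = Retry(total=5, backoff_factor=1.0,
              status_forcelist=[500, 502, 503, 504],
              allowed_methods=["POST"])
session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retry))
session.mount("https://", HTTPAdapter(max_retries=retry))

resp = session.post("http://mygene.info/v3/query",
                    data={"q": "ENSG00000141510,ENSG00000012048",
                          "scopes": "ensembl.gene",
                          "fields": "symbol"})
resp.raise_for_status()
print(resp.json())
```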