pjotrp closed 6 years ago
The eventlet worker model is now running on production and appears to improve things.
When nginx does not get a response from upstream it returns a 502. In the current setup we should get fewer of those. I just did a 6-minute global search that did not bomb out: http://gn2.genenetwork.org/search?species=human&group=GTEx_v5&type=Cervix+mRNA&dataset=GTEXv5_CerEct_0915&search_terms_or=&search_terms_and=*&FormID=searchResult
Please try things like this yourself.
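For anyone reproducing this setup: a worker-model switch like the one described above can be expressed in a gunicorn config file, which is plain Python. This is only a sketch. The bind address, worker count and timeout values are assumptions for illustration, not the values used on production.

```python
# gunicorn.conf.py (sketch; all values below are illustrative assumptions)

# Address nginx would proxy to (hypothetical)
bind = "127.0.0.1:5003"

# Number of worker processes (hypothetical)
workers = 4

# Use the eventlet async worker instead of the default "sync" worker
worker_class = "eventlet"

# Seconds before gunicorn kills and restarts a silent worker.
# The gunicorn default is 30, which is far too short for a
# multi-minute search; 600 is an arbitrary example value.
timeout = 600
```

It would be started with something like `gunicorn -c gunicorn.conf.py <module>:<app>`, where the module and app names depend on the deployment. Note that nginx has its own proxy timeouts, which are configured separately.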
Checking using monster large exon array data set: and using probe set 4684870 (index 1 return). This is a great data set to stress test any correlation method.
[image: Inline image 1]
RUNNING TEST
7:51 AM: start, top 500 correlations.
7:54: all is still apparently good.
8:00: all good. Started independent GN2 queries and they are running fine.
8:04: process still running. No progress bar, and at this point 90% of users will have assumed we crashed a process.
8:08: coming up on 18 minutes. This is what the user sees.
[image: Inline image 2]
Hmm, can a process like this get a "title" in the browser window, like "GN Correlation in Progress: IDXXXX"?
nginx/1.4.1
I suspect in this case the calculation would have completed, but just too damn slow on GN2.
NOTE: Running same request on GN1 (EC2 instance): Results in 140.2 seconds. At least 10X faster than GN2.
[image: Inline image 3]
The reason is that the GN1 code is optimized to handle massive arrays of data (case-by-expression) using a text file dump of the "ProbeSetFreeze" rather than direct use of MySQL tables. The correlation calculation was also rewritten (as I recall) in C by "David Kroll" (if you want to grep for it).
I don't think GN2 knows about our text file dumps. We did "break" this system when we moved GN1 from Lilly to EC2 about 3 months ago, but Lei then fixed this pretty quickly by moving all of the files into EC2. The code can now find the files and runs 10X faster. Probably not hard to implement this in GN2. Just speeding up compute won't help because MySQL or any RDB will be too damn slow to fetch data.
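To make the text-file idea concrete, here is a rough sketch of correlating one probe set against all others straight from a flat dump, with no database in the loop. The file layout (tab-delimited, trait id in the first column, one value per sample after it) and all function names are assumptions for illustration; the actual GN1 dump format and its C implementation are not shown in this thread.

```python
import csv
import io
import math


def read_expression_dump(handle):
    """Read a tab-delimited dump: trait id first, then one value per sample.

    The layout is hypothetical, not the real GN1 dump format.
    """
    traits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        traits[row[0]] = [float(v) for v in row[1:]]
    return traits


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant trait: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def top_correlations(traits, target_id, limit=500):
    """Return up to `limit` (trait_id, r) pairs, strongest |r| first."""
    target = traits[target_id]
    scored = [
        (trait_id, pearson(target, values))
        for trait_id, values in traits.items()
        if trait_id != target_id
    ]
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scored[:limit]


# Tiny in-memory stand-in for a dump file
dump = io.StringIO("a\t1\t2\t3\nb\t2\t4\t6\nc\t3\t2\t1\nd\t1\t1\t2\n")
traits = read_expression_dump(dump)
top = top_correlations(traits, "a", limit=2)
```

The point is only that the whole matrix is read in one sequential pass. A production version would want a vectorized or compiled inner loop, and would need to handle missing values, which this sketch ignores.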
-- Rob
On Sun, Feb 11, 2018 at 08:30:53AM -0600, Rob Williams wrote:
Checking using monster large exon array data set: and using probe set 4684870 (index 1 return). This is a great data set to stress test any correlation method.
Absolutely. Interestingly the server has no load and the error log looks like:
ERROR:wqflask.views:.show_trait_page: 13:51:49 UTC 20180211: u'http://gn2.genenetwork.org/show_trait?trait_id=4684870&dataset=UMUTAffyExon_0209_RMA'
ERROR:wqflask.views:.index_page: 13:51:53 UTC 20180211: u'http://gn2.genenetwork.org/'
ERROR:wqflask.views:.corr_compute_page: 13:51:58 UTC 20180211: u'http://gn2.genenetwork.org/corr_compute'
INFO:utility.tools:Found: file /home/zas1024/genotype_files/genotype/BXD.geno
ERROR:wqflask.views:.submit_trait_form: 13:52:27 UTC 20180211: u'http://gn2.genenetwork.org/submit_trait'
ERROR:wqflask.views:.help: 13:52:55 UTC 20180211: u'http://gn2.genenetwork.org/help'
ERROR:wqflask.views:.index_page: 13:53:49 UTC 20180211: u'http://gn2.genenetwork.org/'
ERROR:wqflask.views:.index_page: 13:55:44 UTC 20180211: u'http://gn2.genenetwork.org/'
ERROR:wqflask.views:.index_page: 13:55:49 UTC 20180211: u'http://gn2.genenetwork.org/'
ERROR:wqflask.views:.index_page: 13:57:49 UTC 20180211: u'http://gn2.genenetwork.org/'
ERROR:wqflask.views:.submit_trait_form: 13:57:52 UTC 20180211: u'http://gn2.genenetwork.org/submit_trait'
ERROR:wqflask.views:.show_temp_trait_page: 13:58:18 UTC 20180211: u'http://gn2.genenetwork.org/show_temp_trait'
ERROR:wqflask.views:.handle_bad_request: 13:58:18 UTC 20180211: could not convert string to float: ZNF77
ERROR:wqflask.views:.handle_bad_request: 13:58:18 UTC 20180211: u'http://gn2.genenetwork.org/show_temp_trait'
ERROR:wqflask.views:.handle_bad_request: 13:58:18 UTC 20180211: Traceback (most recent call last):
  File "/usr/local/guix-profiles/gn2-2.11rc2/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/guix-profiles/gn2-2.11rc2/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/production/gene/wqflask/wqflask/views.py", line 416, in show_temp_trait_page
    template_vars = show_trait.ShowTrait(request.form)
  File "/home/production/gene/wqflask/wqflask/show_trait/show_trait.py", line 151, in __init__
    self.make_sample_lists()
  File "/home/production/gene/wqflask/wqflask/show_trait/show_trait.py", line 317, in make_sample_lists
    header="%s Only" % (self.dataset.group.name))
  File "/home/production/gene/wqflask/wqflask/show_trait/SampleList.py", line 44, in __init__
    sample = webqtlCaseData.webqtlCaseData(name=sample_name, value=float(self.this_trait[counter-1]))
ValueError: could not convert string to float: ZNF77
So it bombs out but never returns!!
Pj.
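The ValueError in the traceback above comes from calling float() on a value that is not numeric (here the string ZNF77). One possible guard is sketched below. The helper names are hypothetical and this is not the actual SampleList code, just an illustration of collecting bad values instead of raising on them.

```python
def parse_sample_value(raw):
    """Return raw as a float, or None when it is not numeric (e.g. a gene symbol)."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def build_sample_values(raw_values):
    """Split raw values into parsed floats and a list of rejected entries.

    Rejected entries keep their position so they can be logged or shown
    to the user instead of aborting the whole request.
    """
    values, rejected = [], []
    for position, raw in enumerate(raw_values):
        value = parse_sample_value(raw)
        if value is None:
            rejected.append((position, raw))
        else:
            values.append(value)
    return values, rejected


values, rejected = build_sample_values(["1.5", "ZNF77", "2"])
```

Whether a bad value should be skipped, reported, or should fail the request with a clear message is a design decision for the maintainers. The sketch only shows that the failure can be made explicit.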
Added issue #284. So both of the above have their own issue now. This one is for the recurring 502s.
GN2 gives 502 errors when timing out. This can be reproduced today with a long-running request: open http://gn2.genenetwork.org/show_trait?trait_id=1433387_at&dataset=HC_M2_0606_P
and run correlations with the default values. It renders a 502 error
after about a minute. I tried replacing the sync workers with gevent and eventlet, and it makes no difference. It appears the problem is that we are running external processes which gunicorn cannot track.
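One way to keep an external process from outliving the request is to give it an explicit timeout in the calling code, so the web layer can report a clear error instead of leaving nginx to answer with a 502. The following is a sketch under assumptions: run_external is a hypothetical helper, and it uses subprocess.run, which exists only in Python 3.5 and later, while the traceback above shows GN2 running on Python 2.7.

```python
import subprocess
import sys


def run_external(cmd, timeout_seconds):
    """Run cmd and return (status, stdout).

    status is 'ok', 'failed' (non-zero exit code) or 'timeout'.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so nothing is left running in the background.
        return "timeout", b""
    status = "ok" if completed.returncode == 0 else "failed"
    return status, completed.stdout


# Example: a child that sleeps longer than the allowed time
slow_status, _ = run_external(
    [sys.executable, "-c", "import time; time.sleep(30)"], 0.5
)
fast_status, fast_out = run_external(
    [sys.executable, "-c", "print('done')"], 30
)
```

A view could map the 'timeout' status to an explicit error page. For jobs that legitimately take many minutes, a background job with a progress page would likely fit better than a longer timeout.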