Apollo 2.6.5 Upgrade Issue

cross12tamu commented 3 years ago

Howdy,

I'm in the midst of trying to uncover various issues with our Galaxy<-->Apollo stack and bridge.

Specifically, we were plagued recently by the findAllOrganisms call (see #2626), which was bricking some of the Galaxy<-->Apollo API tools, causing workflows (and sometimes the individual tool itself) in galaxy to timeout. It takes some of the tools > 5 minutes to load.

We are currently running the following Apollo version to have our system work:

Version: 2.6.2-SNAPSHOT
Grails version: 2.5.5
Groovy version: 2.4.4
JVM version: 1.8.0_265
Servlet Container Version: Apache Tomcat/9.0.16 (Ubuntu)
JBrowse config: 1.16.10-release
JBrowse url: https://github.com/GMOD/jbrowse

note: at least with the timeout increase I made in Apache, we can have the workflows work for now.

However, this issue persists (I have the timeout set to 7 minutes) when using the newest Apollo version, 2.6.5. Additionally, I'm seeing insane server load spikes on the machine that we have Apollo on with the 2.6.5 image.

So, our stack currently works with 2.6.2. However, we still have a different issue, as we have been trying to address the problems in #2607 which we think is fixed in the new version of Apollo, but we are unable to upgrade due to the issues we have listed (see below).

The high load, that I thought was okay when taking this screenshot:

compute1_apollo_load

but after about ~18 hours on the image, we ran out of memory:

8/19/2021 7:58:36 AM2021-08-19 12:58:36,154 [http-nio-8080-exec-48] ERROR errors.GrailsExceptionResolver  - OutOfMemoryError occurred when processing request: [POST] /apollo/organism/findAllOrganisms
8/19/2021 7:58:36 AMJava heap space. Stacktrace follows:
8/19/2021 7:58:36 AMorg.codehaus.groovy.grails.web.servlet.mvc.exceptions.ControllerExecutionException: Executing action [findAllOrganisms] of controller [org.bbop.apollo.OrganismController]  caused exception: Runtime error executing action
8/19/2021 7:58:36 AM    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:198)
8/19/2021 7:58:36 AM    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
8/19/2021 7:58:36 AM    at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
8/19/2021 7:58:36 AM    at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
8/19/2021 7:58:36 AM    at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
8/19/2021 7:58:36 AM    at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
8/19/2021 7:58:36 AM    at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
8/19/2021 7:58:36 AM    at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
8/19/2021 7:58:36 AM    at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
8/19/2021 7:58:36 AM    at com.brandseye.cors.CorsFilter.doFilter(CorsFilter.java:82)
8/19/2021 7:58:36 AM    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
8/19/2021 7:58:36 AM    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
8/19/2021 7:58:36 AM    at java.lang.Thread.run(Thread.java:748)
8/19/2021 7:58:36 AMCaused by: org.codehaus.groovy.grails.web.servlet.mvc.exceptions.ControllerExecutionException: Runtime error executing action
8/19/2021 7:58:36 AM    ... 13 more
8/19/2021 7:58:36 AMCaused by: java.lang.reflect.InvocationTargetException
8/19/2021 7:58:36 AM    ... 13 more
8/19/2021 7:58:36 AMCaused by: java.lang.OutOfMemoryError: Java heap space

Our API remapper error from the findAllOrganisms call:

021/08/17 15:10:44 [error] 17#17: *68845 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.42.0.1, server: , request: "POST /apollo_api/organism/findAllOrganisms HTTP/1.1", upstream: "http://192.168.0.4:8999/apollo/organism/findAllOrganisms", host: "192.168.0.1:9990"

Some various galaxy logs/errors while trying to run the workflow, although some are individual tools too iirc:

galaxy.tools.parameters.basic DEBUG 2021-08-17 09:39:25,504 [p:3081815,w:1,m:0] [uWSGIWorker1Core3] Error determining dy
namic options for parameter 'org_select' in tool 'export':
Traceback (most recent call last):
  File "lib/galaxy/tools/parameters/basic.py", line 820, in get_options
    return eval(self.dynamic_options, self.tool.code_namespace, call_other_values)
  File "<string>", line 1, in <module>
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 675, in galaxy_list_orgs
    data = _galaxy_list_orgs(wa, gx_user, *args, **kwargs)
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 689, in _galaxy_list_orgs
    all_orgs = wa.organisms.findAllOrganisms()
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 581, in findAllOrganisms
    orgs = self.request('findAllOrganisms', {})
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 545, in request
    (r.status_code, r.text))
Exception: Unexpected response from apollo 504: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

galaxy.tools.parameters.basic DEBUG 2021-08-17 09:40:54,238 [p:3081818,w:2,m:0] [uWSGIWorker2Core0] Error determining dy
namic options for parameter 'org_select' in tool 'fetch_jbrowse':
Traceback (most recent call last):
  File "lib/galaxy/tools/parameters/basic.py", line 820, in get_options
    return eval(self.dynamic_options, self.tool.code_namespace, call_other_values)
  File "<string>", line 1, in <module>
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 675, in galaxy_list_orgs
    data = _galaxy_list_orgs(wa, gx_user, *args, **kwargs)
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 689, in _galaxy_list_orgs
    all_orgs = wa.organisms.findAllOrganisms()
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 581, in findAllOrganisms
    orgs = self.request('findAllOrganisms', {})
  File "/galaxy/tools/cpt2/galaxy-tools/tools/webapollo/gga-apollo/webapollo.py", line 545, in request
    (r.status_code, r.text))
Exception: Unexpected response from apollo 504: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.19.5</center>
</body>
</html>

Let me know what y'all think I should do to get this Apollo updated and working. Let me know also if you need anything else from me.

garrettjstevens commented 3 years ago

I'm not sure what would have happened in the latest releases to make #2626 worse. Have you tried any intermediate versions between 2.6.2 and 2.6.5?

cross12tamu commented 3 years ago

I'll try and 2.6.4 and get back to y'all. The best case is I'll give it a whirl this weekend.

GMOD / Apollo

Apollo 2.6.5 Upgrade Issue #2630