Closed hexylena closed 7 years ago
So, if you are using tomcat7, you have to use the absolute most recent version, and the default Ubuntu 12 / 14 tomcat7 is typically not recent enough to support websockets. Sorry, should have thought of that sooner. You can drop a jar in, or use the tomcat8 container, which should work.
That being said, if you don't have it installed it will fall back to regular long-polling. The web-services will be fine with or without websockets. You can tell because in the browser you'll see something like this:
Also, have you looked at https://dockstore.org/? .. not sure what the scope of your project is.
If you’re getting 500 errors, it must be failing on the server-side. The first 400 sometimes happens as the interface is refreshing itself.
Look in the catalina.out logs and see if anything is showing up.
Nathan
On Dec 1, 2016, at 5:09 PM, Eric Rasche notifications@github.com wrote:
We're using tomcat 8. The problem doesn't seem to be fallback; it seems to be that websockets fail, and then all of the fallback methods trigger a 500 when SockJS thinks the hostname is invalid.
https://cloud.githubusercontent.com/assets/458683/20819308/6b280744-b82b-11e6-9897-1a1b76bebb32.png
Dockstore is something quite different, they're just for packaging up bio-software in containers (there are dozens of those types of projects: "bioboxes", "biodckr", mulled https://github.com/mulled/auto-mulled/). Rancher is for orchestration. Think kubernetes / docker swarm + web ui.
https://cloud.githubusercontent.com/assets/458683/20819399/eff4f1a8-b82b-11e6-95de-50d5a8fa7286.png
https://cloud.githubusercontent.com/assets/458683/20819405/002753d6-b82c-11e6-9478-a3fefd01efc6.png
Catalina logs. I'm not seeing the 400 bad request that's specific to the websockets, but I do see the 500s.
12/2/2016 1:12:52 AM hostname can't be null. Stacktrace follows:
12/2/2016 1:12:52 AM org.springframework.web.socket.sockjs.SockJsException: Uncaught failure in SockJS request, uri=http://localhost:9999/apollo-dev/stomp/294/_pjn08x7/xhr_streaming; nested exception is org.springframework.web.socket.sockjs.SockJsException: Uncaught failure for request http://localhost:9999/apollo-dev/stomp/294/_pjn08x7/xhr_streaming; nested exception is java.lang.IllegalArgumentException: hostname can't be null
12/2/2016 1:12:52 AM at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
12/2/2016 1:12:52 AM at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
12/2/2016 1:12:52 AM at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
12/2/2016 1:12:52 AM at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
12/2/2016 1:12:52 AM at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
12/2/2016 1:12:52 AM at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
12/2/2016 1:12:52 AM at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
12/2/2016 1:12:52 AM at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
12/2/2016 1:12:52 AM at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
12/2/2016 1:12:52 AM at java.lang.Thread.run(Thread.java:745)
12/2/2016 1:12:52 AM Caused by: org.springframework.web.socket.sockjs.SockJsException: Uncaught failure for request http://localhost:9999/apollo-dev/stomp/294/_pjn08x7/xhr_streaming; nested exception is java.lang.IllegalArgumentException: hostname can't be null
12/2/2016 1:12:52 AM ... 10 more
12/2/2016 1:12:52 AM Caused by: java.lang.IllegalArgumentException: hostname can't be null
12/2/2016 1:12:52 AM at java.net.InetSocketAddress.checkHost(InetSocketAddress.java:149)
12/2/2016 1:12:52 AM at java.net.InetSocketAddress.<init>(InetSocketAddress.java:216)
12/2/2016 1:12:52 AM ... 10 more
12/2/2016 1:12:52 AM 2016-12-02 01:12:52,521 [http-apr-8080-exec-1] ERROR errors.GrailsExceptionResolver - IllegalArgumentException occurred when processing request: [POST] /apollo-dev/stomp/294/4dc58men/xhr
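When reading logs like the ones above, it can help to know which SockJS transport each failing request belongs to. SockJS session URLs follow the shape `<prefix>/<server_id>/<session_id>/<transport>`, so a rough sketch for pulling the transport out of a request path might look like this. The function and class names are made up for illustration and are not part of Apollo or Spring.

```python
import re
from typing import NamedTuple, Optional

# SockJS session URLs have the shape <prefix>/<server_id>/<session_id>/<transport>.
# This pattern is an illustrative approximation, not a full protocol validator.
SOCKJS_PATH = re.compile(
    r"^(?P<prefix>.*)/(?P<server>[^/.]+)/(?P<session>[^/.]+)/(?P<transport>[a-z_]+)$"
)


class SockJsPath(NamedTuple):
    prefix: str
    server_id: str
    session_id: str
    transport: str


def parse_sockjs_path(path: str) -> Optional[SockJsPath]:
    """Split a SockJS session path into its parts, or return None if it does not fit."""
    m = SOCKJS_PATH.match(path)
    if m is None:
        return None
    return SockJsPath(m["prefix"], m["server"], m["session"], m["transport"])


if __name__ == "__main__":
    # Both paths are taken from the catalina.out excerpt above.
    for p in (
        "/apollo-dev/stomp/294/_pjn08x7/xhr_streaming",
        "/apollo-dev/stomp/294/4dc58men/xhr",
    ):
        print(parse_sockjs_path(p))
```

In the excerpt above, both failing requests are HTTP fallback transports (`xhr_streaming` and `xhr`), which fits the observation that the 500s come from the fallback rather than from the websocket upgrade itself.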
https://github.com/brianfrankcooper/YCSB/issues/105
I don't think it's an Apollo issue per se.
Yeah, both of those came up in my searches, but neither yielded useful solutions.
As mentioned in first comment, I can make the request from localhost (i.e. apollo container), but as soon as I get one container away, I cannot and I'm not sure why.
This likely isn't an apollo issue; it's more likely a configuration problem somewhere in the stack, e.g. a specific hostname needs to be provided somewhere, or some magic flag needs to be set on one of the proxies. I opened this issue to track debugging, and so that if/when I fix this, other people can make use of whatever documentation comes out of it.
Hmm, from localhost, it works if it goes to 127.0.0.1. But if the request is made to a different IP address associated with the same machine, it fails. (apollo.apollo resolves to same server as localhost)
root@cea19e6e30ee:/opt# curl 'http://apollo.apollo:8080/apollo-dev/stomp/084/yf3adjwl/xhr' -X POST -H 'Accept: */*' -H ';Cookie: JSESSIONID=2FAF>
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2510 0 2510 0 0 237k 0 --:--:-- --:--:-- --:--:-- 245k
HTTP/1.1 500 Internal Server Error
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=UTF-8
Content-Language: en
Transfer-Encoding: chunked
Date: Fri, 02 Dec 2016 17:20:08 GMT
Connection: close
<!DOCTYPE html>
<!--[if lt IE 7 ]> <html lang="en" class="no-js ie6"> <![endif]-->
root@cea19e6e30ee:/opt# curl 'http://localhost:8080/apollo-dev/stomp/084/yf3adjwl/xhr' -X POST -H 'Accept: */*' -H ';Cookie: JSESSIONID=2FAF1C37>
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2 100 2 0 0 385 0 --:--:-- --:--:-- --:--:-- 400
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: no-store, no-cache, must-revalidate, max-age=0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Type: application/javascript;charset=UTF-8
Content-Length: 2
Date: Fri, 02 Dec 2016 17:20:15 GMT
o
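The two curl runs above differ only in the hostname used to reach the same server. A small script can repeat that comparison across several hostnames in one go. This is a sketch only: the function names are hypothetical, the session cookie must be replaced with a real `JSESSIONID`, and the hosts and path in the usage note are simply the ones that appear in this thread.

```python
import urllib.error
import urllib.request


def build_xhr_request(url, session_id):
    """Build the same kind of POST the curl commands send to the SockJS xhr endpoint."""
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Accept", "*/*")
    req.add_header("Cookie", "JSESSIONID=" + session_id)
    return req


def probe(url, session_id, timeout=5.0):
    """Return the HTTP status code for one POST, including error codes such as 500."""
    req = build_xhr_request(url, session_id)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is what we want to compare.
        return err.code


def compare_hosts(hosts, port, path, session_id):
    """Probe the same path through each hostname and collect status or failure text."""
    results = {}
    for host in hosts:
        url = "http://%s:%d%s" % (host, port, path)
        try:
            results[host] = probe(url, session_id)
        except OSError as exc:  # DNS failure, connection refused, timeout
            results[host] = "connection failed: %s" % exc
    return results
```

For example, `compare_hosts(["localhost", "127.0.0.1", "apollo.apollo"], 8080, "/apollo-dev/stomp/084/yf3adjwl/xhr", "<your JSESSIONID>")` would be expected to reproduce the 200 versus 500 split seen above, if the behaviour depends only on the hostname used.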
Adding
RewriteRule ^/apollo-dev/stomp/(.*)/websocket$ wss://localhost:9999/apollo-dev/stomp/$1/websocket [P,L]
in my apache conf helps. I'm really struggling to understand why it's wss when nothing outside of our proxy server has access to SSL certs. Surely anything sent over that protocol would fail because there's no access to certs? Ah well, it still fails:
but it's lies
and the XHR fallback still fails.
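One thing worth noting about the RewriteRule above is that its pattern only matches the websocket transport path, so the XHR fallback requests never pass through it. The sketch below checks that with Python's `re` module, which handles this simple pattern the same way Apache's regex engine would. The websocket path in the list is a made-up example built from IDs seen in the logs; the two XHR paths are taken from this thread.

```python
import re

# Pattern copied from the RewriteRule quoted above.
WEBSOCKET_RULE = re.compile(r"^/apollo-dev/stomp/(.*)/websocket$")


def rule_applies(path):
    """Return True if the RewriteRule pattern would match this request path."""
    return WEBSOCKET_RULE.match(path) is not None


paths = [
    "/apollo-dev/stomp/294/_pjn08x7/websocket",      # hypothetical websocket request
    "/apollo-dev/stomp/294/_pjn08x7/xhr_streaming",  # fallback path from the logs
    "/apollo-dev/stomp/084/yf3adjwl/xhr",            # fallback path from the curl test
]

for p in paths:
    print(p, "->", "matched by rule" if rule_applies(p) else "not matched")
```

So whatever effect the rule has on the websocket upgrade, the 500s on the `xhr` and `xhr_streaming` requests would have to be fixed elsewhere.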
Also http://stackoverflow.com/questions/28385798/host-name-cant-be-null-using-grails-spring-websocket-plugin seems exactly right but that answer is beyond useless. I've tried setting the hostname to all manner of things and nothing good comes of it.
Update: Spent a long time wiring up JMX and mucking with that. No joy there either.
I just saw this issue, do you still have proxy/websocket problems? I have a working setup that looks like this:
internet ---> apache 2.4 proxy ---> nginx proxy ---> another nginx proxy --> apollo
I can share my config if it can help
@abretaud haven't tested with the latest image which is now based on tomcat:8, so that might be an improvement. I'll do a test deployment here soon.
For us the chain is: internet → apache → haproxy → (docker networking) → apollo
And intuition says it might be haproxy but I don't have any way to test websockets easily. (I really wish people would post demo client / servers for testing these stupid new protocols).
Thanks for the offer, I'll ping you / this issue if I'm still experiencing it. I really, really want apollo deployed on rancher so it isn't separately managed since that's painful for me.
Ok, no problem (though I've never used haproxy). +100 for the painful websocket testing!
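On the lack of an easy websocket test client: the opening handshake is plain HTTP, so a rough check that a proxy chain passes the upgrade through can be done with the standard library alone. The sketch below sends the RFC 6455 upgrade request and checks for a `101` response with the correct `Sec-WebSocket-Accept` value. It does not speak STOMP or SockJS framing, the function names are invented for this example, and some servers may also want an `Origin` header, which is not sent here.

```python
import base64
import hashlib
import os
import socket

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value from RFC 6455


def expected_accept(key):
    """Compute the Sec-WebSocket-Accept value a compliant server must return."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake(host_header, path, key):
    """Build the raw HTTP upgrade request bytes."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host_header,
        "Upgrade: websocket",
        "Connection: Upgrade",
        "Sec-WebSocket-Key: %s" % key,
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def check_upgrade(host, port, path, timeout=5.0):
    """Attempt the handshake; return (upgraded, status_line)."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_handshake("%s:%d" % (host, port), path, key))
        response = b""
        while b"\r\n\r\n" not in response:
            chunk = sock.recv(4096)
            if not chunk:
                break
            response += chunk
    head = response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    upgraded = (
        status_line.split(" ")[1:2] == ["101"]
        and headers.get("sec-websocket-accept") == expected_accept(key)
    )
    return upgraded, status_line
```

Calling `check_upgrade(host, port, path)` against each hop in turn (apollo directly, then haproxy, then apache), with a path that matches the `/apollo-dev/stomp/.../websocket` form, should show which hop stops returning `101 Switching Protocols`. This only covers plain `ws://`; a TLS-terminated hop would need the socket wrapped with `ssl` first.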
@abretaud / @erasche I'm sure you've seen this, but wanted to repost. The default Ubuntu 14 version of tomcat 7 would not have worked (the more recent stable versions do). The long-polling fallback will work fine, though it's not ideal (I don't think your users would notice unless you have weird firewall rules).
Configuring with haproxy for websockets looks like it may have been tricky. Have you tried removing haproxy from the equation?
I guess you saw my comment above about how to confirm that they are working. You just have to watch the "frames" tab.
@nathandunn. We don't use ubuntu14. We (used to) run tomcat:7 (which defaults to jre7), we don't use the ubuntu images as they make for gigantic docker images.
> Configuring with haproxy for websockets looks like it may have been tricky. Have you tried removing haproxy from the equation?
Not possible. It's heavily tied to rancher. However haproxy can trivially proxy even mysql connections, so I'm somewhat dubious that it's really to blame and not SockJS + java.
> The long-polling fallback will work fine, though it's not ideal (I don't think your users would notice unless you have weird firewall rules).
As you can see from the screenshot in https://github.com/GMOD/Apollo/issues/1358#issuecomment-264345007, that's not quite the case. We were having internal server errors during the fallback. Hence me thinking it was sockjs/java. Maybe the proxy was stripping a header / adding a header that caused issues.
Hopefully the update to tomcat:8-jre8 fixes this, I should know later today.
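To test the theory that a proxy strips or adds a header, one option is to temporarily put a tiny echo server where apollo normally sits and look at what actually arrives after the request has crossed apache and haproxy. The following is a minimal sketch using only the Python standard library; `EchoHeaders` and `make_server` are names invented for this example, and nothing here is specific to Apollo.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHeaders(BaseHTTPRequestHandler):
    """Reply to any GET or POST with the request line and headers as JSON."""

    def _echo(self):
        # Drain the request body, if any, so the connection stays in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)
        body = json.dumps(
            {
                "method": self.command,
                "path": self.path,
                # Repeated header names collapse to a single value here.
                "headers": dict(self.headers.items()),
            },
            indent=2,
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = _echo
    do_POST = _echo

    def log_message(self, fmt, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port=0):
    """Create (but do not start) an echo server; port 0 picks a free port."""
    return ThreadingHTTPServer(("0.0.0.0", port), EchoHeaders)
```

Running `make_server(8080).serve_forever()` in the container and repeating the failing curl request through the proxy chain would show the `Host`, `X-Forwarded-*`, and `Upgrade` headers exactly as the backend receives them, which can then be compared with a direct request.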
After the upgrade (still testing my image, has a bunch of other local changes), this is solved! :joy:
Websockets still aren't proxying right, but hey, who cares, fallback works, I can move apollo off my frontend machine and on to compute infrastructure, and I can have my remote user stuff working now.
Thanks for debugging input everyone.
Weird. Did you explicitly set up an apache or nginx proxy (we have some docs on this if not) or are you going straight through tomcat? Or do you think it's still the haproxy?
Anyway, glad it worked one way or another.
It wasn't any explicit changes, just the change to 1) more recent version of apollo (we were on 2.0.3? 4?) + 2) tomcat8/jre8
We hit apache and haproxy: internet → apache → haproxy → (docker SDN) → apollo.
We actually have nginx in the route as well, but that's just for a special case of API access. There the route goes internet → apache → haproxy → (docker SDN) → nginx to rewrite paths → apollo, because debugging direct network connections wasn't fun enough ;)
The apache proxy hasn't changed at all, still using pretty standard proxying rules (proxypass and wstunnel, quite similar to what your docs describe). That works perfectly with the direct connection, so I'll make some efforts to figure out why that doesn't work over my proxy setup some other time. Just happy to have made progress.
I'm hitting this when trying to deploy to rancher. Opening this issue mostly to track my debugging.
My setup looks like:
Currently not sure if websockets are functional at all, not sure how I'd test that. It fails before that. Wonder if I can "get away" with x-forwarded-for type headers and that will fix this?
curl 'http://localhost:8080/apollo-dev/stomp/084/yf3adjwl/xhr' -X POST -H 'Accept: */*' -H 'Cookie: JSESSIONID=......;' -i | head
curl 'http://10.42.182.231:8080/apollo-dev/stomp/084/yf3adjwl/xhr' -X POST -H 'Accept: */*' -H 'Cookie: JSESSIONID=....;' -i