CreatorDev / DeviceServer

Consul health check breaks webservice-deviceserver? #20

Open mkowalczyk88 opened 7 years ago

mkowalczyk88 commented 7 years ago

Maybe I missed something obvious in the configuration, but the Device Server doesn't work for me. I use Ubuntu 16.04 and configured everything as described in the Readme file of this project (running docker-compose). I then tried to run the tests from https://github.com/CreatorDev/creator-js-client and both failed.

Here's my investigation. First, the assumption: Fabio can only route to services that are "passing" Consul's health check. If Consul's health check fails, Fabio's route to the service is simply removed.

Accessing the webservice sometimes results in error 404 and sometimes in error 502. If the status of webservice-deviceserver in Consul is "not passing", Fabio reports "no route to host" and I get 404. When the route is found (webservice-deviceserver passes the health check), I get error 502 and Fabio reports the HTTP request error "EOF".

Consul checks webservice-deviceserver every 2 seconds by issuing a GET request to "/". Sniffing with Wireshark, I could see that some requests are accepted by the webservice, which then returns proper JSON with "links", but sometimes the connection itself is refused (the webservice responds with a TCP RST packet). In that case the "EOF" (which is not really EOF but simply a connection-refused error) appears in the logs of Consul and Fabio.

A similar problem shows up when issuing POST requests from the creator-js-client tests. If the route is present in Fabio, the connection to the webservice is closed right after it is established (a FIN packet is sent just after the SYN, SYN-ACK, ACK handshake). If the route is not present at all, I get error 404.

I did a little hack to verify the behavior: I modified the Registrar env variables in docker-compose.yml so that the health check is no longer a request to "/" but a dummy script that always succeeds. When the webservice is not spammed with health check requests, the creator-js-client tests suddenly pass. However, the first run of the tests still fails (a FIN packet is sent from the webservice just after connecting); all other requests are handled properly.

I have no idea how ASP.NET works internally, so please have a look/comment. My conclusion from these observations is that webservice-deviceserver sometimes refuses TCP connections, especially when it is spammed with Consul health checks.
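For anyone who wants to reproduce the refused/closed connections without Wireshark, here is a minimal sketch of a probe that opens a fresh TCP connection per request, sends the same kind of GET to "/" that the health check uses, and tallies what happens. The function name `probe`, the outcome labels, and the host/port in the usage note are my own placeholders, not anything taken from the Device Server setup.

```python
import socket
from collections import Counter


def probe(host, port, attempts=20, timeout=2.0, path="/"):
    """Send `attempts` sequential GETs, each on a new TCP connection, and tally outcomes."""
    request = (
        f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    outcomes = Counter()
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(request)
                first = sock.recv(4096)
            # An empty read means the peer closed the connection without replying
            # (the "FIN right after the handshake" case).
            outcomes["response" if first else "closed_without_reply"] += 1
        except ConnectionRefusedError:
            # The peer answered the SYN with RST.
            outcomes["refused"] += 1
        except ConnectionResetError:
            outcomes["reset"] += 1
        except socket.timeout:
            outcomes["timeout"] += 1
        except OSError:
            outcomes["other_error"] += 1
    return outcomes
```

Calling something like `probe("localhost", 8080, attempts=50)` (the port is hypothetical, use whatever the webservice container actually listens on) while the Consul check is enabled, and again with it disabled, should show whether the "refused" and "closed_without_reply" counts really follow the health check traffic.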

boyvinall commented 7 years ago

Hi @mkmk88, this sounds a little strange as we (obviously) don't get this behaviour. Couple of questions:

mkowalczyk88 commented 7 years ago

Hi @boyvinall. When I get 502 there is nothing special in the docker logs of webservice-deviceserver. Maybe I can enable some additional debug logs there? Regarding CPU and RAM utilization: I didn't check, but I run this on native Ubuntu on a PC with 16GB of RAM and an i7. Nothing else significant is running at the same time. Also, I'm pretty sure I did everything as described in https://github.com/CreatorDev/DeviceServer/blob/master/doc/devServerInstallation.md. Note that when I disable the health check I'm able to use the Device Server, so I guess all certificates etc. are set correctly. One thing I didn't mention, though I'm not sure it's relevant: I use self-signed certificates for nginx's SSL. Tomorrow I'll do everything again from scratch on another machine and let you know if I still see this problem.

mkowalczyk88 commented 7 years ago

Unfortunately, I observe the same behavior on the other machine (this time with Ubuntu 16.04 running as a VM). I did everything as in devServerInstallation.md; however, this time I also couldn't verify LWM2MServer.pem and LWM2MBootstrap.pem. On my native Ubuntu the "Verify the bootstrap and server certificates" step was OK. Anyway, I haven't run the LWM2M stuff yet. The webservice-deviceserver problem I'm describing here seems to be unrelated to those certificates.

boyvinall commented 7 years ago

Hi @mkmk88, I realised this morning that the 502 is probably coming from nginx. I was about to spend a little time going through the setup notes once again but I got hit with some other bits, sorry. I'll try to work through the notes in the next day or so and let you know. Otherwise, the only thing I can think of at the moment is maybe the hostname you're using to hit the API is not the same as nginx is configured for?

mkowalczyk88 commented 7 years ago

The hostname is the same. Note that when I disable the health check, the whole thing works.