ReconfigureIO / platform

The backend of app.reconfigure.io

On-prem batch jobs never become started #305

Open CampGareth opened 5 years ago

CampGareth commented 5 years ago

Hook reco up to an on-prem platform and run a simulation. The simulation never leaves QUEUED and never becomes STARTED, so you end up waiting forever, because the simulation cannot report to the platform that it has started.

I believe this is caused by the hardcoded use of HTTPS for event reporting. That fails under local testing: we don't have certificates set up, so the platform serves HTTP only. The relevant lines are here: https://github.com/ReconfigureIO/platform/blob/35ffe7fbb60548fd28bcb2c5613c61ba7fc1ba64/handlers/api/build.go#L200-L201

I shall attempt to verify this momentarily with curl.
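For illustration, here is a minimal sketch of building the event-reporting URL with the scheme passed in rather than hard-coded to "https". The helper name `eventsURL` and its parameters are hypothetical and do not reproduce the code at build.go#L200-L201; only the path shape (`/simulations/<id>/events?token=...`) is taken from the requests in this issue.

```go
package main

import (
	"fmt"
	"net/url"
)

// eventsURL builds the URL a simulation POSTs its status events to.
// Hypothetical helper, not the platform's code: the scheme is a parameter
// instead of a hard-coded "https", so an on-prem install without
// certificates can pass "http".
func eventsURL(scheme, host, simulationID, token string) string {
	u := url.URL{
		Scheme:   scheme,
		Host:     host,
		Path:     "/simulations/" + simulationID + "/events",
		RawQuery: url.Values{"token": {token}}.Encode(),
	}
	return u.String()
}

func main() {
	// Deployment behind something that terminates TLS.
	fmt.Println(eventsURL("https", "local.reconfigure.io", "example-id", "example-token"))
	// Local testing: plain HTTP only.
	fmt.Println(eventsURL("http", "local.reconfigure.io:80", "example-id", "example-token"))
}
```

Where the scheme would come from (a config value, flag, or environment variable) is left open here; the point is only that the callback URL must match what the server actually speaks.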

CampGareth commented 5 years ago
Over HTTPS on the default port (443), the connection is refused:
curl -XPOST -H "Content-Type: application/json" -d '{"status": "STARTED", "message": "STARTED", "code": 0}' "https://local.reconfigure.io/simulations/6772856f-30df-441b-ba4b-ac2f037ba737/events?token=jAkpg3w3HDjtAC5mX8sOmm766tx0doyNuBI1dsv7szJm2l7uHTCezURMnvBSGzVG"
curl: (7) Failed to connect to local.reconfigure.io port 443: Connection refused

Over HTTPS on port 80, the TLS handshake fails because the server answers in plain HTTP:
curl -XPOST -H "Content-Type: application/json" -d '{"status": "STARTED", "message": "STARTED", "code": 0}' "https://local.reconfigure.io:80/simulations/6772856f-30df-441b-ba4b-ac2f037ba737/events?token=jAkpg3w3HDjtAC5mX8sOmm766tx0doyNuBI1dsv7szJm2l7uHTCezURMnvBSGzVG"
curl: (35) error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

Over plain HTTP on port 80, the event is accepted:
curl -XPOST -H "Content-Type: application/json" -d '{"status": "STARTED", "message": "STARTED", "code": 0}' "http://local.reconfigure.io:80/simulations/6772856f-30df-441b-ba4b-ac2f037ba737/events?token=jAkpg3w3HDjtAC5mX8sOmm766tx0doyNuBI1dsv7szJm2l7uHTCezURMnvBSGzVG"
{"value":{"timestamp":"2018-11-14T12:24:15.979095993Z","status":"STARTED","message":"STARTED","code":0}}

So that's the bug verified. There are a couple of angles here. The first is that our API doesn't speak HTTPS: it currently relies on an AWS load balancer in front of it to terminate SSL. We might want to fix this in future for on-prem, since the alternative is finding a stand-in that can terminate SSL. The second is that maybe we shouldn't hard-code HTTPS in our URLs at all. I attempted to fix that problem here, but the PR needs cleaning up: https://github.com/ReconfigureIO/platform/pull/270