epinio / ui

Web UI for Epinio
Apache License 2.0
4 stars 3 forks source link

UI Git builds fail #148

Closed abonillabeeche closed 2 years ago

abonillabeeche commented 2 years ago

Describe the bug App Deployment fails when done via the UI, but succeeds from the CLI. Perhaps the web UI is requesting a token that expires while the build is in progress.

To Reproduce

  1. Install Epinio 1.0
  2. Open the UI, select Create an App, use the Git option
  3. https://github.com/abonillabeeche/dizzylizard for the URL and Master branch
  4. Create the App, it will fail on build.

Expected behavior Build should be Deployed - as the build succeeds.

Logs When it succeeds 👍🏼

🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Paketo Procfile Buildpack 5.1.2 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack https://github.com/paketo-buildpacks/procfile 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Process types: 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack web: python -m SimpleHTTPServer $PORT 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack ===> EXPORTING 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding layer 'paketo-buildpacks/ca-certificates:helper' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding 1/1 app layer(s) 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding layer 'launcher' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding layer 'config' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding label 'io.buildpacks.lifecycle.metadata' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding label 'io.buildpacks.build.metadata' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding label 'io.buildpacks.project.metadata' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Saving registry.epinio.svc.cluster.local:5000/apps/workspace-dizzy:b725e78de1f74bf4... 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack *** Images (sha256:ffd4e53bbc5f8b578d4c41e67f9c61d06a936202270325de85d4e5c3482961c9): 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack registry.epinio.svc.cluster.local:5000/apps/workspace-dizzy:b725e78de1f74bf4 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding cache layer 'paketo-buildpacks/node-engine:node' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding cache layer 'paketo-buildpacks/npm-install:build-modules' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding cache layer 'paketo-buildpacks/npm-install:npm-cache' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding cache layer 'paketo-buildpacks/node-module-bom:cyclonedx-node-module' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Done

Deploying application ... .............................. 🕞 Creating application resources ✔️ App is online. Name: dizzy Namespace: workspace Builder Image: Routes: 1: https://dizzy.dev.5glinux.com

but when it fails, via the UI, the log ends at...

🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Adding cache layer 'paketo-buildpacks/node-module-bom:cyclonedx-node-module' 🕞 [stage-workspace-dizzy--1adf17e1992a52d9450aaaed90687ae538a2ddhb] buildpack Done

To follow the server logs you can use:

kubectl logs -n epinio -l app.kubernetes.io/name=epinio-server -c epinio-server -f

2022/07/05 19:23:50 EpinioServer: "msg"="Error" "error"="token expired" 2022/07/05 19:24:19 EpinioServer: "msg"="ERROR" "error"="token expired" "requestId"="aa3695b5-0307-46ad-8c1d-f43a7901e3b6" "origin"="/wapi/v1/namespaces/workspace/staging/4768245a36aa69c4/logs?follow=true&authtoken=eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJlcGluaW8tc2VydmVyIiwiZXhwIjoxNjU3MDQ4NTQ5LCJ1c2VyIjoiYWRtaW4ifQ.VouuOmqfTFz2v0mugjoXB6sJyBRzjc0QgTu_KFl6Ml61hPdSzJh11hZM21pBxKb6ITcXJExNXAy7zHItuMw4lfE38EwRjikSNeI8hUDPMpw-Qe5tcdpEGGBsro_IPKe_1lH2Fy-XH6zMBbmhcprX3jK266Qw1-TY7LwtVe2gQhr2cUxMqbfIzmCnSYojav-VCRXSyhU4vwB-aV8vJT655n6BQKaKIW6jdE_uQ_I62h-0Z2i4MqMFSP1DE1IPtz8J1bOKyuzYbvfDao0hm_wd9Rx9apbxgZHodAcR-H805nR-m_Udo1V1ARKiLtrGsXE49rE6Eeq9722F-3toEidlLg&sockId=29" "error"="token expired" 2022/07/05 19:24:19 EpinioServer: "msg"="Error" "error"="token expired" 2022/07/05 19:24:48 EpinioServer: "msg"="ERROR" "error"="token expired" "requestId"="61e063ef-a9e9-463e-981a-2493c5bd3c9c" "origin"="/wapi/v1/namespaces/workspace/staging/4768245a36aa69c4/logs?follow=true&authtoken=eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJlcGluaW8tc2VydmVyIiwiZXhwIjoxNjU3MDQ4NTQ5LCJ1c2VyIjoiYWRtaW4ifQ.VouuOmqfTFz2v0mugjoXB6sJyBRzjc0QgTu_KFl6Ml61hPdSzJh11hZM21pBxKb6ITcXJExNXAy7zHItuMw4lfE38EwRjikSNeI8hUDPMpw-Qe5tcdpEGGBsro_IPKe_1lH2Fy-XH6zMBbmhcprX3jK266Qw1-TY7LwtVe2gQhr2cUxMqbfIzmCnSYojav-VCRXSyhU4vwB-aV8vJT655n6BQKaKIW6jdE_uQ_I62h-0Z2i4MqMFSP1DE1IPtz8J1bOKyuzYbvfDao0hm_wd9Rx9apbxgZHodAcR-H805nR-m_Udo1V1ARKiLtrGsXE49rE6Eeq9722F-3toEidlLg&sockId=30" "error"="token expired" 2022/07/05 19:24:48 EpinioServer: "msg"="Error" "error"="token expired"

Epinio 1.0.0 - the push fail is only visible via the UI

Create Application The Application will be created ready to deploy source to Success
Fetch Fetch the files at the git repository's branch Success
Build Build the source for the Application Fail

[object Object] Deploy | Deploy and start the Application | Pending

mmartin24 commented 2 years ago

I managed to have a successful deployment using UI (v1.0.0): image

However, I had some others that were not successful. I will try to find out more

mmartin24 commented 2 years ago

Ok, I tried several others application loads (both in Linux using k3d an Mac using Rancher desktop) and all of them were succcesful (at least other 5). I tried to reduce the network speed on my browser to 3G slow and still worked. I was going to ask Alejandro to retry on his side but I re-ran once again and I saw this error on the stage moment in the browser (using still a slow 3g connection):

Failed to load resource: the server responded with a status of 500 ()

Object
errors: Array(1)
0:
details: "stage-id = a1725646f7ab8043"
status: 500
title: "Failed to stage"
[[Prototype]]: Object
length: 1
[[Prototype]]: Array(0)
_headers: {cache-control: 'no-store', content-length: '93', content-type: 'text/plain; charset=utf-8', date: 'Wed, 06 Jul 2022 09:54:59 GMT', pragma: 'no-cache', …}
_req: XMLHttpRequest
onabort: ƒ ()
onerror: ƒ ()
onload: null
onloadend: null
onloadstart: null
onprogress: null
onreadystatechange: ƒ ()
ontimeout: ƒ ()
readyState: 4
response: "{\"errors\":[{\"status\":500,\"title\":\"Failed to stage\",\"details\":\"stage-id = a1725646f7ab8043\"}]}"
responseText: "{\"errors\":[{\"status\":500,\"title\":\"Failed to stage\",\"details\":\"stage-id = a1725646f7ab8043\"}]}"
responseType: ""
responseURL: "https://epinio.172.23.0.2.sslip.io/pp/v1/direct/r/ZThhPrryoUeUuimbk4Gb-G3-I9A/api/v1/namespaces/workspace/staging/a1725646f7ab8043/complete"
responseXML: null
status: 500
statusText: ""
timeout: 0
upload: XMLHttpRequestUpload {onloadstart: null, onprogress: null, onabort: null, onerror: null, onload: null, …}
withCredentials: true
[[Prototype]]: XMLHttpRequest
_status: 500
_statusText: ""
_url: "/pp/v1/direct/r/ZThhPrryoUeUuimbk4Gb-G3-I9A/api/v1/namespaces/workspace/staging/a1725646f7ab8043/complete"
[[Prototype]]: Object

image

Attaching also here the logs from the process: logs.zip

Also adding this log from buildpack within k9s: image

This is the log displayed: Stream closed EOF for epinio/stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8 (buildpack)

jimmykarily commented 2 years ago

This is the error in the logs you attached @mmartin24:

[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]     Candidate version sources (in priority order):
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]       package.json -> ">= 0.8.0"
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]       <unknown>    -> ""
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8] 
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]     Selected Node Engine version (using package.json): 18.4.0
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8] 
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]   Executing build process
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8]     Installing Node Engine 18.4.0
[stage-workspace-bug-16-3c71bfe7eda48a753242daf6876e6cd60e6hx6v8] read tcp 10.42.0.29:57578->108.157.109.124:443: read: connection reset by peer

I'm suspecting 108.157.109.124 is a cloudfront IP address for this URL: https://github.com/paketo-buildpacks/node-engine/blob/24058fa580b60e6dd324225a0de07d009795e425/buildpack.toml#L125

Nothing to do with Epinio, probably a network hickup or a nodejs server issue.

jimmykarily commented 2 years ago

The error is coming from here: https://github.com/epinio/epinio/blob/b96fd2ed8cd9f36b16e65a2057e7f0b83ebe9c34/internal/cli/server/server.go#L336 . I suspect that the UI page has stayed open for too long and is not using an expired token. @richard-cox is that possible? I though the UI refreshes the token when it expires, or does it not?

mmartin24 commented 2 years ago

The truth is that regardless of specific issues, Epinio seems to work with this app. I tried 3 times this morning in the UI a fresh environment and all of them worked: image

richard-cox commented 2 years ago

The error is coming from here:

https://github.com/epinio/epinio/blob/b96fd2ed8cd9f36b16e65a2057e7f0b83ebe9c34/internal/cli/server/server.go#L336 . I suspect that the UI page has stayed open for too long and is not using an expired token. @richard-cox is that possible? I though the UI refreshes the token when it expires, or does it not?

The UI will request a token to use in the web socket handshake immediately before opening the socket (the timing between the http request to get the token and the socket opening should be immediate). Think once that's open the token isn't used again. So it's interesting we're hitting an expired issue. Could this be a timezone issue?

The ui -- ui-backend session token may expire and result in the user being logged out, but i don't think that's in play here.

We've definitely hit issues before when the http request that blocks (i think on staging) times out (sometimes seen for big apps on first create). Think there's a github issue to switch to polling

mmartin24 commented 2 years ago

I continue trying yesterday to "stress" a lot epinio to make it fail (pushing big apps earlier, apps that would be faulty, reducing the bandwith, etc) and I was not able. I got more than 10 successes with this app through the UI: image @abonillabeeche could you please retry one more time to see if this issue is still occuring to you? Thanks

andreas-kupries commented 2 years ago

The error is coming from here: https://github.com/epinio/epinio/blob/b96fd2ed8cd9f36b16e65a2057e7f0b83ebe9c34/internal/cli/server/server.go#L336

. I suspect that the UI page has stayed open for too long and is not using an expired token. @richard-cox is that possible? I though the UI refreshes the token when it expires, or does it not?

The UI will request a token to use in the web socket handshake immediately before opening the socket (the timing between the http request to get the token and the socket opening should be immediate). Think once that's open the token isn't used again. So it's interesting we're hitting an expired issue. Could this be a timezone issue?

Looking into the token reported in

2022/07/05 19:24:19 EpinioServer: "msg"="ERROR" "error"="token expired" "requestId"="aa3695b5-0307-46ad-8c1d-f43a7901e3b6" "origin"="/wapi/v1/namespaces/workspace/staging/4768245a36aa69c4/logs?follow=true&authtoken=eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJlcGluaW8tc2VydmVyIiwiZXhwIjoxNjU3MDQ4NTQ5LCJ1c2VyIjoiYWRtaW4ifQ.VouuOmqfTFz2v0mugjoXB6sJyBRzjc0QgTu_KFl6Ml61hPdSzJh11hZM21pBxKb6ITcXJExNXAy7zHItuMw4lfE38EwRjikSNeI8hUDPMpw-Qe5tcdpEGGBsro_IPKe_1lH2Fy-XH6zMBbmhcprX3jK266Qw1-TY7LwtVe2gQhr2cUxMqbfIzmCnSYojav-VCRXSyhU4vwB-aV8vJT655n6BQKaKIW6jdE_uQ_I62h-0Z2i4MqMFSP1DE1IPtz8J1bOKyuzYbvfDao0hm_wd9Rx9apbxgZHodAcR-H805nR-m_Udo1V1ARKiLtrGsXE49rE6Eeq9722F-3toEidlLg&sockId=29" "error"="token expired"

i.e. base64 decoding first (eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9) and second (eyJpc3MiOiJlcGluaW8tc2VydmVyIiwiZXhwIjoxNjU3MDQ4NTQ5LCJ1c2VyIjoiYWRtaW4ifQ) parts I get

{"alg":"RS384","typ":"JWT"}
{"iss":"epinio-server","exp":1657048549,"user":"admin"}

respectively. The exp is likely the expiry. As it looks to be coded as an epoch value this should be resistent to timezone differences. Tcl clock format puts it at

work@tagetarl:~/SUSE/dev/epinio> tclsh
% clock format 1657048549     
Tue Jul 05 21:15:49 CEST 2022

The log entry reports 2022/07/05 19:24:19. Ignoring the hour difference as a TZ diff we still have a nearly 10 minute difference of use after expiry.

The ui -- ui-backend session token may expire and result in the user being logged out, but i don't think that's in play here.

We've definitely hit issues before when the http request that blocks (i think on staging) times out (sometimes seen for big apps on first create). Think there's a github issue to switch to polling

enrichman commented 2 years ago

@andreas-kupries it seems the other way around actually.

The token expired at Tue Jul 05 21:15:49 CEST 2022, that is 10 minutes before the log entry, that happened at 2022/07/05 19:24:19.

andreas-kupries commented 2 years ago

I can see how

10 minute difference of use after expiry.

might have been misread ... It meant it was 10 minutes after expiry when it was used. I.e. we agree.

richard-cox commented 2 years ago

As discussed in planning. This may be an issue where a stale token is being used by the UI when opening a socket connection or an issue where the token supplied by Epinio is stale/incorrectly created.

I'll validate that there's nothing amiss UI side and if applicable transfer back for the Epinio team to look in to

richard-cox commented 2 years ago

From the investigation above, the token used to stream the staging logs over socket has expired (token expires 19:15, error is later 19:24).

Before opening any socket the UI will request a token. So either the socket streaming the staging logs is still open but token expires (going from discussions in planning i don't think this is possible) or the UI opens a socket with a stale token.

I tried to reproduce the latter a number of ways, and found only one where it could happen.

1) In UI bring up the logs for an application 2) In kube hosting epinio kill the epinio api pod a) The UI socket connect is broken b) Kube hosting epinio brings back up epinio api pod c) The UI tries to reconnect using the same URL

At this point, even if the token hasn't expired, I get a token expired error in the api logs

  2022/08/09 10:42:19 EpinioServer: "msg"="ERROR" "error"="token expired" "requestId"="883dc615-e07b-4300-b749-856da52025c0" "origin"="/wapi/v1/namespaces/richard/applications/dizzylizard/logs?follow=true&auth token=eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJlcGluaW8tc2VydmVyIiwiZXhwIjoxNjYwMDQxNjcyLCJ1c2VyIjoiYWRtaW4ifQ.M90Q_XsYuebWwsoK0HOi6Ol1UmX4D1RqXCHvkI0AYaLIazCLukjoVP-olOxYLnXpaMZVVy2VQSThrselOcirZQOX9rKX4bP9sdZQJuSI-kk4lCxv725jV3_elBupuZQxyhv4I2tg5c_8WjYiAUeGlBZ-d4-d0RVS9QveRcu-7JLakEIl_L7nKHGTDHXfSywwqtwMo4L5zDlkFmxUYXcHWxJVPKN3A2C8Z3T2PK2-MmpVNyeNN92f-gHVW6RBZvKEAOk7tiZU92fP6R7yIiH1Votdy_kBFxVgSdIG5xpF30emN9f6V8s4M1n5fNy82qVIMzbzh70Xk5bwzJdArnN4rw&sockId=3" "error"="token expired"

Token Expires - 1660041672 --> Tue Aug 09 2022 10:41:12 GMT+0000 Log Line statement --> 2022/08/09 10:42:19

Going back to the original issue, it could be the case the epinio api pod crashed ... comes back up... the ui tries to reconnect and fails.

To fix this specific scenario the UI shouldn't try to reconnect with the original token... but fetch a new one first.

jimmykarily commented 2 years ago

@richard-cox with the PR merged, we can consider this fixed? Feel free to close this issue if that's the case, thanks.

richard-cox commented 2 years ago

The PR is merged but the issue is in To Test. Manuel is working through that column but is hampered by lack of validation information by me (sorry!).

@mmartin24 This is another that requires a build to properly test (rather than testing against a locally built dashboard), quite simple though

mmartin24 commented 2 years ago

Thanks @richard-cox for the detailed steps. I tested this this issue on a k3d environment on version v1.1.0-16-g080ad496 using the same app originally posted (for some reason @richard-cox I was a Bad Gateway error with yours; not sure if perhaps I am missing a env var )

Anyhow I could reproduce the error killing the epinio-server pod. 1

As can be seen in the screenshot the same log as originally posted appears:

2022/08/22 11:24:55 EpinioServer: "msg"="Error" "error"="token expired"

However once down, the pod auto-recovers and the app is still accessible later: 2

Since the app recovers from the token expired error, it seems to be ok. Closing issue

Edit note: Please observe it was possible to reproduce steadily the token expired error with the app deployed. The original error was that the missing token prevented the app from deploying. This scenario was not reproducible at that stage but the assumption is that the fix extends as well there since no issues have been observed during the tests.