eclipse-archived / codewind

The official repository of the Eclipse Codewind project
https://codewind.dev
Eclipse Public License 2.0

SVT: [Remote][Intermittent] Performance Dashboard UI can present the load test stuck in "Requested" #2284

Closed sujeilyfonseca closed 4 years ago

sujeilyfonseca commented 4 years ago

Codewind version: 0.9.0 OS: VM with Windows 10 (w. OKD/Openshift cluster)

IDE extension version: 0.9.0 IDE version: 1.42.0

Description: @jagraj and I noticed that the Performance Dashboard UI can present the load test stuck in "Requested". If users try to cancel the request, they will not be able to do so:

Screen Shot 2020-02-20 at 4 58 04 PM

However, the load test runs successfully. Users can see the test results, if they open a new window:

Screen Shot 2020-02-20 at 5 11 37 PM

Also, if they attempt to run new load tests in the new window, they will not see the problem again:

Screen Shot 2020-02-20 at 5 32 51 PM
[root@sfonseca-okd-codewind ~]# oc  logs  codewind-performance-k6v9sv51-84f57c6cd7-hkgmw 

> performanceserver@1.0.0 start /usr/src/app
> PORT=9095 node server.js

[20/02/20 21:42:36 /usr/src/app/server.js] [INFO] Performance server listening on port 9095!
[20/02/20 21:58:56 /usr/src/app/server.js] [INFO] starting loadrun with options : {"path":"/","requestsPerSecond":"1","concurrency":"1","maxSeconds":"180","url":"http://172.30.209.241:9080/"}
[20/02/20 22:01:57 /usr/src/app/server.js] [INFO] completed - loadrun summary : { totalRequests: 180,
  totalErrors: 0,
  totalTimeSeconds: 180.005692361,
  rps: 1,
  meanLatencyMs: 6.1,
  maxLatencyMs: 67,
  minLatencyMs: 4,
  percentiles: { '50': 5, '90': 6, '95': 7, '99': 11 },
  errorCodes: {},
  instanceIndex: 0 }

[20/02/20 22:12:49 /usr/src/app/server.js] [INFO] starting loadrun with options : {"path":"/","requestsPerSecond":"1","concurrency":"1","maxSeconds":"180","url":"http://172.30.209.241:9080/"}
[20/02/20 22:15:49 /usr/src/app/server.js] [INFO] completed - loadrun summary : { totalRequests: 180,
  totalErrors: 0,
  totalTimeSeconds: 180.005418932,
  rps: 1,
  meanLatencyMs: 6,
  maxLatencyMs: 30,
  minLatencyMs: 4,
  percentiles: { '50': 5, '90': 6, '95': 7, '99': 9 },
  errorCodes: {},
  instanceIndex: 0 }

[20/02/20 22:19:01 /usr/src/app/server.js] [INFO] starting loadrun with options : {"path":"/","requestsPerSecond":"1","concurrency":"1","maxSeconds":"180","url":"http://172.30.209.241:9080/"}
[20/02/20 22:22:01 /usr/src/app/server.js] [INFO] completed - loadrun summary : { totalRequests: 180,
  totalErrors: 0,
  totalTimeSeconds: 180.00626806100001,
  rps: 1,
  meanLatencyMs: 5.7,
  maxLatencyMs: 29,
  minLatencyMs: 4,
  percentiles: { '50': 5, '90': 6, '95': 7, '99': 10 },
  errorCodes: {},
  instanceIndex: 0 }
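
The server-side log can be cross-checked against the request that started each run. The following is a minimal sketch, not part of Codewind: the helper names `parse_loadrun_options` and `expected_requests` are hypothetical, the log line is copied from the first run above, and the summary values are transcribed by hand from the matching "completed" entry.

```python
import json
import re

# Copied from the first "starting loadrun" entry in the log above.
START_LINE = (
    "[20/02/20 21:58:56 /usr/src/app/server.js] [INFO] starting loadrun with options : "
    '{"path":"/","requestsPerSecond":"1","concurrency":"1","maxSeconds":"180",'
    '"url":"http://172.30.209.241:9080/"}'
)

# Transcribed by hand from the matching "completed - loadrun summary" entry.
SUMMARY_FIELDS = {"totalRequests": 180, "totalErrors": 0}


def parse_loadrun_options(line: str) -> dict:
    """Extract the JSON options object from a 'starting loadrun' log line."""
    match = re.search(r"starting loadrun with options : (\{.*\})\s*$", line)
    if match is None:
        raise ValueError("not a 'starting loadrun' line")
    return json.loads(match.group(1))


def expected_requests(options: dict) -> int:
    """Requests expected if the requested rate is sustained for the full duration."""
    # The options are logged as strings, so convert before multiplying.
    return int(options["requestsPerSecond"]) * int(options["maxSeconds"])


options = parse_loadrun_options(START_LINE)
run_was_clean = (
    SUMMARY_FIELDS["totalRequests"] == expected_requests(options)
    and SUMMARY_FIELDS["totalErrors"] == 0
)
print(run_was_clean)
```

A clean result here supports the observation that the load run itself succeeded on the server while the dashboard was still showing "Requested".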

Workaround:

  1. If, after executing a load test, users see the status stuck in "Requested", they should open a new Performance Dashboard window.

@jagraj

jagraj commented 4 years ago

I also ran into a similar issue in the hybrid scenario. The load test status got stuck in the "Requested" state, and I had to close the window and open a new one to submit another request, or wait for the first request to complete. Here are my screenshots. We do not see the same issue in Eclipse Che, and since we have a workaround, this issue can be addressed in the next release.

Screen Shot 2020-02-20 at 5 47 27 PM Screen Shot 2020-02-20 at 5 49 28 PM Screen Shot 2020-02-20 at 5 51 39 PM

With the new window. Screen Shot 2020-02-20 at 5 52 19 PM

markcor11 commented 4 years ago

The cancel and run methods are being overhauled at the moment. The changes that went in recently for 0.9.0 switch the status to "Requesting" earlier in the cycle. This is to protect the user from accidentally clicking the button to start a second test before the first one completes.

The status change from Requested to Preparing is controlled via a socket message from PFE back to the dashboard. That same socket pipe is also used to control the flow from Cancelling to Cancelled. However, if the browser lost its socket connection to PFE and didn't manage to reconnect, those socket events will never be heard. The dashboard actions to start and cancel a load run are performed over REST, so they will fire (which is why the load run did start and did complete), but in your case the UI didn't receive the socket status change needed to refresh the button label, because the socket connection had dropped some time earlier.
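
The split between the REST path and the socket path can be illustrated with a toy model. This is a sketch under stated assumptions, not Codewind code: the `Dashboard` and `Pfe` classes and their methods are hypothetical, and the status sequence after "Requested" is illustrative.

```python
class Dashboard:
    """Toy model of the dashboard button state (hypothetical, not Codewind code)."""

    def __init__(self, socket_connected: bool) -> None:
        self.status = "Idle"
        self.socket_connected = socket_connected

    def on_rest_accepted(self) -> None:
        # The REST response does not depend on the socket being up.
        self.status = "Requested"

    def on_socket_event(self, new_status: str) -> None:
        # Events emitted while the socket is down are never delivered.
        if self.socket_connected:
            self.status = new_status


class Pfe:
    """Toy model of PFE: accepts the run over REST, reports progress over the socket."""

    def __init__(self, dashboard: Dashboard) -> None:
        self.dashboard = dashboard
        self.completed_runs = 0

    def run_load(self) -> None:
        self.dashboard.on_rest_accepted()
        for status in ("Preparing", "Running", "Completed"):
            self.dashboard.on_socket_event(status)
        # The run finishes on the server whether or not the UI heard about it.
        self.completed_runs += 1


def simulate(socket_connected: bool) -> tuple:
    dashboard = Dashboard(socket_connected)
    pfe = Pfe(dashboard)
    pfe.run_load()
    return dashboard.status, pfe.completed_runs


print(simulate(True))   # ('Completed', 1)
print(simulate(False))  # ('Requested', 1)
```

In the second case the run completes on the server, yet the button label never moves past "Requested", which matches the behaviour reported in this issue.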

We are going to harden this flow a few ways :

  1. Ensure that the UI knows about the socket's connected and disconnected states and displays them on the dashboard with a "connected" or "disconnected" icon.
  2. Block all interaction with the UI when the socket status is disconnected, forcing a page reload and protecting against any REST/socket sync issues.
  3. Historically, "isLoadRunning" has been a boolean, and it's just not refined enough to hold state about the many recent additions in Codewind such as app profiling, load flow control, concurrency, etc. We'll replace this boolean with a phase controller object that manages the entire workflow of all parts of running a load experiment.
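
One way to picture the third item is a small state machine in place of the boolean. The following is only a sketch of the idea: the `Phase` values, the transition table, and the `PhaseController` class are assumptions made for illustration, not the design that was implemented.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "Idle"
    REQUESTING = "Requesting"
    REQUESTED = "Requested"
    PREPARING = "Preparing"
    RUNNING = "Running"
    COMPLETED = "Completed"
    CANCELLING = "Cancelling"
    CANCELLED = "Cancelled"


# Hypothetical transition table; the real workflow may differ.
ALLOWED = {
    Phase.IDLE: {Phase.REQUESTING},
    Phase.REQUESTING: {Phase.REQUESTED, Phase.IDLE},
    Phase.REQUESTED: {Phase.PREPARING, Phase.CANCELLING},
    Phase.PREPARING: {Phase.RUNNING, Phase.CANCELLING},
    Phase.RUNNING: {Phase.COMPLETED, Phase.CANCELLING},
    Phase.CANCELLING: {Phase.CANCELLED},
    Phase.COMPLETED: {Phase.IDLE},
    Phase.CANCELLED: {Phase.IDLE},
}


class PhaseController:
    """Tracks the load-run phase and the socket state in one place."""

    def __init__(self) -> None:
        self.phase = Phase.IDLE
        self.socket_connected = False

    def can_start(self) -> bool:
        # A new run is only allowed when idle and able to hear status events.
        return self.socket_connected and self.phase is Phase.IDLE

    def advance(self, target: Phase) -> None:
        if target not in ALLOWED[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {target.name}")
        self.phase = target


controller = PhaseController()
print(controller.can_start())         # False: socket not connected yet
controller.socket_connected = True
print(controller.can_start())         # True
controller.advance(Phase.REQUESTING)
print(controller.can_start())         # False: a run is already in flight
```

Unlike a single boolean, the controller can reject both a second start request and an out-of-order status change.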
malincoln commented 4 years ago

@tobespc are you targeting 0.10.0 for fix?

tobespc commented 4 years ago

This has been fixed already; I cannot reproduce it with the fixes in.

malincoln commented 4 years ago

@sujeilyfonseca pls verify and close. Thanks

sujeilyfonseca commented 4 years ago

I've verified this issue with the latest Codewind 0.10.0, and I no longer reproduce the problem.

sujeilyfonseca commented 4 years ago

I reopened this problem as we are seeing this problem again with Codewind 0.10.0:

Screen Shot 2020-03-18 at 6 24 35 PM Screen Shot 2020-03-18 at 6 31 12 PM
jagraj commented 4 years ago

I also reproduced the same problem in the remote/hybrid scenario. After clicking "Run Load Test", the status gets stuck in "Requested". If we wait a few minutes and refresh the page, we can see the results.

image (3)

After refresh..

image

jagraj commented 4 years ago

I just tried to reproduce this problem again for the failed project, and now the request does not hang. It looks like this is an intermittent problem. Here are the logs from my environment, which should cover both the failed and the successful load tests.

performance.txt pfe.txt

malincoln commented 4 years ago

@jagraj is this considered stopship? I don't see the label. Also, if it's intermittent, maybe we keep it hot and add "intermittent" in title?

sujeilyfonseca commented 4 years ago

/priority hot

malincoln commented 4 years ago

Moving from verify to In progress and assigning to @tobespc

malincoln commented 4 years ago

Decided during discussion to continue to investigate this for 0.10.0 and see if we have a fix by mid next week and decide whether we ship with/without fix.

sujeilyfonseca commented 4 years ago

As a general note about this issue, it can be reproduced more often with new Codewind installations. If Codewind has been running for a while, the occurrence of this issue may be reduced.

Regarding environments, I have seen this issue more often in remote scenarios. However, it may be reproduced in Eclipse Che.

Once you hit this issue, there are three possible scenarios:

  1. Cancel will not work, and you will see the UI with a "Cancelling" status. If you refresh the page or open a new window, you will not be able to run load tests with the same project:
Screen Shot 2020-03-20 at 10 44 16 AM Screen Shot 2020-03-20 at 10 44 32 AM Screen Shot 2020-03-20 at 10 48 33 AM
  2. After refreshing the page and waiting some minutes, you can get test results:
Screen Shot 2020-03-20 at 11 37 57 AM
  3. After refreshing the page and waiting some minutes, you will not see anything and will see the same if you try to run another test:
Screen Shot 2020-03-20 at 11 36 25 AM

Screen Shot 2020-03-20 at 11 32 48 AM

Here are some additional logs to help solve this problem:

remote-codewind-pfe.txt remote-codewind-performance.txt

che-codewind-pfe.txt che-codewind-performance.txt

Here are some web console logs: Firefox (Remote):

Object { url: "https://codewind-keycloak-k808xk51.9.37.222.128.nip.io/auth", realm: "cw", clientId: "codewind-k808xk51", onLoad: "login-required" }
bundle.js:79:369450
[KEYCLOAK] Using legacy promises is deprecated and will be removed in future versions. You can opt in to using native promises by setting `promiseType` to 'native' when initializing Keycloak. bundle.js:36:245787
Firefox can’t establish a connection to the server at wss://codewind-gatekeeper-k808xk51.9.37.222.128.nip.io/socket.io/?EIO=3&transport=websocket&sid=pwJcFo3ikbdFWJV4AAAQ. bundle.js:63:9673
Dashboard - authenticated bundle.js:79:369685
Dashboard refresh-token refreshed bundle.js:79:370076

Google Chrome (Che):

Failed to load resource: the server responded with a status of 404 (Not Found)
bundle.js:79 Object
bundle.js:79 No Auth service available
(anonymous) @ bundle.js:79
Promise.then (async)
n @ bundle.js:6
c @ bundle.js:6
(anonymous) @ bundle.js:6
(anonymous) @ bundle.js:6
Ht @ bundle.js:79
(anonymous) @ bundle.js:79
(anonymous) @ bundle.js:79
a @ main.js:1
t @ main.js:1
(anonymous) @ main.js:1
(anonymous) @ main.js:1
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=E2XDXhQ4-sCQvaNYAAAB' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
/favicon.ico:1 Failed to load resource: the server responded with a status of 404 (Not Found)
main.js:1 Result card - No available CPU data
main.js:1 Result card - No available memory data
main.js:1 Result card - No available HTTP response data
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=TjBUg6kOA5EFF26oAAAE' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=jGtYNzOk-GdLeBMKAAAM' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
/socket.io/?EIO=3&transport=polling&t=N3ub1Ii&sid=jGtYNzOk-GdLeBMKAAAM:1 Failed to load resource: the server responded with a status of 400 (Bad Request)
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=VAOfVzxpCnPL-eEaAAAO' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
/socket.io/?EIO=3&transport=polling&t=N3ucfLB&sid=VAOfVzxpCnPL-eEaAAAO:1 Failed to load resource: the server responded with a status of 400 (Bad Request)
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=z_KP0_w3O1LLzSP8AAAS' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=N-0Ee2zzfj_yNDxsAAAW' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=OqBUzmeDk8WunNDoAAAZ' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
/socket.io/?EIO=3&transport=polling&t=N3ugdeT&sid=OqBUzmeDk8WunNDoAAAZ:1 Failed to load resource: the server responded with a status of 400 (Bad Request)
bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che.apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=5DjPTrg7K1qr62DYAAAb' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
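
The console output above can be summarised by the socket.io transport that was rejected. This is a minimal sketch: the function name `failed_transports` is hypothetical, and the sample lines are copied from the Chrome (Che) log above. It only counts failures and does not attempt to explain their cause.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Three lines copied from the Chrome (Che) console log above.
CONSOLE_LINES = [
    "bundle.js:63 WebSocket connection to 'wss://codewind-workspacesfd52uiby8tbsts4-che-che"
    ".apps.cw-codewind-43.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket"
    "&sid=E2XDXhQ4-sCQvaNYAAAB' failed: Error during WebSocket handshake: "
    "Unexpected response code: 400",
    "/socket.io/?EIO=3&transport=polling&t=N3ub1Ii&sid=jGtYNzOk-GdLeBMKAAAM:1 "
    "Failed to load resource: the server responded with a status of 400 (Bad Request)",
    "main.js:1 Result card - No available CPU data",
]

# Matches an absolute or relative socket.io URL, stopping at whitespace or a quote.
SOCKET_URL = re.compile(r"[^\s']*/socket\.io/\?[^\s']+")


def failed_transports(lines):
    """Count socket.io requests rejected with HTTP 400, grouped by transport."""
    counts = Counter()
    for line in lines:
        match = SOCKET_URL.search(line)
        if match is None or "400" not in line:
            continue
        # Chrome appends ':<line number>' to resource URLs; drop it before parsing.
        url = re.sub(r":\d+$", "", match.group(0))
        query = parse_qs(urlsplit(url).query)
        counts[query.get("transport", ["unknown"])[0]] += 1
    return counts


print(dict(failed_transports(CONSOLE_LINES)))
```

Every rejected request in the log carries a `sid` parameter, so the requests that failed were made with a session ID already assigned.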
jagraj commented 4 years ago

After installing remote Codewind, I created Appsody Node.js Scaffold, Appsody Eclipse MicroProfile, default Node.js Express, and WebSphere Liberty projects. When the projects started running, I opened the Performance Dashboard and clicked "Run Load Test" for the default Node.js Express project. The status went into the "Requested" state, and when we tried to cancel it, it got stuck in the "Cancelling" state.

Here are the logs and screenshots from the failure.

Screen Shot 2020-03-20 at 10 50 02 AM

Logs gate-keeper.txt performance.txt pfe.txt

Browser console errors.

bundle.js:36 [KEYCLOAK] Using legacy promises is deprecated and will be removed in future versions. You can opt in to using native promises by setting `promiseType` to 'native' when initializing Keycloak.
a.init @ bundle.js:36
bundle.js:63 WebSocket connection to 'wss://codewind-gatekeeper-k809w0yq.apps.extols.os.fyre.ibm.com/socket.io/?EIO=3&transport=websocket&sid=-i9urRqOuVlWvM_jAAAE' failed: Error during WebSocket handshake: Unexpected response code: 400
d.doOpen @ bundle.js:63
bundle.js:79 Dashboard - authenticated
main.js:1 Result card - No available CPU data
main.js:1 Result card - No available memory data
main.js:1 Result card - No available HTTP response data
bundle.js:79 Dashboard refresh-token refreshed
(anonymous) @ bundle.js:79
DevTools failed to parse SourceMap: chrome-extension://mbopgmdnpcbohhpnfglgohlbhfongabi/sideex/browser-polyfill.js.map
micgibso commented 4 years ago

New Troubleshooting issue drafted on above PR.

jagraj commented 4 years ago

@tobespc @micgibso Based on the comments provided in this issue (https://github.com/eclipse/codewind/issues/2284#issuecomment-601770507), the workaround will not always help users. Our understanding from Friday's checkpoint call was that we are planning to deliver a fix for this issue in the Codewind 0.10.0 release.

tobespc commented 4 years ago

As I understand it, refreshing the browser does solve the issue. It's the UI socket disconnecting that causes the issue, and if we can detect that, we can output a message to the user informing them.

To change this in code is a large change and not something I believe we should do under a stop ship. If we have a suitable workaround then I would propose we use that and fix this in a different way for 0.11.0

But do we agree on the workaround? It's hard to tell from this thread what the current status actually is, so it would be worth discussing it in a call.

tobespc commented 4 years ago

whatever the solution is though, we should never fail silently
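
One possible way to avoid failing silently is a timeout on the "Requested" state. The sketch below is an assumption made for illustration, not a description of the fix that was delivered: the `RequestWatchdog` class, the 30-second threshold, and the message text are all hypothetical. Timestamps are passed in explicitly so the behaviour is easy to check.

```python
from typing import Optional

REQUEST_TIMEOUT_SECONDS = 30.0  # hypothetical threshold


class RequestWatchdog:
    """Raises a visible warning if no socket status arrives after a request."""

    def __init__(self, timeout: float = REQUEST_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.requested_at: Optional[float] = None

    def on_requested(self, now: float) -> None:
        # Called when the REST request to start the load run is accepted.
        self.requested_at = now

    def on_socket_event(self) -> None:
        # Any status event from the server proves the socket is alive.
        self.requested_at = None

    def warning(self, now: float) -> Optional[str]:
        if self.requested_at is None:
            return None
        if now - self.requested_at < self.timeout:
            return None
        return "No status update received; the connection may have dropped. Refresh the page."


watchdog = RequestWatchdog()
watchdog.on_requested(now=0.0)
print(watchdog.warning(now=10.0))  # None: still within the timeout
print(watchdog.warning(now=45.0))  # warning text: no socket event arrived in time
```

The UI would poll `warning` on a timer and show the message instead of leaving the button on "Requested" indefinitely.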

micgibso commented 4 years ago

Docs no longer required, programmatically rectified, docs PR closed, not merged.

malincoln commented 4 years ago

Decided during discussion to continue to investigate this for 0.10.0 and see if we have a fix by mid next week and decide whether we ship with/without fix.

Here is the update after the call on Friday. There were more updates from SVT after my update

sujeilyfonseca commented 4 years ago

I cleaned up my environments and executed tests immediately after getting the latest builds. I covered scenarios with Eclipse Che and VS Code (Remote). Here are my findings:

1. Eclipse Che (Fyre OCP 4.3 Cluster): I deployed a new Codewind 0.10.0 workspace and proceeded to create four Default (Codewind-style) projects. Then, I continued to wait until at least one of them was in "Running" state:

[root@cw-codewind-43-inf ~]# oc get pods
NAME                                                              READY   STATUS      RESTARTS   AGE
che-65ddf6b565-thwwh                                              1/1     Running     0          6d19h
che-operator-5fc5d6d8d7-98xhd                                     1/1     Running     0          6d19h
codewind-performance-workspace7wd59uozf2kus44z-5846fccdf7-22khk   1/1     Running     0          78m
codewind-workspace7wd59uozf2kus44z-b588fd594-vskjr                1/1     Running     0          78m
cw-cwchemicroprofile010-ef93d810-6d26-11ea-9773-648b4dffdfjp557   1/1     Running     0          65m
cw-cwchenode010-089201c0-6d27-11ea-9773-c987d8b87-4z7sx           1/1     Running     0          61m
cw-cwchespring010-03ed31d0-6d27-11ea-9773-6cb9f96688-m9gnw        1/1     Running     0          66m
cw-cwcheswift010-f89ac9a0-6d26-11ea-9773-75f48cf8b5-5gxz7         1/1     Running     0          65m
devfile-registry-7d9d8b8dd4-smms8                                 1/1     Running     0          6d19h
keycloak-66f6d6c444-nbv6f                                         1/1     Running     0          6d19h
plugin-registry-6d94b964db-zbcwx                                  1/1     Running     0          6d19h
postgres-57c4df88dd-rhs7p                                         1/1     Running     0          6d19h
workspace7wd59uozf2kus44z.che-workspace-pod-7fdf86fd5d-fcflr      6/6     Running     0          79m
Screen Shot 2020-03-23 at 1 03 36 PM

After that, I executed a load test with a Swift project, and it was successful. I noticed the UI presenting a "Connected" status, which is expected with the latest builds.

Screen Shot 2020-03-23 at 1 07 37 PM

Then, I ran a load test with a Spring project, and it seemed to start successfully.

Screen Recording 2020-03-23 at 1.07.55 PM.mov.zip

At this point, I wanted to check whether starting a new test with another project would trigger the message that a load test is currently in progress. I wanted to check this because I hadn't noticed the "A load test is in progress" message before, and I had informed Toby about that.

I proceeded to run a load test with a Node JS project. The UI was in a "Connected" status, but I didn't get a message indicating that I still have a Spring test in progress. Therefore, my Node JS load test got stuck in "Requested". After waiting, my Spring test was successful.

Screen Shot 2020-03-23 at 1 11 44 PM

Then, I executed a load test with a MicroProfile project. It seemed to start successfully, but I didn't get any results.

Screen Recording 2020-03-23 at 1.11.53 PM.mov.zip

Screen Shot 2020-03-23 at 1 17 16 PM

Next, I started another test with the same MicroProfile project. The UI status was "Connected", but my test got stuck in "Requested".

MicroProfile-Test-2.mov.zip

Screen Shot 2020-03-23 at 1 20 57 PM

After waiting a while, I closed all my open tabs but still couldn't run the load test with the same MicroProfile project. This time, however, I received the error message saying that another test was in progress.

Screen Recording 2020-03-23 at 1.56.35 PM.mov.zip

It is important to notice that the UI status has always been "Connected".

Screen Shot 2020-03-23 at 3 01 38 PM


2. VS Code (Remote): I installed the latest builds of Codewind 0.10.0. After that, I created a new remote instance with the latest cwctl. Then, I waited until at least one of the projects was in "Running" state.

Screen Shot 2020-03-23 at 1 11 01 PM

Screen Shot 2020-03-23 at 1 15 19 PM

Consequently, I tried to run a load test with a Node JS project, and the test got stuck in "Requested". Neither refreshing the browser nor waiting a long time before making another request helped.

Remote-Node-Test-1_P1.mov.zip

Remote-Node-Test-1_P2.mov.zip

Screen Shot 2020-03-23 at 1 29 58 PM

After that, I ran a load test with a MicroProfile project and it was successful.

Screen Shot 2020-03-23 at 2 51 29 PM

Then, I executed a load test with a Swift project and it was successful.

Screen Recording 2020-03-23 at 1.40.29 PM.mov.zip

Screen Shot 2020-03-23 at 2 53 16 PM

Then, I tried a load test with the Spring project and it got stuck on "Requested". I couldn't retrieve the test results even after waiting a while.

Screen Recording 2020-03-23 at 1.35.19 PM.mov.zip

Screen Shot 2020-03-23 at 2 56 10 PM

It is important to notice that the UI status has always been "Offline".

Screen Recording 2020-03-23 at 1.37.20 PM.mov.zip


Logs: che-codewind-workspace7wd59uozf2kus44z-b588fd594-vskjr.txt che-codewind-performance-workspace7wd59uozf2kus44z-5846fccdf7-22khk.txt

vscode-remote-codewind-pfe-k84qbm0y-768dfbfcb5-xz5hv.txt vscode-remote-codewind-performance-k84qbm0y-6c749fdf89-cbfh8.txt

tobespc commented 4 years ago

Thanks for the update, will pick this up asap in the morning

sujeilyfonseca commented 4 years ago

Thanks, @tobespc!

tobespc commented 4 years ago

After more debugging, we are still not any closer but... I have some points for discussion

johnmcollier commented 4 years ago

@elsony asked me to reproduce - As described in the issue, I can reproduce but only when loadtesting against multiple projects at once

jgwest commented 4 years ago

@elsony likewise asked me to reproduce... I wasn't able to reproduce using traditional methods (I did not try multiple projects), but I was able to "reproduce" using the following:

  1. Create a Node project, wait for it to build and start.
  2. Open the Performance Dashboard for the project.
  3. Disconnect from VPN (or disconnect from WiFi). The 'Offline' status at the top right is displayed. Wait ~10 seconds.
  4. Reconnect to VPN (or WiFi). Immediately attempt to kick off a load test (don't wait for 'Connected' status).

You may see a TypeError dialog from Chrome, and the 'Requesting...' status will be displayed and never resolve. Refreshing the browser resolves the issue.

This simulates a network drop between the browser and the cluster; while I obviously wouldn't expect a user to reproduce using this specific set of steps, it might approximate what a user would encounter with a bad WiFi connection, for example.

For this I was testing with latest Codewind 0.10 on Che, within Chrome, on non-OpenShift Kube.

jagraj commented 4 years ago

@elsony asked me to reproduce - As described in the issue, I can reproduce but only when loadtesting against multiple projects at once

@johnmcollier - Did you verify this scenario with Eclipse Che or the hybrid scenario?

johnmcollier commented 4 years ago

@jagraj Hybrid.

sujeilyfonseca commented 4 years ago

As you can see in the above scenarios where this problem has been reproduced, this can happen by accessing the Performance Dashboard using a single supported project.

https://github.com/eclipse/codewind/issues/2284#issuecomment-602805428 https://github.com/eclipse/codewind/issues/2284#issuecomment-601774669

To summarize the steps that were performed:

  1. Install Codewind 0.10.0
  2. For remote scenarios:
    • Use the recently installed 0.10.0 cwctl to deploy Codewind remotely
    • Create a new remote connection
  3. Create a new image registry
  4. Create some projects (e.g., Node, Swift, MicroProfile, Spring)
  5. Wait until at least one project is in "Running" state
  6. Run a load test

General notes:

  1. If your first load test was successful, try some additional tests as the problem may be intermittent, but more than one user has encountered it.
  2. This problem has been reproduced in Eclipse Che, VS Code (Remote), and Eclipse (Remote). However, it can be reproduced more often in hybrid.
  3. For some reason, we can reproduce it immediately after the first attempt to run a load test in hybrid. However, it may happen after successfully retrieving load test results (see above comments).
  4. Creating a load test when another test is being executed can trigger the issue. Currently, we are allowing users to open more than one Performance Dashboard tab and send more than one request.
  5. Losing the connection will trigger the issue.
  6. 4 and 5 are ways to get the same problem, as noticed in previous comments. However, as mentioned before, it may even occur in your first attempt(s) to get load test results. You can also fail to get test results with a project that retrieved results before.
  7. The new dashboard statuses were not always useful, as load test results were obtained with an "Offline" status, and the error also happened with a "Connected" status.
markcor11 commented 4 years ago

Actually the new dashboard status is useful. Offline indicates that the socket was not connected to PFE. It is over that socket channel that PFE drives the status button in the UI changing the state you see from Requested to Preparing, Running, Done etc.

If the dashboard socket is not connected to PFE, PFE is unable to control the button status which leaves it in the last known state of "Requested".

The action of changing the button from "Requesting" to "Requested" is performed over a REST request from Dashboard->PFE and not over the socket channel.

That is why the load run will actually complete because PFE did get told to start a load run (over REST), but the button still says "Requested" because the socket was offline and didn't get told to change to "Running" or "Completed"

I do have a code change that will block access to the "Run Load Test" button and all controls on the page until the socket reconnects, which will make it clear that "all bets are off". However, I have not activated that block yet whilst troubleshooting. More important is understanding why the socket won't reconnect. My gut feeling at the moment is that the security token has expired in the browser and is not being refreshed by the UI when it's idle. To solve that, we may need to redirect the user back through the login screen when the browser tab is brought back into focus.
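
The decision described here, what to do when an idle tab regains focus, can be sketched as a small pure function. Everything below is hypothetical: the `Token` type, the 30-second margin, and the action names are illustrative, and the expired-token theory is still only a hunch at this point in the thread.

```python
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    expires_at: float  # seconds since epoch


def needs_refresh(token: Token, now: float, margin: float = 30.0) -> bool:
    """True if the token has expired or will expire within the safety margin."""
    return now >= token.expires_at - margin


def on_tab_focus(socket_connected: bool, token: Token, now: float) -> str:
    """Decide what the dashboard should do when the tab regains focus."""
    if needs_refresh(token, now):
        # A stale token would make any reconnect attempt fail, so handle it first.
        return "reauthenticate"
    if not socket_connected:
        return "reconnect"
    return "ready"


token = Token(value="example", expires_at=1000.0)
print(on_tab_focus(socket_connected=False, token=token, now=500.0))   # reconnect
print(on_tab_focus(socket_connected=False, token=token, now=990.0))   # reauthenticate
print(on_tab_focus(socket_connected=True, token=token, now=500.0))    # ready
```

Checking the token before attempting to reconnect keeps the two failure causes separate, which should also make the resulting UI message more specific.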

malincoln commented 4 years ago

moving to verify

sujeilyfonseca commented 4 years ago

I retested this issue with the latest Codewind 0.10.0. I covered scenarios with Eclipse Che, using a Fyre OCP 4.3 cluster and a Google Chrome browser on Mac; and VS Code (Remote) using a Windows VM and a Firefox browser. Here are my findings:

1. VS Code (Remote): I installed the latest Codewind 0.10.0. After that, I created a new remote instance with the latest cwctl. Then, I created four default projects (MicroProfile, Swift, Spring, and Node) and waited for the projects to be in "Running" state.

Screen Shot 2020-03-26 at 12 24 47 PM

Consequently, I tried to run a load test with a WebSphere Liberty MicroProfile project.

Screen Shot 2020-03-26 at 12 03 40 PM

After that, I opened another tab, and I was not allowed to do anything. This behavior is expected with the new modal window.

Screen Shot 2020-03-26 at 12 03 48 PM

I closed that second window and waited for my current load test in my first tab. My load test finished, but I didn’t get any results. Eventually, the "Run Load Test" button was enabled again, so I used it to run another test with the same project, and it got stuck in “Requested”.

Screen Shot 2020-03-26 at 12 07 43 PM

From now on, I can't start any new request because each new window is blocked from doing anything due to an "Offline" status.

Screen Shot 2020-03-26 at 1 20 26 PM


2. Eclipse Che: I deployed a new Codewind 0.10.0 workspace and proceeded to create four default projects (MicroProfile, Swift, Spring, and Node) and waited for the projects to be in "Running" state.

After that, I ran a load test with a WebSphere Liberty MicroProfile project.

Screen Shot 2020-03-26 at 12 25 31 PM

Then, I opened another tab, and I was allowed to interact with it, but I didn't do anything and closed it.

Screen Shot 2020-03-26 at 12 26 29 PM

I waited for my current load test to finish, but I didn’t get any results. Eventually, the "Run Load Test" button was enabled again, so I used it to run another test with the same project, and it got stuck in “Requested”.

Screen Shot 2020-03-26 at 1 08 07 PM Screen Shot 2020-03-26 at 1 08 47 PM

From now on, I can try new load tests, but they get stuck in "Requested".

Screen Shot 2020-03-26 at 1 19 07 PM


Logs: vscode-remote-codewind-pfe-k88xliix-899d98d5-tjkcs.txt vscode-remote-codewind-performance-k88xliix-588c5d8859-lsdrm.txt

che-odewind-workspace1zycwn4fbqw199tr-55b8788684-8znpr.txt che-codewind-performance-workspace1zycwn4fbqw199tr-864d655fc-jgjd9.txt

jagraj commented 4 years ago

With the latest fix, I executed load tests in both "Hybrid" and "Che" environments.

1. Hybrid/Remote

Plugin : Eclipse Browser : Chrome OCP version : 4.3 (Fyre)

I created four projects: Default (WebSphere Liberty, Spring) and Appsody (Node.js RED, Node.js Scaffold). After the WebSphere Liberty project started running, I started a load test; the request was submitted successfully and the load test started running. While this test was running, I opened another performance window for Appsody Node.js Scaffold and got the "Offline" dialog, which is expected with the latest fix. However, the message says offline when it should say that another test is in progress and a new one cannot be run at this time. I waited until the load run completed for the WebSphere Liberty project, but the dialog never went away and always stayed "Offline", even when I clicked the "Refresh" button. I closed all windows and opened a new Performance Dashboard window for Appsody Node.js Scaffold, and I still always get the "Offline" dialog.

Here are my screenshots and logs for this issue.

Screen Shot 2020-03-26 at 1 11 36 PM Screen Shot 2020-03-26 at 1 11 49 PM Screen Shot 2020-03-26 at 1 12 38 PM

image

gate-keeper.txt performance.txt pfe.txt

2. Eclipse che

Browser : Firefox OCP version : 4.3 (Fyre)

I see the same behavior that Sujeily reported in the previous comment: the status gets stuck in "Requested", I do not see the "Offline" dialog at all in the Che scenario, and the status always says "Connected". The other issue we noticed is that we are not getting results back for the load test in Che.

image

image

performance.txt pfe.txt

jagraj commented 4 years ago

@elsony Based on our conversation, can someone from your team try this latest fix in your environments as well?

elsony commented 4 years ago

yes, we are testing it out. We'll post the result when available.

jgwest commented 4 years ago

Codewind 0.10.0 on Che, on generic Kubernetes, with Chrome, never load testing more than one application at a time:

But, not sure if this is expected behaviour (I presume not for the latter items):

DavidG1011 commented 4 years ago

My findings with Che on OKD 4.2:

  1. Created one of each: MicroProfile, Swift, Spring, and Nodejs. Waited for all of them to reach Running, Build succeeded status.

  2. Ran a load test with the MicroProfile project. Test completed successfully and result was shown. After a brief pause, I started another load test, and the load test was stuck in the "Requested" state.

  3. Attempted to cancel the load test, but that appeared to get stuck as well.

  4. Deleted all 4 projects.

  5. Recreated 1 MicroProfile project. Waited for it to reach Running, Build succeeded status.

  6. Ran a load test. Load test got stuck in "Requested" state.

pfe.txt performance.txt

tobespc commented 4 years ago

found a possible culprit in the code..... investigating....

sujeilyfonseca commented 4 years ago

I retested this issue with the latest Codewind 0.10.0. I covered scenarios with Eclipse Che, using a Fyre OCP 4.3 cluster and a Google Chrome browser on Mac; and VS Code (Remote) using a Windows VM and a Firefox browser.

1. VS Code (Remote): I cleaned my environment. I proceeded to install the latest Codewind 0.10.0. After that, I created a new remote instance with the newest cwctl. Then, I created four default projects and waited for the projects to be in "Running" state.

Screen Shot 2020-03-27 at 9 01 56 AM

The “Offline” modal window was present for every project. It didn't go away even after waiting for a while. I was not able to run any load tests.

Screen Shot 2020-03-27 at 9 02 06 AM Screen Shot 2020-03-27 at 9 06 53 AM Screen Shot 2020-03-27 at 9 07 41 AM Screen Shot 2020-03-27 at 9 07 17 AM Screen Shot 2020-03-27 at 9 18 50 AM

Refreshing the browser didn't work.


2. Eclipse Che: I deployed a new Codewind 0.10.0 workspace and proceeded to create four default projects and waited for the projects to be in "Running" state.

Screen Shot 2020-03-27 at 9 22 30 AM

First, I executed a load test with a Node project, and I successfully retrieved the load test results. When that test was in progress, I was not allowed to run a new request with another project, which is good. I received the error message saying that another test was in progress. I only tested this to ensure the fix was working. Then, I closed that tab and continued testing single projects/tabs.

Screen Shot 2020-03-27 at 9 20 12 AM Screen Shot 2020-03-27 at 9 04 24 AM

After getting test results with the Node project, I executed a load test with a Spring project, and I successfully retrieved test results.

Screen Shot 2020-03-27 at 9 00 26 AM

Then, I tried a MicroProfile project, and I didn’t get the load test results after waiting for the load test to finish. From then on, every project presented the error saying that another test is in progress.

Screen Shot 2020-03-27 at 9 20 12 AM Screen Shot 2020-03-27 at 9 09 53 AM Screen Shot 2020-03-27 at 9 08 59 AM


Logs: vscode-remote-codewind-pfe-k8a6kny4-66d7897c58-rtpzv.txt

che-codewind-workspace1jvpx4kuou075wqs-79c856dbc6-tlw6n.txt che-codewind-performance-workspace1jvpx4kuou075wqs-797c5764ff-pw2jk.txt


Note: Something is not right with the WebSphere Liberty MicroProfile stack, since I have not retrieved test results with that stack more than once. I can reproduce this behavior locally, which leads me to think this may be a separate issue.

However, if the user didn't retrieve results with a project, this will block subsequent tests. Previously, subsequent tests were blocked with the status "Requested" and now with an error indicating that another test is in progress.
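The blocking described above is consistent with a "test in progress" flag that is only cleared on the success path. A minimal sketch of a guard that always releases, assuming a single in-memory flag per performance server; `LoadRunGuard`, `run`, and `activeProject` are hypothetical names and are not taken from the Codewind source:

```javascript
// Hypothetical sketch; none of these names come from the Codewind source.
class LoadRunGuard {
  constructor() {
    // ID of the project that currently owns the load run, or null when idle.
    this.activeProject = null;
  }

  // Runs `task` (an async function) for `projectID`.
  // Rejects if another load run is already in progress.
  async run(projectID, task) {
    if (this.activeProject !== null) {
      throw new Error(
        `Another load test is in progress (${this.activeProject})`
      );
    }
    this.activeProject = projectID;
    try {
      return await task();
    } finally {
      // Release the guard even when the run or result collection fails,
      // so one failing project cannot block every later request.
      this.activeProject = null;
    }
  }
}
```

With this shape, a project that never returns results fails its own run but leaves the guard free for the next request.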

jagraj commented 4 years ago

Hybrid/Remote:

With the latest fix, and without the "Offline" dialog, we see some improvement over the last fix. It looks like load tests for WebSphere Liberty and Open Liberty projects are failing and not returning results. I also injected metrics explicitly to see if I could get results, but I did not. Once these projects fail, I can no longer submit load tests for any other project; it always says another load test is in progress.

Screen Shot 2020-03-30 at 7 15 34 PM

gate-keeper.txt performance.txt pfe.txt

Local:

Locally as well, WebSphere and Open Liberty projects do not produce results if we run load tests from the Performance Dashboard.

image

image

performance.txt pfe.txt

sujeilyfonseca commented 4 years ago

I retested this issue with the latest Codewind 0.10.0. I covered scenarios with Eclipse Che, using a Fyre OCP 4.3 cluster and a Google Chrome browser on Mac; and VS Code (Remote) using a Windows VM and a Firefox browser. I see some improvements with the latest fix.

1. VS Code (Remote): I cleaned my environment. I proceeded to install the latest Codewind 0.10.0. After that, I created a new remote instance with the newest cwctl. Then, I created five default projects and waited for the projects to be in "Running" state.

Screen Shot 2020-03-30 at 7 27 14 PM

My first load test with a Spring project was stuck in "Requested". The status never changed, but after some time, I obtained test results.

Screen Shot 2020-03-30 at 7 26 51 PM

Screen Shot 2020-03-30 at 7 30 20 PM

After that, I executed additional load tests, and the status changed as expected. With these tests, I successfully retrieved results. When a test was in progress, I was not allowed to run a new request with another project, which is expected.

Screen Shot 2020-03-30 at 7 37 18 PM

I didn't obtain load test results with an Open Liberty project, but this didn't block subsequent tests.

Screen Shot 2020-03-30 at 7 42 41 PM

MicroProfile projects don't produce test results and block subsequent tests.

Screen Shot 2020-03-30 at 7 46 41 PM

Screen Shot 2020-03-30 at 7 46 52 PM

Screen Shot 2020-03-30 at 7 47 22 PM


2. Eclipse Che: I deployed a new Codewind 0.10.0 workspace and proceeded to create five default projects and waited for the projects to be in "Running" state.

Screen Shot 2020-03-30 at 8 19 03 PM

I successfully executed and retrieved test results with some projects. When a test was in progress, I was not allowed to run a new request with another project, which is expected.

Screen Shot 2020-03-30 at 7 29 52 PM Screen Shot 2020-03-30 at 8 24 03 PM

I didn't obtain load test results with an Open Liberty project, but this didn't block subsequent tests.

Screen Shot 2020-03-30 at 7 44 37 PM

However, MicroProfile projects don't produce test results and block subsequent tests.

Screen Shot 2020-03-30 at 7 48 27 PM Screen Shot 2020-03-30 at 7 48 35 PM Screen Shot 2020-03-30 at 7 48 49 PM


Logs: vscode-remote-codewind-pfe-k8f393kl-64cbc5c8cf-22zxlc.txt vscode-remote-codewind-performance-k8f393kl-76f8756dd7-6dd2b.txt

che-codewind-workspace9vgevotcwcfimi5q-6dbb9f6cbd-bl4wh.txt che-codewind-performance-workspace9vgevotcwcfimi5q-7869cd6dff-tg94j.txt


Note: The default Open Liberty and WebSphere Liberty MicroProfile projects are not producing test results in local scenarios either.

DavidG1011 commented 4 years ago

My test results are similar.

Che:

I was able to run a few load tests successfully with Java Spring, Nodejs, and Swift projects. I then tested out a load test with a Java Microprofile project. The test did not produce any results, and I was unable to run any subsequent load tests.

Test running:

Screen Shot 2020-03-30 at 8 03 49 PM

Subsequent load test:

Screen Shot 2020-03-30 at 8 04 12 PM

che-performance.txt che-pfe.txt

Local:

I then tried running load tests locally and saw similar results.

Test ran the first time for Microprofile:

Screen Shot 2020-03-30 at 7 58 17 PM

Subsequent:

Screen Shot 2020-03-30 at 7 59 12 PM

I then attempted to run a load test for a Java Open Liberty type, and was able to, but with the same result as the Microprofile load test; received no results, and was unable to run subsequent load tests.

local_pfe.txt local_performance.txt

tobespc commented 4 years ago

you are all seeing different results to me :-(

tobespc commented 4 years ago

current situation is as follows

For microprofile I get the following

The command I am trying to run to find out the name of the healthcenter hcd file is

docker exec bash -c 'ls home/default/app/load-test/ | grep healthcenter*'

for example

docker exec b7 bash -c 'ls home/default/app/load-test/20200403140623 | grep healthcenter*'

When run outside of codewind, I can see it working fine

Screenshot 2020-04-05 at 17 15 25

So what I want to try and do is grab that result in code, and then be able to do a single docker file copy rather than a whole directory.
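As a rough sketch of that idea, the `ls` output could be filtered in code and turned into a single-file `docker cp` call. The helper names, the destination path, the sample file name, and the assumption that the file ends in `.hcd` are all made up for illustration; only the `docker cp CONTAINER:SRC_PATH DEST_PATH` form is standard Docker CLI. Filtering in JavaScript also sidesteps the `grep healthcenter*` pattern, which as a regular expression matches `healthcente` followed by any number of `r` characters rather than acting as a wildcard.

```javascript
// Hypothetical helpers for illustration; not taken from the Codewind source.

// Returns the first file name in `ls` output that looks like a
// healthcenter .hcd file, or null when there is none.
function findHcdFile(lsOutput) {
  const names = lsOutput
    .split("\n")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  const match = names.find((name) => /^healthcenter.*\.hcd$/.test(name));
  return match === undefined ? null : match;
}

// Builds the arguments for copying one file out of the container:
//   docker cp <container>:<dir>/<file> <destination>
function buildCopyArgs(containerID, loadTestDir, hcdFile, destination) {
  return ["cp", `${containerID}:${loadTestDir}/${hcdFile}`, destination];
}

// Example with a made-up directory listing.
const listing = "healthcenter-sample.hcd\nmetrics.json\n";
const hcdFile = findHcdFile(listing);
if (hcdFile !== null) {
  const args = buildCopyArgs(
    "b7",
    "home/default/app/load-test/20200403140623",
    hcdFile,
    "./load-test-results/" // assumed destination
  );
  console.log(["docker", ...args].join(" "));
}
```

Returning `null` when nothing matches lets the caller report "no results" explicitly instead of attempting a copy that cannot succeed.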

@stalleyj This is my branch with the changes in so far https://github.com/tobespc/codewind/tree/findHCD

tobespc commented 4 years ago

Fix is in against 0.11 and I'll make the same fix to master. Further changes to master can be tracked in a separate issue

jagraj commented 4 years ago

@tobespc @markcor11 With the latest 0.11 images, the message we used to display saying that another load test is in progress somehow no longer appears. Previously, this message blocked the user from submitting another request; now it does not. After injecting metrics for Open Liberty on Che, I was able to run the load test. When I tried to submit another request while the load was running for another test, that request got stuck in the "Requested" state forever. I think we still need some improvements.

image

Logs: performance.txt pfe.txt

tobespc commented 4 years ago

@jagraj , ok, will continue working this morning on it

jagraj commented 4 years ago

@tobespc @markcor11 I tried the latest fixes and I see some improvement. We now show the "Another load test is in progress" message when another test is in progress. However, the first load run completed, went into the "Running" state, and never came back to the "Ready" state, so I could not submit another load test; it always complains that another load test is in progress. Even after the test completed, the status stays as "Running". I am also intermittently seeing this error when I try to access the performance dashboard: "NotFoundError" 404 Endpoint Get /appmetrics-dash not found. Here are all the screenshots and logs from these tests.

Screen Shot 2020-04-09 at 12 39 44 AM Screen Shot 2020-04-09 at 12 41 18 AM Screen Shot 2020-04-09 at 12 32 50 AM Screen Shot 2020-04-09 at 12 30 54 AM Screen Shot 2020-04-09 at 12 28 34 AM

Logs performance.txt pfe.txt

tobespc commented 4 years ago

This issue is becoming a bit of a mess now, so let me try and clarify how things are working.

The original stop-ship problem of being stuck in the "Requested" state has been resolved, so I would like us to consider removing the stop-ship label, closing this item, and opening a new issue that documents just the current failing behaviour for discussion.

I believe the behaviour you are seeing is limited to Microprofile and OpenLiberty style projects now.

For issues of appearing to be stuck in the 'Running' state after the test has finished, this is how the messages currently work:

1) The load test is run and, after a few changes, the state goes to Running (xx) with a timer.

2) When the timer ends, that indicates the load has finished, but the performance dashboard is still busy collecting the information generated. That's why it stays in the Running state (and it can stay there for quite some time). We have plans to change that in 0.12.
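To make the two phases concrete, here is a small sketch that separates "still applying load" from "load finished, still collecting". The function name and phase strings are hypothetical, and it assumes the timer tracks elapsed seconds against the `maxSeconds` option that appears in the performance server log (180 in the logged runs).

```javascript
// Hypothetical sketch; phase names are illustrative, not Codewind's own.
function describeLoadRun(status, elapsedSeconds, maxSeconds) {
  if (status !== "Running") {
    return status; // e.g. "Requested": nothing to refine
  }
  // While the timer is still counting, load is being applied.
  if (elapsedSeconds < maxSeconds) {
    return "Running: applying load";
  }
  // Timer has ended: the load is done, metrics are still being gathered.
  return "Running: collecting results";
}

console.log(describeLoadRun("Running", 60, 180));
console.log(describeLoadRun("Running", 180, 180));
```

Surfacing the second phase separately would show the user that a long "Running" state after the timer ends is expected work rather than a hang.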

Running load and collecting metrics is not an instant thing; a lot of processing happens once the load finishes, and we should bear in mind (and make it clearer to Jane) that for some project types (mainly Java) this can take time.

For the 404 error, that implies that the application is not running or that the appmetrics-dash endpoint is incorrect. How are you getting to the page? That's a Node project you are trying, but is it an Appsody Node project or a Codewind template? Remember, the Appsody Node project must include the appmetrics module to work. This is a new issue, so can it be raised as such, please?
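One quick way to rule out the missing-module case is to look at the project's package.json. A minimal sketch, assuming the module is declared under `dependencies` with a name starting with `appmetrics` (for example the `appmetrics-dash` npm package); the function name is hypothetical:

```javascript
// Hypothetical check; assumes the module is declared in package.json.
function declaresAppmetrics(packageJson) {
  const deps = Object.keys(packageJson.dependencies || {});
  return deps.some((name) => name.startsWith("appmetrics"));
}

console.log(declaresAppmetrics({ dependencies: { express: "*" } }));
console.log(declaresAppmetrics({ dependencies: { "appmetrics-dash": "*" } }));
```

This only checks the declaration; the application still has to load the module at startup for the endpoint to exist.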