cylc / cylc-ui

Web app for monitoring and controlling Cylc workflows
https://cylc.github.io
GNU General Public License v3.0
37 stars 27 forks source link

Prevent unnecessary subscription requests #388

Closed kinow closed 3 years ago

kinow commented 4 years ago

Describe exactly what you would like to see in an upcoming release

At the moment users may trigger unnecessary requests when visiting a URL in Cylc UI. The URL will trigger a VueRouter route/view. That view has its own lifecycle.

We have a few subscriptions being created in the created lifecycle of a view. A view can have multiple components. At the moment, we also have components that create subscription.

And right now, creating a subscription means an HTTP or WebSocket request to the GraphQL backend.

Given a URL visited may trigger 1 created method of a view, and N created methods in each of its components, we may need to re-think how this is implemented.

Additional context

An initial workaround was suggested here https://github.com/cylc/cylc-ui/pull/373

Pull requests welcome!

kinow commented 4 years ago

If you open the Dashboard, while you have the WebSocket connection and are looking at the messages, you should see that it immediately fires 4 requests.

  1. connection-init
  2. op=1 payload..., which is the first request from GScan (components are created first for a View)
  3. op=1 stop (now the View is subscribing, so it's requesting that the current subscription [actually a JS Observable object] to be stopped)
  4. op=2 payload..., which is the second request from Dashboard, now with a query that combines the previous query of GScan, with the Dashboard query.

After the step #4, the UI works as expected. Messages are sent/received by client/server with operation=2. If the user navigates to another view, it will probably use operation 4, then operation 6, and so it goes.

Always with 1 extra operation being discarded, even though the messages was sent via the stream/socket.

hjoliver commented 4 years ago

From chat with @oliver-sanders at afternoon tea, while I remember it:

Subscription requests (which get combined into a single actual subscription in the UI "workflow service") come from views (and/with gscan sidebar). Only the "workflows" (lumino tabs) view will generate multiple subscription requests at once (or nearly at once) ... and it knows how many tabs are open, so it should be able to combine its own subscription requests into a single one (TODO) to pass on to the data service. Thereafter subscriptions should change only in response to user interaction with a workflow, which is to say slowly. If I understand correctly, you're mainly concerned about the initial gscan-only request? We thought it doesn't really matter (it might even be good) if the gscan sidebar loads quickly a little before the workflow view does. And if we combine multiple tab requests into one as just suggested, then there won't be another multi-request issue there when multiple tabs load at once. So maybe there's not really a problem here (and we could close this) - or have I missed something? (Debounce would normally be used for something like window resizing that generates a large number of calls quickly, wouldn't it ... although I know you closed that one already ... ).

kinow commented 4 years ago

Subscription requests (which get combined into a single actual subscription in the UI "workflow service") come from views (and/with gscan sidebar).

I think there may be some confusion here. Sorry for not explaining better.

When I say subscription, I mean the actual WebSocket subscription and the JS Observable object returned by ApolloClient#subscribe. That subscription has a link (WebSocketLink) using TCP over HTTP, and sending and receiving messages.

What is getting combined are the queries:

https://github.com/cylc/cylc-ui/blob/c3f49f53ddfedbe318d5428b1123b1c1ccb7fd78/src/services/gquery.js#L88-L93

That's in the recompute method of the parent GQuery. The SubscriptionWorkflowService replaced the polling service. In the polling service, it would start a timeout and query every 5 seconds (NB: you could still have issues with that one, if subscribing just before the next execution...). Now, the SubscriptionWorkflowService sends a request every time the query is updated.

https://github.com/cylc/cylc-ui/blob/c3f49f53ddfedbe318d5428b1123b1c1ccb7fd78/src/services/workflow.service.js#L45

So in other words, for every subscription, a new request is initiated. If I call SubscriptionWorkflowService#subscribe from GScan.vue, that's a new HTTP request. If later in the lifecycle of the application & components Dashboard.vue creates a new subscription, that will call recompute too, that will create a new HTTP request. So 2 HTTP requests.

It could be a matter of changing the logic of the SubscriptionWorkflowService, and calling request separately. But where would that happen?

If a view doesn't have any query to subscribe, but has at least one component like GScan, then it would never know that it needs to call SubscriptionWorkflowService.request.

If a page has a query to subscribe, but has other 10 components, then who should call SubscriptionWorkflowService.request?

That was why I created that issue, to decide where we should get the queries executed. We have always used the workflow service without having a well defined place where the query should actually be executed.

Only the "workflows" (lumino tabs) view will generate multiple subscription requests at once (or nearly at once) ... and it knows how many tabs are open, so it should be able to combine its own subscription requests into a single one (TODO) to pass on to the data service.

Agreed. If the SubscriptionWorkflowService.request is called once (see above), then this could work.

Thereafter subscriptions should change only in response to user interaction with a workflow, which is to say slowly.

Or if the user interact with the UI? If I have 5 workflows. And I quickly (but not quick enough to prevent the view from loading) click on each of them, but I want to see only the last one. Should it create one subscription for each, discarding all and using only the last one?

Or should it wait for me and execute just the last one?

If I understand correctly, you're mainly concerned about the initial gscan-only request? We thought it doesn't really matter (it might even be good) if the gscan sidebar loads quickly a little before the workflow view does.

In a normal application with multiple requests, that would be my inclination too. Send an Axios HTTP request, and fetch the data for that component as soon as possible.

But from what I understood, the idea of merging queries was exactly to avoid multiple requests. In the case of GScan I thought it would be the same?

For example, say I am now able to save my Lumino widgets that are open. So I have 3 tabs with Dot view, Graph view, Tree view. Plus the GScan. And say that I have shared this setup with 10 other users and we all use the UI at the same time.

So 1 query for the GScan (that's the only way for the component to appear sooner, it needs the data before), then +1 query (Dot/GScan/Tree queries merged).

Now there will be 10 users sending the same 10 queries. Which should be OK. If the GScan generates one extra query per user, that's 20 queries, which creates an unnecessary load on the server.

I would expect the GScan query to use the GQuery tooling, and merge its query as well. Doing so, it should still be created and mounted before the page, but not necessarily displayed — because its View, the one bound to VueRouter, may or may not display it, depending on how it was coded.

And if we combine multiple tab requests into one as just suggested, then there won't be another multi-request issue there when multiple tabs load at once. So maybe there's not really a problem here (and we could close this) - or have I missed something?

I jumped the gun adding debounce to solve the multiple queries issue. I thought about adding it later... but maybe it's not worth even later as it seems it would need a long discussion, and I seem to be the only one arguing for it (I think I'm not being very good at creating arguments to convince others).

(Debounce would normally be used for something like window resizing that generates a large number of calls quickly, wouldn't it ... although I know you closed that one already ... ).

That's one example for debounce. But there are many of other cases where it may be useful.

At NIWA there are projects using debounce (private repos / SVN). And our user base is not that huge. But debounce can give a good user experience.

If you run vue ui, it starts the vue-cli UI application. A local application, with probably 1 user. Still, for user experience I believe, they use debounce too - https://github.com/vuejs/vue-cli/blob/f1807fd304cc07c1b3707c2491e584f9ee079ab8/packages/%40vue/cli-ui/src/components/project-create/ProjectCreate.vue#L590

My understanding of debounce is a technique, not exclusive to JS (Java uses similar technique too, the AWS API has a lot of Throttling that demands it), that you can use to prevent any expensive operation of being executed unnecessarily by a user.

GQuery.recompute could be debounced. SubscriptionWorkflowService.request could be debounced. It would be cheap, no performance penalty, little maintenance overhead. Some watch operations could be debounced (Martin did that for the Graph.vue component, and I thought that was a good call).

Another example: right now, if a user clicks on the link to the Dashboard, but changes his mind, then clicks on the link to Suite A, then realizes the needed to see Suite B... that may trigger:

And from that point onwards the subscription data should flow from the server to the client as expected.

If we use debounce that would be:

In the first case, the UI may become a bit slow as the browser may have to wait for the WebSocket to run, or because it brought some data that fired Vue's reactivity. Which I believe is bad for UI/UX.

In the second case, the UI simply displays Suite B.

If someday we have issues with performance, others may find debounce useful. Example use cases:

And other implementations besides the well known from lodash:

And right now, in our code base, we have the following using debounce:

Normally functions are debounced at a rate of 100 - 500 ms (see doherty law).

ps: some apps use the name throttle, but actually implement the debounce similar as the one from lodash. There was some confusion years ago, but since then people have been distinguishing both, especially visually with demos like this.

kinow commented 4 years ago

(ps: the debounce discussion is not necessary here, it's not in this issue any more, and I've closed the PR already :+1: the Graph.vue component is still using it but that may be removed when we rewrite it anyway)

hjoliver commented 4 years ago

@kinow -

(I think I'm not being very good at creating arguments to convince others).

No, I'm sure it's just that I have to keep asking slightly ill-informed questions until I get my conceptual understanding calibrated (so I can compare it with what I thought we needed) because I don't have your expert-level grasp of JS UI coding - sorry if that's hard work sometimes! :grimacing:

In this case, I was genuiinely struggling to see that there was a practical need for debounce (or some other solution) on our subscription requests for the following reasons:

However your example of clicking quickly through a series of suites in gscan is an interesting one! ... maybe that would create a problem (so you may in fact have convinced me!) (Also, I'll read your examples above again when I get in to work...)

oliver-sanders commented 4 years ago

Clicking through a list of suites is a reasonable use case for debounce, though the user would need to change suite within the debounce period (i.e. within 0.5 seconds)

hjoliver commented 4 years ago

(ps: the debounce discussion is not necessary here, as...

(I may have glossed over that point, sorry!)

kinow commented 4 years ago

The reason why I ended up doing it, is because I was working on the WebSockets issue, with breakpoints everywhere (JS & Python). Opening the UI, I would need to skip 1 breakpoint in JS, then 1 breakpoint in Python. Clicking on five and then families2 would also fire the two requests 😞 and I was trying to keep track and understand the WebSocket messages where it said things like operation 1, operation 2, etc.

I think later when we have more users we can analyze the impact of our queries and the UI requests on the servers (possibly after the monitoring/logging issues are done too). That way we all would be more confident that that is - or is not - a good option.

On this issue we can try to find a solution for the multiple subscription requests.

@hjoliver

because I don't have your expert-level grasp of JS UI coding - sorry if that's hard work sometimes!

Definitely not expert level. If you see the mess I get Cylc UI when working locally on some new changes. Some of that gets fixed, but eventually a few silly bugs slip in in PR's too 😬

hjoliver commented 4 years ago

but eventually a few silly bugs slip in

Well, you're "only human" ... never seen anyone that didn't let a few bugs slip in :grin:

kinow commented 4 years ago

(ps: the debounce discussion is not necessary here, as...

(I may have glossed over that point, sorry!)

Not a problem. We can revisit that later if necessary.

We can simplify the test scenario here:

  1. open Cylc UI
  2. press F12, go to the Network tab
  3. filter all types of requests, leaving only "WS" (WebSockets)
  4. reload the page
  5. click on the one that should say "101, GET, localhost:8000, subscription, websocket, etc..."
  6. on the right panel, click on "Messages"
  7. scroll all the way up in case other messages have happened
  8. we should have 1 subscription for this URL, but we have 2

image

The workflow service merges queries. But the way it's working, the SubscriptionWorkflowService, there are 2 GraphQL requests for that page. Even though the queries were merged for the second one, we had an unnecessary GraphQL request sent to the server.

👍 There was a similar problem with the polling too, where it would fire unnecessary requests depending on when components were created and when the query was merged. But most likely my mistake, as I was the one to port from the polling workflow service, to a subscription workflow service.

oliver-sanders commented 4 years ago

But the way it's working, the SubscriptionWorkflowService, there are 2 GraphQL requests for that page

Should be a really simple change to make that one.

kinow commented 4 years ago

(The id in the message is from the GraphQL-WS protocol, and identifies the operation... on the server side, there's a Python Queue object, storing every message... then in another place graphql-ws keeps tabs of the operations and uses resolvers & schema to process the GraphQL request and prepare a JSON reply... if the user cancels the request too soon, there may still be a GraphQL processed in the server, that is then discarded and not wired down to the user... so you never see a reply, but it was still processed by the UI Server)