ITISFoundation / osparc-simcore

🐼 osparc-simcore simulation framework
https://osparc.io
MIT License

Total number of anonymous users who are redirected to o²S²PARC #3412

Closed: elisabettai closed this issue 1 year ago

elisabettai commented 1 year ago

Created this issue to group all the info we have related to this metric.

In March, this was the Prometheus query used, with the associated graph: [image]. The problem with the above is that the p2e tests were not subtracted (that was done manually).

elisabettai commented 1 year ago

Now the same query gives the following graph.

[image]

Did I understand correctly, @mrnicegyu11, that you know what caused the increase between August and September 8, and that you fixed something related to it? Sorry if you already explained that yesterday, but I thought it was better to write it down here.

From which date do we have the "correct values"?

mrnicegyu11 commented 1 year ago

FYI, I answered this in person.

elisabettai commented 1 year ago

From September 29 onward, we have 3 series, reflecting our 3 webservers. Thanks @mrnicegyu11 for fixing that.

What made the values go up to 600k is still unclear to me. I am also assuming that the graph from January to March (the red one in the description) shows "correct values" (though including the p2e tests).

Does that graph mean around 150 redirections or something else (given the rate and sum), @sanderegg? Sorry if that's obvious.

sanderegg commented 1 year ago

@elisabettai I take it you are referring to graph 1. Yes, that is correct. According to the docs, rate "calculates the per second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for." So rate(http_requests_total{...}[3h]) computes the number of requests per second, averaged over the last 3 hours, for each point in the time series. Since it is multiplied by the number of seconds in a day (60*60*24 = 86400), we get the number of HTTP requests per day.

From there, my guess is that the p2e tests run about 150 times per day, which would be the baseline: they run every 3 hours, so 8 times per day, and from January to March I think there were about 9+1 sequential tests and 9 parallel tests, so that would be 8 * (9+1+9) = 152 tests per day. I think that makes sense.
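A minimal sketch of the arithmetic above, assuming a simplified model of PromQL's rate(): the increase of a counter over the window (treating any drop in value as a counter reset) divided by the window length in seconds. Real Prometheus also extrapolates to the window boundaries, which this sketch ignores; the helper names simple_rate and per_day and the sample values are hypothetical.

```python
# Simplified model of PromQL rate(): the average per-second increase of a
# counter over a window, adjusting for counter resets. Unlike Prometheus,
# this sketch does not extrapolate to the window boundaries.

SECONDS_PER_DAY = 60 * 60 * 24  # the *60*60*24 factor in the query


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """samples: (timestamp_seconds, counter_value) pairs, sorted by time."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. the webserver restarted)
        # and started again from 0, so the whole new value is increase.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def per_day(rate_per_second: float) -> float:
    """Scale a per-second rate to a per-day count."""
    return rate_per_second * SECONDS_PER_DAY


# Baseline estimate from the comment above: one run every 3 hours,
# each run with 9 + 1 sequential tests and 9 parallel tests.
runs_per_day = 24 // 3                    # 8
tests_per_run = (9 + 1) + 9               # 19
baseline = runs_per_day * tests_per_run   # 152
```

For example, a counter that goes from 100 to 110, is reset, and then goes from 4 to 9 within a 3-hour window (10800 s) has increased by 19, so per_day(simple_rate(...)) gives 19 / 10800 * 86400 = 152 requests per day, the same order as the baseline.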

sanderegg commented 1 year ago

The sum is there because the webserver is restarted from time to time, and each restart is a new instance, so we sum them. I think this also holds with @mrnicegyu11's fix regarding the tasks. I also searched a bit yesterday, and there too we should sum the individual rates to get the grand total. Sadly, this went under the radar when we scaled the webserver up a few weeks/months ago. But basically, from September 29 this should be correct again.
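To illustrate why the individual rates are summed, here is a small sketch assuming three hypothetical webserver replicas, each exporting its own http_requests_total series over one [3h] window. Taking the rate of each series first and then summing (the sum(rate(...)) form) gives the total across replicas; the Prometheus documentation recommends this order, since summing raw counters first would hide per-instance counter resets. Replica names and sample values are made up, and simple_rate is a simplified stand-in for rate() without boundary extrapolation.

```python
# Hypothetical samples for three webserver replicas over one [3h] window.
# Each replica (and each restarted instance) exports its own time series.

WINDOW = 3 * 60 * 60           # the [3h] range, in seconds
SECONDS_PER_DAY = 60 * 60 * 24


def simple_rate(samples):
    """Per-second increase over the window, adjusting for counter resets
    (simplified: no extrapolation to the window boundaries)."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


series = {
    "webserver-1": [(0, 50.0), (WINDOW, 60.0)],                     # +10
    "webserver-2": [(0, 20.0), (WINDOW / 2, 25.0), (WINDOW, 3.0)],  # +5, reset, +3
    "webserver-3": [(0, 0.0), (WINDOW, 4.0)],                       # +4
}

# sum(rate(...)): compute the rate of each series first, then add them up.
total_per_day = sum(simple_rate(s) for s in series.values()) * SECONDS_PER_DAY

# Reading a single series only sees one replica's share of the traffic.
single_per_day = simple_rate(series["webserver-1"]) * SECONDS_PER_DAY
```

Here the replicas together served 22 requests in 3 hours (176 per day), while looking at webserver-1 alone would report only 80 per day.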

elisabettai commented 1 year ago

@sanderegg gave me this PromQL query, which works on master and gets very close to the "Total number of anonymous users who are redirected to o²S²PARC" while filtering out the e2e tests:

sum(rate(http_requests_total{service_name="webserver", endpoint="/study/{id}", simcore_user_agent!="puppeteer"}[3h])*60*60*24)

The graph on master looks like this: [image]

Looks good to me (the value should be 0); I am not sure why it was not like that before October 5.
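The label matchers in the query can be read as a filter applied to the series before their rates are summed. A minimal sketch, assuming hypothetical series and rates (the "browser" user-agent value and the "/other" endpoint are made up): "=" keeps series whose label equals the value, and "!=" drops series whose label equals it. In PromQL a missing label behaves like an empty string, so "!=" also keeps series that lack the label. If the only /study/{id} traffic comes from puppeteer, nothing passes the filter, which in this model sums to 0.

```python
# Hypothetical series: (label set, per-second rate over the [3h] window).
# Rates are written as "requests in 3 h / window seconds"; values are made up.
WINDOW = 3 * 60 * 60
SECONDS_PER_DAY = 60 * 60 * 24

SERIES = [
    ({"service_name": "webserver", "endpoint": "/study/{id}",
      "simcore_user_agent": "puppeteer"}, 19 / WINDOW),  # e2e test traffic
    ({"service_name": "webserver", "endpoint": "/study/{id}",
      "simcore_user_agent": "browser"}, 2 / WINDOW),     # hypothetical value
    ({"service_name": "webserver", "endpoint": "/study/{id}"}, 1 / WINDOW),  # label absent
    ({"service_name": "webserver", "endpoint": "/other"}, 50 / WINDOW),      # other endpoint
]

# The matchers used in the query above.
MATCHERS = [
    ("service_name", "=", "webserver"),
    ("endpoint", "=", "/study/{id}"),
    ("simcore_user_agent", "!=", "puppeteer"),
]


def matches(labels: dict[str, str], matchers) -> bool:
    """Return True if the label set satisfies every (name, op, value) matcher."""
    for name, op, value in matchers:
        if op not in ("=", "!="):
            raise ValueError(f"unsupported matcher: {op}")
        actual = labels.get(name, "")  # a missing label behaves like ""
        if op == "=" and actual != value:
            return False
        if op == "!=" and actual == value:
            return False
    return True


# Filter the series, sum their rates, and scale to requests per day.
per_day = sum(rate for labels, rate in SERIES if matches(labels, MATCHERS)) * SECONDS_PER_DAY
```

With these made-up values, only the two non-puppeteer /study/{id} series pass, giving 3 requests in 3 hours, or 24 per day.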

The change is not in production yet (it only reached staging_switzer1).

@mrnicegyu11, would you mind updating the Grafana dashboards with that query ("Number of redirected anonymous users per day")? Or should we wait until the required changes reach production?