alondhe opened this issue 5 months ago
Yes, one of my personal OKRs for 2024 is addressing technical issues with Atlas/WebAPI, and transaction coordination (which may be the cause of these hangs) is something I'd like to look into.
When things hang for you, do you see any messages in the console (such as 500 error responses)? And if you view the Postgres Dashboard in pgAdmin, do you see any idle transactions or table locks?
That's the frustrating thing! We haven't found anything in the Chrome console, the WebAPI logs, or Postgres.
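For the idle-transaction and lock check asked about above, the two catalog queries below could be run from pgAdmin's Query Tool or psql the next time a save hangs. The small helper `stale_idle_transactions` is a hypothetical sketch for flagging sessions that have sat "idle in transaction" longer than a threshold (5 minutes here, an arbitrary choice). It assumes each result row has been fetched into a dict with a timezone-aware `xact_start`; the pid in the usage example is sample data, not from this issue.

```python
from datetime import datetime, timedelta, timezone

# Sessions holding a transaction open without running anything.
IDLE_IN_TRANSACTION_SQL = """
SELECT pid, usename, application_name, state, xact_start, state_change, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
"""

# Sessions currently waiting on a lock held by another session (PostgreSQL 9.6+).
BLOCKED_SESSIONS_SQL = """
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
"""


def stale_idle_transactions(rows, now, threshold=timedelta(minutes=5)):
    """Return the pids of sessions idle in a transaction for longer than `threshold`."""
    return [
        row["pid"]
        for row in rows
        if row["state"] is not None
        and row["state"].startswith("idle in transaction")
        and row["xact_start"] is not None
        and now - row["xact_start"] > threshold
    ]


if __name__ == "__main__":
    # Sample row shaped like one result of IDLE_IN_TRANSACTION_SQL (hypothetical data).
    now = datetime.now(timezone.utc)
    sample = [{"pid": 4242, "state": "idle in transaction",
               "xact_start": now - timedelta(minutes=12)}]
    print(stale_idle_transactions(sample, now))
```

If the first query returns nothing and the second returns no rows while a save is stuck, that would point away from a database-side lock and toward something in WebAPI itself.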
I'd like to add a few thoughts on this after reviewing it with @alondhe.
The only related stack trace we saw in the WebAPI logs showed that WebAPI was unable to clear the cohort cache because the connection to the OMOP data source timed out. It was a temporary disconnection.
I assume these steps happen with each "Save" action in WebAPI:
I don't know the exact order; this is just an assumption.
At step #2, WebAPI fails because of the timeout, and that is what causes the whole "Save" process to fail. I think we can reproduce this fairly easily.
So, this might be related to #2334.
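To try the reproduction suggested above, a small Python model of the assumed flow might help. It is not WebAPI's code (WebAPI is Java, and the real steps and their order are unknown); `save_cohort_definition`, `CacheClearTimeout`, `timed_out_cache`, and the `tolerate_cache_failure` flag are all hypothetical names. The sketch injects a timeout at step 2 and shows that if the exception propagates, nothing is committed and the whole save fails, while tolerating it lets the save go through with the cache left stale.

```python
class CacheClearTimeout(Exception):
    """Hypothetical stand-in for the timeout when the OMOP data source is unreachable."""


def save_cohort_definition(store, cohort_id, definition, clear_cohort_cache,
                           tolerate_cache_failure=False):
    """Hypothetical save flow: stage the update, clear the cohort cache, then commit.

    Returns True if the cache was cleared, False if a cache timeout was tolerated.
    Raises CacheClearTimeout, leaving `store` untouched, when it is not tolerated.
    """
    staged = dict(store)                # work on a copy, like an open transaction
    staged[cohort_id] = definition      # step 1 (assumed): write the updated definition
    cache_cleared = True
    try:
        clear_cohort_cache(cohort_id)   # step 2 (assumed): clear the cohort cache
    except CacheClearTimeout:
        if not tolerate_cache_failure:
            raise                       # the whole "Save" fails; nothing is committed
        cache_cleared = False           # keep the save; the cache can be cleared later
    store.clear()
    store.update(staged)                # "commit" the staged changes
    return cache_cleared


def timed_out_cache(cohort_id):
    """Simulates the temporary disconnection seen in the logs."""
    raise CacheClearTimeout(f"timed out clearing cache for cohort {cohort_id}")
```

As a guess worth checking: if the data source connection timeout is long, a request blocked in step 2 could look like a stalled PUT from the browser until it finally errors out.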
Expected behavior
(Using WebAPI 2.14.0 / Atlas 2.14.1)
Cohort Definitions save cleanly every time.
Actual behavior
When updating a cohort definition, even with small changes such as renaming it, the save sporadically gets stuck.
This is hard to pin down because it is not consistent. We can't find anything in the WebAPI logs or in the Postgres logs. The Chrome console shows that the in-flight PUT request that saves the updated cohort stalls.
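To confirm the stall outside the browser, one could replay the same PUT with an explicit client timeout. Below is a minimal sketch using only the Python standard library. The helper `probe_put` is hypothetical; the URL (assumed here to follow a `/WebAPI/cohortdefinition/{id}` pattern), the payload (ideally the exact JSON body copied from the browser's Network tab), any auth headers your deployment needs, and the 30 s default are placeholders. Note that `timeout` bounds each blocking socket operation rather than the whole request, so a response that trickles in slowly will not be flagged.

```python
import json
import socket
import time
import urllib.error
import urllib.request


def probe_put(url, payload, timeout_s=30.0, headers=None):
    """Send one PUT and report whether the server answered or the request stalled."""
    body = json.dumps(payload).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    req = urllib.request.Request(url, data=body, headers=all_headers, method="PUT")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # e.g. a 500: the server answered, so this is not a stall
    except urllib.error.URLError as err:
        if not isinstance(err.reason, socket.timeout):
            raise  # refused connection, DNS failure, etc.: a different problem
        return {"stalled": True, "status": None, "elapsed_s": time.monotonic() - start}
    except socket.timeout:
        # No bytes arrived within timeout_s while waiting for the response.
        return {"stalled": True, "status": None, "elapsed_s": time.monotonic() - start}
    return {"stalled": False, "status": status, "elapsed_s": time.monotonic() - start}
```

Treating an HTTP error status as "answered" keeps a 500 from the server distinct from a genuine hang, which matches the distinction asked about earlier in the thread.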
Steps to reproduce behavior
Tagging @konstjar