Closed — Tuna9129 closed this issue 1 year ago
Here's what I've found out so far: the list of participants displayed by the dashboard is obtained from the server by issuing a POST request to the `/` server endpoint. This POST request contains a string representing a (pretty complex) JSONata query that includes the corresponding `researcherId`.
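For illustration only, a request of this shape might be issued as sketched below; the endpoint path, headers, and query text here are my assumptions (the real dashboard query is considerably more complex):

```typescript
// Hypothetical sketch only: the real LAMP dashboard query is far more complex,
// and the server URL, headers, and JSONata expression below are placeholders.
function buildParticipantQuery(researcherId: string): string {
  // Illustrative JSONata expression selecting one researcher's data.
  return `$LAMP.Researcher.filter($.id = "${researcherId}")`;
}

async function fetchParticipants(serverUrl: string, researcherId: string): Promise<unknown> {
  const res = await fetch(`${serverUrl}/`, {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: buildParticipantQuery(researcherId), // the query is sent as a raw string
  });
  if (!res.ok) throw new Error(`Server returned ${res.status}`);
  return res.json();
}
```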
For most `researcherId`s, this query returns more or less quickly, but it seems that for some of them (such as `yhe0wtfn6n6sbvsan0js`) the server hangs while trying to resolve it. After a while the request times out, but the dashboard does not take notice of this; instead, it keeps displaying the loading state forever. So, to me there are two problems to be solved (which I'm working on now):
Hello! Thank you very much for the update and explanation. That sounds great!
I've got an update on issue 1.
The JSONata query that is issued whenever the user tries to view a researcher in the dashboard basically asks the server to deliver a bunch of data about all the activities belonging to the researcher (the word "all" is important here). These data are organized in several fields. What I did was remove fields from the request one by one to figure out whether any of them was causing the problem.
What I found is that for this particular user the request failed unless I removed `settings` from the set of fields being retrieved. I don't know exactly the nature of the data it holds, but it looks like parameters that tune the way the activity behaves. Here's an example of what the `settings` field contains for a certain activity:
```json
"settings": {
  "bubble_count": [60, 80, 80],
  "bubble_speed": [60, 80, 80],
  "intertrial_duration": 0.5,
  "bubble_duration": 1
}
```
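The field-by-field elimination described above can be sketched generically; `probe` below is a stand-in I made up for issuing the real query with a reduced field set:

```typescript
// Illustrative sketch (not the actual debugging code): given the list of fields a
// query requests, and a probe reporting whether the query succeeds with a given
// field set, drop fields one at a time to find the one causing the failure.
function findFailingField(
  fields: string[],
  probe: (fields: string[]) => boolean
): string | null {
  for (const field of fields) {
    const without = fields.filter((f) => f !== field);
    if (probe(without)) return field; // query succeeds once `field` is removed
  }
  return null; // no single field accounts for the failure
}
```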
Another thing I noticed is that this particular user has as many as 11,980 activities! That is a ton of data to download all at once. So the next thing I did was try lighter queries, asking for the first 1000 activities, then the following 1000, and so on. This way the queries don't fail, but the responses from the server are still very heavy; see:
```
[1st - 1000th]     200  286.4 MB
[1001st - 2000th]  200  273.1 MB
[2001st - 3000th]  200  264.5 MB
[3001st - 4000th]  200  259.7 MB
...
```
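The chunked approach described above can be sketched as follows; `fetchRange` is a hypothetical helper standing in for the real paged query:

```typescript
// Sketch of chunked retrieval, assuming a `fetchRange(offset, limit)` helper
// exists that fetches one page of activities (the real LAMP API calls differ).
async function fetchAllActivities<T>(
  total: number,
  chunk: number,
  fetchRange: (offset: number, limit: number) => Promise<T[]>
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; offset < total; offset += chunk) {
    // The last page may be shorter than `chunk`.
    const batch = await fetchRange(offset, Math.min(chunk, total - offset));
    all.push(...batch);
  }
  return all;
}
```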
So, to sum up, I think the problem boils down to the queries being too heavy to handle when the researcher in question (like this one) has too many activities. I think this could explain similar situations with other researchers, and also the server failing "randomly" (it might be related to server load: the more loaded the server is, the harder it is for it to handle these heavy queries).
I suspect that only a fraction of what the dashboard asks the server for in this query is actually necessary for the dashboard. In that case, the query could be simplified and the problem would be solved. The next thing I'm going to do is figure out which parts are actually needed and cut everything else out of the query.
@falmeida-orangeloops That’s a fantastic write-up and diagnosis. Thank you!
> Another thing I noticed is that this particular user has as many as 11,980 activities!
This seems very odd but does make sense…
@falmeida-orangeloops Could you try to snoop on the data and check?
@avaidyam Actually, response sizes vary within a pretty wide range (from < 1 kB to > 2 MB)!
As an example, this is the size of the response for the first 50 activities (I can provide you with the full list of activity IDs that are being retrieved, in case you'd like to check some of them):
Turns out that all of the activities that are > 1 MB in size (at least among the ones I could check) have full Base64-encoded audio files embedded in them; that's why they are so heavy. The dashboard is trying to download all of them (probably hundreds or thousands) at the same time!
@falmeida-orangeloops That makes a lot of sense! If you mask out the particular field with the Base64 data, does the total request size drop? What is the new total request size per 10,000 events? Also, could you share what the `spec` field reports for the items that have this data?
@avaidyam If I mask out that `audio` field, the total size of the response for all 11,980 items is 18 kB, which is much more reasonable!
> Also, could you share what the `spec` field reports for the items that have this data?
All 2010 of them have `lamp.breathe` in the `spec` field.
That's fantastic! Could you add a comment to the code explaining why we're masking out this particular field? In the future we will likely want to dynamically look up which fields are base64 data and mask them out instead of hardcoding it.
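The dynamic lookup suggested here could be sketched like this; the function name, size threshold, and `<masked>` placeholder are my assumptions, and real payloads may be data URIs rather than bare Base64:

```typescript
// Hedged sketch of "dynamically look up which fields are base64 data": walk an
// object and replace any long Base64-looking string (or data: URI) with a
// placeholder, instead of hardcoding which field to mask.
const BASE64_RE = /^[A-Za-z0-9+/]+={0,2}$/;

function maskBase64Fields(obj: unknown, minLength = 1024): unknown {
  if (typeof obj === "string") {
    const looksEncoded = obj.startsWith("data:") || BASE64_RE.test(obj);
    return obj.length >= minLength && looksEncoded ? "<masked>" : obj;
  }
  if (Array.isArray(obj)) return obj.map((v) => maskBase64Fields(v, minLength));
  if (obj && typeof obj === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(obj)) out[k] = maskBase64Fields(v, minLength);
    return out;
  }
  return obj; // numbers, booleans, null pass through untouched
}
```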
> Could you add a comment to the code explaining why we're masking out this particular field?
Of course! I'll get back when it's ready.
@avaidyam I removed the `audio` field from the query. It's working now 👌
Pull Request (just for your information): BIDMCDigitalPsychiatry/LAMP-dashboard#705
Two last comments:

1. I'm not sure whether the `settings` Activity field (the one containing the heavy `audio` field, along with others) is actually needed in this query; this entire field is re-queried whenever the researcher opens an activity. Removing `settings` would further reduce the response size (plus the hardcoding could be avoided), but just to be sure I only masked `audio` out.
2. The dashboard still doesn't react when a request fails; that could be handled in the `ParticipantListItem` hierarchy. I can make a separate PR for this.

Thanks! I think you can go ahead and remove `settings` entirely. Also, could you use Error Boundaries for item 2?
> I think you can go ahead and remove `settings` entirely.
Nice!
> Also, could you use Error Boundaries for item 2?
I'm not sure but I'll try that. Thanks!
Dashboard not loading for this user (and others with similar conditions) is solved by BIDMCDigitalPsychiatry/LAMP-dashboard#705.
Hello! Thank you for the updates. I have confirmed that the dashboards now load in the staging dashboard. It takes a few minutes, but it works great now!
Great to know!
I can dig more into reducing loading times if you think it's worthwhile. Just let me know.
It definitely would be worth it, but perhaps @Tuna9129 could you share the Chrome DevTools Network log? It will help better gauge whether this is a network transfer issue or is now a dashboard UI/loading issue (which would be new/separate).
Hello! I just tried again, and it seems a bit faster now. I'm not sure we need to reduce loading times if we only have to wait around 2 minutes, but here is a screenshot of the Chrome DevTools Network log (I think)! I can also try to send a HAR file if necessary.
@Tuna9129 Thanks for the info! I think it's still worth looking into, and great idea to share the HAR file. Could you email that to @falmeida-orangeloops & co (cc @ertjlane), since it would have security credentials in it? I'd like to see whether it's worthwhile to remove all the repeated calls to the `1` resource in the logs.
Okay, I'll do that! Thanks for your help!
**Describe the bug**
An investigator/researcher account with the id `yhe0wtfn6n6sbvsan0js` is not loading at all from the admin user. I have tried different browsers (on a MacBook Pro) and my coworkers have tried as well, with no luck. I have tried both the regular and staging dashboards. It is stuck on the loading wheel and I cannot click on anything.
We have also observed that sometimes other investigator/researcher accounts will take a very long time to load, but after waiting a few hours or a day they will work fine. However, this study has not been accessible at all since early yesterday.
**Past similar/related issues?**
- https://github.com/BIDMCDigitalPsychiatry/LAMP-platform/issues/491
- https://github.com/BIDMCDigitalPsychiatry/LAMP-platform/issues/515
- https://github.com/BIDMCDigitalPsychiatry/LAMP-platform/issues/625
**Expected behavior**
After clicking into the study, it should load within a few minutes.
**Additional context**
Just a video showing the issue; it won't load even if I leave it open for 30 minutes. https://user-images.githubusercontent.com/105741216/206572589-45521fe9-9c6a-42e1-a663-25d9d9357024.mov