ctreffe / alfred

Alfred - A library for rapid experiment development

ListRandomizer Problems #205

Closed ctreffe closed 2 years ago

ctreffe commented 2 years ago

Currently, I think there are at least two problems with alfred's ListRandomizer, which result in sessions ending with an exception and in experiments not filling their randomizer slots as expected. I will discuss both problems in the same issue since I cannot rule out that they are connected somehow.

In our latest set of online experiments, we see the following exception multiple times in our logfiles:

[Screenshot of the log entry: 2022-06-08 21_40_37 - Alfred Error Logentry]

As far as I know, this has happened in every online experiment since the v2.3.2 patch for alfred. For my own experiment, I was able to trace the problem back to several experimental sessions starting at exactly the same time. From my experimental dataset and the log, I identified 7 sessions starting within a span of 40 milliseconds. All of these sessions try to acquire a randomization slot at almost exactly the same time, which according to @jobrachem can lead to the error message above: the exception is a fallback for the improbable case that too many sessions try to access the randomization slot list simultaneously. Of these 7 sessions, 5 exit the experiment with an exception and only 2 continue regularly.

Given the average number of sessions per day on the server (fewer than 100 for my experiment), it is of course highly unlikely that seven participants genuinely tried to start the same experiment at the same time, only about 5 milliseconds apart. The underlying reason seems to be performance issues in the university network, most likely with the outer firewall. Recently, I have run into a couple of performance issues (e.g., experiments taking a long time to start, pages loading at only 50 kb/s, javascript components not working because the script files load too slowly). These issues are mostly temporary, and after a while (or a couple of hours) the performance normalizes. As far as I can tell from my logfile, the error might occur during such a performance bottleneck, during which multiple server requests pile up (I suspect through one or more participants repeatedly trying to reload an experiment's landing page). I suspect the firewall because we have had similar issues with it in the past (remember when the firewall dropped some of the requests for resource files due to performance problems a couple of months back), and because our server shows no indication of coming anywhere near its resource limits (CPU, network, memory, etc. in the VM performance monitor). Also, I have seen some other network performance problems with VMs yesterday and today (the above error is from yesterday).

In summary, I think we need to harden our randomizer code against multiple sessions trying to access the randomization slot list at the same time, so that sessions no longer end prematurely with the above exception.
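To illustrate the kind of hardening I have in mind (this is not alfred's actual code; all names here are hypothetical stand-ins), a bounded retry with jittered backoff around the slot-list update would turn most of these collisions into short waits instead of aborted sessions:

```python
import random
import time


class SlotListBusy(Exception):
    """Hypothetical stand-in for the 'too many concurrent accesses' error."""


def acquire_slot_with_backoff(try_acquire, max_attempts=10, base_delay=0.05):
    """Call try_acquire() until it succeeds or the attempts are used up.

    try_acquire stands in for whatever function performs the atomic
    read-modify-write on the randomizer's slot list and raises
    SlotListBusy when it detects a concurrent modification.
    """
    for attempt in range(max_attempts):
        try:
            return try_acquire()
        except SlotListBusy:
            # Jittered exponential backoff, so colliding sessions spread out
            # instead of retrying in lockstep and colliding again.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise SlotListBusy("Slot list still contended after all retries")
```

With something like this, the exception would only surface after several spaced-out attempts instead of on the first collision.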

In addition to the problem described above, I am convinced that the Randomizer still does not correctly manage randomization slots. While investigating the above problem in my experiment, I came across irregularities in the randomization. Before describing the problem itself, I will outline the general parameters of my current online experiment:

I am using a ListRandomizer.balanced() to assign two experimental conditions with 80 sessions in each condition. Hence, my experiment should remain open for new participants until I have acquired 160 finished datasets (inclusive set to False). I am calling the randomizer from my experiment's setup hook (a sketch of this hook follows the list below). Before initiating the randomization, I adjust the _exp_sessiontimeout; that is the only thing happening prior to the randomization. In the experimental dataset, I am able to differentiate different types of unfinished sessions by the amount of stored data and some specific settings, so I can tell apart states in which a session either starts regularly or is aborted early. Specifically, there are two relevant types of sessions:

  1. There is a relatively small number of sessions (50 of 1000) for which a dataset exists in which the _exp_sessiontimeout remains at its default value (86400 seconds). These sessions must have been saved once before the code in the setup hook ran; otherwise, the changes I made to _exp_sessiontimeout would be reflected in the saved dataset. In these datasets, some basic information is missing (e.g., client data and the starting time), while some critical initial information is present (e.g., save_time). These sessions have not received an experimental condition from the randomizer; at least, no condition is saved in the dataset.
  2. There is a large number of datasets with assigned experimental conditions and all the initial information, but with no user input whatsoever. These datasets seem to originate from participants closing the experiment directly on the landing page, never clicking any button in the navigation. They are saved at a later stage of the startup process, after the setup hook has finished, since both my change to the _exp_sessiontimeout setting and the assigned experimental condition are reflected in the dataset.
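For reference, the relevant part of my setup hook looks roughly like this. This is a reconstruction for illustration; the hook decorator, the timeout attribute, and the exact alfred3 argument names are written from memory and may differ slightly from the real API:

```python
import alfred3 as al

exp = al.Experiment()


@exp.setup
def setup(exp):
    # The only thing happening before randomization: shorten the session
    # timeout from the default 86400 s (24 h) to 7200 s (2 h). The exact
    # attribute name here is an assumption; the field that ends up in the
    # dataset is _exp_sessiontimeout.
    exp.session_timeout = 7200

    # Balanced assignment: two conditions, 80 finished sessions each, so the
    # experiment stays open until 160 finished datasets exist (inclusive is
    # left at False, i.e., only finished sessions permanently fill a slot).
    randomizer = al.ListRandomizer.balanced("natural", "sanitized", n=80, exp=exp)
    exp.condition = randomizer.get_condition()
```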

I assume that the first type of dataset results from the randomizer aborting the session, but somehow this is not accurately reflected in the dataset: calling abort() should set the aborted flag to true, maybe create a log entry, and save a reason for the abortion of the session, and none of this is happening. Moreover, the logged error message shown above can be traced to this type of session data (although not all of these sessions have an associated error message in the log). There are no other sessions in my experimental data that indicate a regular call to abort() by the randomizer. Also, these 50 sessions are not evenly distributed; they start to occur more often once the majority of randomization slots are already closed. Hence, I am convinced that this type of session in the experimental dataset reflects an irregular abortion by the randomizer. Interestingly, I am running a MatchMaker behind the randomizer, also in the setup hook. When the MatchMaker aborts the experiment because no matching slots are available, the experiment is aborted normally: the aborted state is set to true, a reason for the abortion is saved in the dataset, and a log entry is created. Since both the randomizer and the MatchMaker run in the same setup hook, I think this is a randomizer-specific problem.

In summary, one problem seems to be that the randomizer does not work as expected when aborting the experiment because no randomization slots are available. Maybe there is an exception that is not handled yet? This could explain why we never see regular session abortions performed by the randomizer.
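To make the expectation concrete, here is a sketch of the abort path I would expect around the randomizer call. Everything here is hypothetical: the exception class, the logging calls, and the reason label are stand-ins, and the real alfred API may differ:

```python
class NoSlotAvailable(Exception):
    """Hypothetical: raised when the randomizer finds no open slot."""


def assign_condition_or_abort(exp, randomizer):
    """Hypothetical wrapper illustrating the abort path I would expect.

    A 'no open slot' outcome should always go through exp.abort(), so that
    the aborted flag, an abort reason, and a log entry end up in the dataset.
    Any other exception should be logged and re-raised instead of leaving a
    half-initialized dataset behind, as the 50 sessions above suggest.
    """
    try:
        return randomizer.get_condition()
    except NoSlotAvailable:
        exp.log.info("No randomization slot available, aborting session.")
        exp.abort(reason="randomizer_full")  # reason label is illustrative
    except Exception:
        exp.log.exception("Unexpected error during slot assignment.")
        raise
```

If an unexpected exception currently escapes before the abort call, that would match the datasets that have neither a condition nor an aborted flag.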

Moreover, I think there is another problem with the process of determining the number of open slots. As described above, my randomization should assign 160 participants to two experimental conditions, 80 participants in each condition. I have set the _exp_sessiontimeout to 7200 seconds instead of 86400 seconds. Hence, I would expect the randomizer to assign slots to new sessions until every slot is either finished or occupied by an ongoing session whose last save time is less than two hours old. My dataset, however, shows that sessions are aborted well before these criteria are met. Here is an excerpt from my randomizer slot list for the experiment:

            {
                "label": "sanitized",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "sanitized",
                "session_groups": []
            },
            {
                "label": "sanitized",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            },
            {
                "label": "natural",
                "session_groups": []
            }

I do not see how the randomizer could abort a session when there are slots in the randomization list to which no session was ever assigned (hence, no ongoing session for this slot). Nonetheless, according to the experimental dataset, sessions were not assigned an experimental condition and were aborted prematurely.
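For clarity, this is my mental model of when a slot should count as open, written over the structure of the excerpt above. It is a simplified sketch, not alfred's actual implementation, and the per-session field names are assumptions:

```python
import time


def slot_is_open(slot, now=None, session_timeout=7200):
    """Return True if a slot should still be available for a new session.

    slot is one entry from the excerpt above: a dict with "label" and
    "session_groups". Each session group is assumed to carry a "finished"
    flag and a "save_time" in epoch seconds; these field names are
    simplified for illustration and are not alfred's exact schema.
    """
    now = time.time() if now is None else now
    groups = slot["session_groups"]
    if not groups:
        return True  # never assigned, like every slot in the excerpt above
    for group in groups:
        if group.get("finished"):
            return False  # slot is permanently used up
        if now - group["save_time"] <= session_timeout:
            return False  # ongoing session still within the two-hour timeout
    return True  # only expired, unfinished sessions remain, slot can be reused
```

Applied to the excerpt above, every slot has an empty session_groups list, so all of them should count as open and no new session should have been aborted.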

I think all of the above problems occur, at least in part, because of the large number of unfinished sessions in our online studies. The unfinished sessions blocking randomization (and matching) slots are somehow causing problems that we did not encounter while testing. In my experiment, there are 10 unfinished datasets for every finished dataset, and I think we should test both match making and randomization more thoroughly under such circumstances.
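Something along these lines might help for testing (a rough sketch, not an existing test in alfred): simulate many sessions that try to claim a slot at nearly the same time and then never finish, and check that the slot list and the abort behavior stay consistent. The claim_slot callable is a hypothetical stand-in for whatever acquires a slot for a new session:

```python
import threading


def stress_test_randomizer(claim_slot, n_sessions=200):
    """Spawn many simulated sessions that claim a slot almost simultaneously.

    claim_slot() stands in for whatever acquires a randomization slot for a
    new session; it should return a condition label or raise. Most simulated
    sessions never 'finish', mimicking the roughly 10:1 ratio of unfinished
    to finished sessions seen in the online data.
    """
    results, errors = [], []
    barrier = threading.Barrier(n_sessions)

    def session():
        barrier.wait()  # maximize the chance of concurrent slot access
        try:
            results.append(claim_slot())
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=session) for _ in range(n_sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

Checking afterwards that the number of successful claims never exceeds the number of open slots, and that every failed claim corresponds to a regular abort, would cover both problems described above.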

jobrachem commented 2 years ago

I think it is most sensible to handle the three issues that you describe here in separate issues. See