There are some issues with the judge and concurrency. First, the requirements: all judge requests (submissions) come from different threads. The main concurrency problem is in the JudgeVMSS class, although the other two classes (AzureEvaluator and JudgeVM) also have some issues (all three are in azureevaluator.py). This is because they were all designed without concurrency / multithreading in mind, yet they rely on shared mutable state.
This state includes judgevmss_dict in AzureEvaluator, judgevm_dict in JudgeVMSS, and the tasks, free CPU and free memory of each JudgeVM. These values must not be corrupted (i.e. left in an invalid state) by concurrent access.
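A minimal sketch of how the per-VM state could be protected, assuming one lock per VM. JudgeVMSketch, try_reserve and release are hypothetical names, not the actual JudgeVM API. The point is that checking whether a task fits and subtracting its CPU/memory happen in one critical section, so two threads can never both see the same free resources and both claim them.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class JudgeVMSketch:
    """Hypothetical stand-in for JudgeVM: only its mutable bookkeeping."""
    name: str
    free_cpu: int
    free_memory: int
    tasks: list = field(default_factory=list)
    # One lock per VM; excluded from __init__, repr and equality.
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def try_reserve(self, task_id, cpu, memory):
        """Atomically check and claim resources; return False if they don't fit."""
        with self._lock:
            if cpu > self.free_cpu or memory > self.free_memory:
                return False
            self.free_cpu -= cpu
            self.free_memory -= memory
            self.tasks.append(task_id)
            return True

    def release(self, task_id, cpu, memory):
        """Give the resources back once the task has finished (or failed)."""
        with self._lock:
            self.tasks.remove(task_id)
            self.free_cpu += cpu
            self.free_memory += memory
```

With eight threads each asking for 1 CPU on a 4-CPU VM, exactly four reservations succeed, whatever the interleaving.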
Some issues that may arise (this is likely an incomplete list):
What happens when a new request comes in while a VM is being deleted? We have to make sure we don't use a VM that is in the process of being deleted.
What if a new request comes in while we are starting a VM? Currently, the second request immediately picks the starting VM (even though it is not ready yet), waits for it to connect, and, once it does, forwards its request and occupies the VM's resources. The first request then also tries to send its request to the VM, but fails because all of the VM's resources are already occupied, raising an exception. VM selection should take the resource usage of the VMs into account: sharing a starting VM is fine if it can run both requests at the same time, but not if it can only run one. Furthermore, we need to decide at what point we actually create a new VM, and when we instead add the request to a queue and wait until VMs become available (this queue is not implemented yet either).
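One possible way to handle both cases, sketched under the assumption that each VM gets an explicit state (STARTING, READY, DELETING) and that all VM bookkeeping lives under the JudgeVMSS lock. VMState, VM, VMSSSketch and their methods are hypothetical. A request books resources on a VM (even a starting one) at selection time, so a second request can only share a starting VM if there is room for both. A VM is only marked for deletion while it is idle, and marked VMs are skipped by selection.

```python
import threading
from enum import Enum


class VMState(Enum):
    STARTING = "starting"
    READY = "ready"
    DELETING = "deleting"


class VM:
    """Hypothetical VM record; all fields are guarded by the owning VMSS lock."""

    def __init__(self, name, cpu, memory, state=VMState.STARTING):
        self.name = name
        self.free_cpu = cpu
        self.free_memory = memory
        self.state = state
        self.tasks = set()


class VMSSSketch:
    """Hypothetical stand-in for JudgeVMSS with a lock around judgevm_dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self.judgevm_dict = {}

    def add_vm(self, vm):
        with self._lock:
            self.judgevm_dict[vm.name] = vm

    def reserve(self, task_id, cpu, memory):
        """Pick a non-deleting VM with room and book its resources in one step.

        Returns the VM, or None if the caller must create a VM or queue.
        """
        with self._lock:
            for vm in self.judgevm_dict.values():
                if vm.state is VMState.DELETING:
                    continue
                if cpu <= vm.free_cpu and memory <= vm.free_memory:
                    vm.free_cpu -= cpu
                    vm.free_memory -= memory
                    vm.tasks.add(task_id)
                    return vm
            return None

    def release(self, vm, task_id, cpu, memory):
        with self._lock:
            vm.tasks.discard(task_id)
            vm.free_cpu += cpu
            vm.free_memory += memory

    def try_mark_for_deletion(self, name):
        """Only an idle VM may be deleted; once marked, reserve() skips it."""
        with self._lock:
            vm = self.judgevm_dict.get(name)
            if vm is None or vm.tasks or vm.state is VMState.DELETING:
                return False
            vm.state = VMState.DELETING
            return True

    def remove(self, name):
        """Drop the record after the slow cloud-side deletion, which itself
        runs without holding the lock."""
        with self._lock:
            self.judgevm_dict.pop(name, None)
```

Waiting for a STARTING VM to connect, and the actual deletion, would then happen after reserve() / try_mark_for_deletion() return, i.e. without the lock. A None from reserve() is the point where the create-a-new-VM-or-queue decision goes.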
Some of this can be solved with locks (mutexes), but make sure you don't lock too much. Obviously, the lock should not be held while actually submitting to a VM and waiting for results. Likewise, it should not be held while waiting for a new VM to be created, which can take around 5 minutes.
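A sketch of the "don't lock too much" rule combined with the missing queue, assuming a fixed maximum number of VMs with a fixed number of submission slots each (a simplification of the real CPU/memory accounting). CapacityPool, acquire_slot, submit and the create_vm / run_on_vm callbacks are hypothetical. The lock (a threading.Condition) is held only while updating counters; creating a VM and running the submission happen outside it, and requests that find no capacity wait on the condition, which releases the lock while they wait.

```python
import threading


class CapacityPool:
    """Hypothetical sketch: one Condition guards the VM/slot counters and also
    serves as the wait queue for requests that find no free slot."""

    def __init__(self, max_vms, slots_per_vm):
        self._cond = threading.Condition()
        self.max_vms = max_vms
        self.slots_per_vm = slots_per_vm
        self.vms = 0         # VMs that exist or are currently being created
        self.free_slots = 0  # slots on existing VMs that nobody is using

    def acquire_slot(self, create_vm):
        """Take a free slot, else start a VM if allowed, else wait in the queue."""
        with self._cond:
            while self.free_slots == 0:
                if self.vms < self.max_vms:
                    self.vms += 1    # claim the new VM while holding the lock
                    break
                self._cond.wait()    # the lock is released while waiting
            else:
                self.free_slots -= 1
                return
        # Slow path (minutes on Azure): runs WITHOUT the lock held.
        try:
            create_vm()
        except Exception:
            with self._cond:
                self.vms -= 1        # give the claim back so another thread can retry
                self._cond.notify_all()
            raise
        with self._cond:
            # The creating request keeps one slot; the rest become available.
            self.free_slots += self.slots_per_vm - 1
            self._cond.notify_all()

    def release_slot(self):
        with self._cond:
            self.free_slots += 1
            self._cond.notify()


def submit(pool, create_vm, run_on_vm):
    pool.acquire_slot(create_vm)
    try:
        return run_on_vm()  # submit and wait for results: no lock held
    finally:
        pool.release_slot()
```

Here only the first request triggers a VM creation; the others queue until a slot frees up, and no thread ever holds the lock across create_vm() or run_on_vm().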
Currently, there's a branch called submit-lock in the JudgeQueuer repository. This branch adds a single change: a lock around the whole submission process of a VMSS. However, this makes the process very inefficient, as it allows only one submission at a time (or at least, one per machine type).
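Instead of one lock around the whole submission, the lock for judgevmss_dict in AzureEvaluator only needs to cover the lookup, so that two threads can't create two VMSS objects for the same machine type. A minimal sketch, assuming that constructing the VMSS object is cheap (if it calls Azure, it would need the create-outside-the-lock treatment as well); EvaluatorSketch, get_vmss and vmss_factory are hypothetical names.

```python
import threading


class EvaluatorSketch:
    """Hypothetical stand-in for AzureEvaluator: only the judgevmss_dict part."""

    def __init__(self, vmss_factory):
        self._lock = threading.Lock()
        self.judgevmss_dict = {}
        self._vmss_factory = vmss_factory  # builds a cheap in-memory VMSS object

    def get_vmss(self, machine_type):
        # Held only for the lookup/insert, never for the submission itself,
        # so submissions for the same machine type can still run in parallel.
        with self._lock:
            vmss = self.judgevmss_dict.get(machine_type)
            if vmss is None:
                vmss = self._vmss_factory(machine_type)
                self.judgevmss_dict[machine_type] = vmss
            return vmss
```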