Terra resource management is not compatible with the spawn start method for `ProcessPoolExecutor`. The current workaround is to use a `ThreadPoolExecutor`.

`torch` appears to require the spawn start method for `ProcessPoolExecutor`:
https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing

This can be set globally through torch, e.g. with `torch.multiprocessing.set_start_method('spawn')`.
Alternatively, `mp_context` can be set when initializing the `ProcessPoolExecutor`.
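
A minimal sketch of the per-executor option, using only the standard library. The helper name `make_spawn_executor` is hypothetical and not part of Terra. `torch.multiprocessing` is documented as a wrapper around the native `multiprocessing` module, so its `get_context` should be usable in the same way, but that substitution is an assumption here.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def make_spawn_executor(max_workers=2):
    """Return a ProcessPoolExecutor whose workers use the spawn start method.

    Hypothetical helper for illustration only; not part of Terra.
    """
    # get_context() returns a context bound to a single start method and
    # leaves the global default start method untouched.
    ctx = multiprocessing.get_context("spawn")
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)
```

Unlike a global `set_start_method` call, this only affects the executor it is passed to, which may be preferable when other parts of the application rely on the default start method.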

Unfortunately, a spawned `ProcessPoolExecutor` will re-import Python modules for each child process, meaning the resource lock directory is different for each child process due to its dependency on `os.getpid()`:
https://github.com/VisionSystemsInc/terra/blob/e24792b8d0ec91f7c054c21930564ab3c586115e/terra/executor/resources.py#L126-L129

As each child process uses a different lock directory, no child process is aware of the resource locks held by the other child processes. Each child process is thus able to claim the first resource, which results in processing failure.
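
The failure mode can be illustrated with a small simulation. This is a sketch under assumptions, not Terra's implementation: `per_process_lock_dir`, `claim_first_free`, and `simulate` are hypothetical names, the lock files are a simplified stand-in for Terra's resource locks, and the two "children" are simulated in one process by passing made-up pids rather than by actually spawning.

```python
import os
import tempfile


def per_process_lock_dir(base_dir, pid):
    """Hypothetical stand-in: a lock directory keyed on the process id."""
    return os.path.join(base_dir, str(pid))


def claim_first_free(lock_dir, n_resources):
    """Claim the lowest resource index that has no lock file in lock_dir."""
    os.makedirs(lock_dir, exist_ok=True)
    for index in range(n_resources):
        path = os.path.join(lock_dir, f"{index}.lock")
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        os.close(fd)
        return index
    raise RuntimeError("no free resource")


def simulate(shared, pids=(1001, 1002), n_resources=2):
    """Return the resource index claimed by each simulated child."""
    with tempfile.TemporaryDirectory() as base_dir:
        claims = []
        for pid in pids:
            # shared=True: every child looks in the same directory.
            # shared=False: each child derives its directory from its pid.
            lock_dir = base_dir if shared else per_process_lock_dir(base_dir, pid)
            claims.append(claim_first_free(lock_dir, n_resources))
        return claims


print(simulate(shared=False))  # [0, 0]: both children claim resource 0
print(simulate(shared=True))   # [0, 1]: the second child sees the first lock
```

With pid-keyed directories, the second child never sees the first child's lock file, so both end up on the same resource (for example, the same GPU).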

Testing the spawn start method is possible by adding a test that uses the spawn start method to `test_executor_resources.py`, after `TestResourceProcess`. However, this change currently results in a different error where the `data` dictionary is empty, because each spawned child re-imports the test module (e.g., `simple_acquire` is unable to find `data[name]`):
https://github.com/VisionSystemsInc/terra/blob/e24792b8d0ec91f7c054c21930564ab3c586115e/terra/tests/test_executor_resources.py
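
The empty `data` dictionary can be reproduced in miniature without starting any processes. In the sketch below, `MODULE_SOURCE` is a hypothetical stand-in for the test module (only the names `data` and `simple_acquire` come from the real test file), and `import_fresh` imitates the fresh import a spawned child performs by executing that source in a new namespace.

```python
# Hypothetical stand-in for the test module: a module-level dict that the
# parent fills at run time, and a worker function that reads from it.
MODULE_SOURCE = '''
data = {}

def simple_acquire(name):
    return data[name]
'''


def import_fresh():
    """Execute the module source in a new namespace, like a fresh import."""
    namespace = {}
    exec(MODULE_SOURCE, namespace)
    return namespace


parent = import_fresh()
parent["data"]["gpu"] = "resource object"  # populated by the parent at run time

child = import_fresh()  # a spawned child starts from a fresh import

print(parent["simple_acquire"]("gpu"))  # resource object
print(child["data"])                    # {}
```

Anything the parent adds to module-level state after import is invisible to a spawned child, so `simple_acquire` raises `KeyError` there. Under the fork start method the child inherits the parent's memory, which is why the existing tests do not hit this.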
Issue discovered by @decrispell during terra_real3d development, attempting to run multiple torch tasks each with a single assigned GPU.