[BUG] CUDA context error

Bug description

When the code below is run from a .py file, it uses only one GPU instead of all available GPUs. The same code runs without problems in a Jupyter notebook. The script prints this warning:

2023-11-02 14:51:33,718 - distributed.comm.ucx - WARNING - Worker with process ID 3666900 should have a CUDA context assigned to device 1 (b'GPU-969c643a-e088-20fd-2b92-f8369b3da310'), but instead the CUDA context is on device 0 (b'GPU-6fbed52c-1fae-3eec-431d-dbc3c81e26a3'). This is often the result of a CUDA-enabled library calling a CUDA runtime function before Dask-CUDA can spawn worker processes. Please make sure any such function calls don't happen at import time or in the global scope of a program.
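
Since the warning points at CUDA calls made at import time, one way to narrow this down might be to check which CUDA-capable packages are already loaded before the cluster starts. The sketch below uses only the standard library. The helper name `loaded_cuda_candidates` and the package list are my own assumptions, not part of Merlin or Dask-CUDA, and a loaded package is only a candidate: importing it does not prove it created a CUDA context.

```python
import sys

# Assumption: a hand-picked, non-exhaustive list of packages that commonly
# touch CUDA. It is not taken from Merlin or Dask-CUDA.
CUDA_CANDIDATES = ("cupy", "cudf", "numba", "rmm", "torch", "tensorflow")


def loaded_cuda_candidates(modules=None):
    """Return the candidate top-level packages that are already imported.

    `modules` defaults to sys.modules; any mapping or iterable of dotted
    module names can be passed instead.
    """
    if modules is None:
        modules = sys.modules
    # Reduce dotted names such as "cupy.cuda" to their top-level package.
    top_level = {name.split(".")[0] for name in list(modules)}
    return sorted(top_level & set(CUDA_CANDIDATES))


if __name__ == "__main__":
    print("CUDA-capable packages loaded so far:", loaded_cuda_candidates())
```

Calling this right after `from merlin.core.utils import Distributed` and before `with Distributed():`, in both the script and the notebook, would show whether the two environments load different packages before the workers are spawned.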

Code to reproduce bug

from merlin.core.utils import Distributed
from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()
    with Distributed():
        print('hi')

Environment details

Merlin version: 1.11.1
Python version: 3.8.0
