eureka-research / Eureka

Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
https://eureka-research.github.io/
MIT License

Single Thread (Reward Function) Running on GPU #50

Open ziyingsk opened 1 month ago

ziyingsk commented 1 month ago

I encountered an issue where only one thread (the reward function) was successfully running on the GPU. After some investigation, I was able to resolve the problem.

The function set_freest_gpu() is a custom utility written for multi-GPU systems. I’m working on a remote GPU server over SSH, and the function automatically selected the "freest" GPU on the machine. That GPU was not actually allocated to me, which caused the following error:

RuntimeError: No CUDA GPUs are available

To partially resolve this, I used the following command to manually specify the correct GPU:

export CUDA_VISIBLE_DEVICES=0 # Replace 0 with your actual GPU ID

This helped in some cases, but it is not a complete fix: the underlying problem is that the "freest" GPU is picked without regard to which GPUs are actually allocated and visible to the user in a shared environment.
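
One possible direction, sketched here rather than taken from the repository, is to restrict the "freest GPU" choice to the IDs already listed in CUDA_VISIBLE_DEVICES. The helper names below (parse_visible_devices, choose_gpu, set_allowed_freest_gpu) are hypothetical, and the sketch assumes that per-GPU memory usage is supplied by the caller as a dict keyed by the same numeric IDs used in CUDA_VISIBLE_DEVICES. It does not reproduce how set_freest_gpu() actually gathers that information.

```python
import os


def parse_visible_devices(value):
    """Parse a CUDA_VISIBLE_DEVICES string such as "0,2" into [0, 2].

    Returns None when the variable is unset (no restriction) and an empty
    list when it is set but empty (no GPU visible).
    """
    if value is None:
        return None
    if value.strip() == "":
        return []
    ids = []
    for part in value.split(","):
        part = part.strip()
        if not part.isdigit():
            # UUID-style entries are not handled in this sketch.
            raise ValueError(f"unsupported device entry: {part!r}")
        ids.append(int(part))
    return ids


def choose_gpu(memory_used, allowed=None):
    """Return the GPU ID with the least memory in use.

    memory_used: hypothetical dict {gpu_id: memory_in_use}.
    allowed: optional list of GPU IDs the user may use.
    """
    candidates = {
        gpu: used
        for gpu, used in memory_used.items()
        if allowed is None or gpu in allowed
    }
    if not candidates:
        raise RuntimeError("none of the allowed GPUs were reported")
    return min(candidates, key=candidates.get)


def set_allowed_freest_gpu(memory_used, env=os.environ):
    """Pick the freest GPU among those already visible and export it."""
    allowed = parse_visible_devices(env.get("CUDA_VISIBLE_DEVICES"))
    gpu = choose_gpu(memory_used, allowed)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

With CUDA_VISIBLE_DEVICES set to "1,3" by the scheduler or by hand, a call such as set_allowed_freest_gpu({0: 10, 1: 900, 2: 5, 3: 200}) would skip GPUs 0 and 2 even though they are less loaded, and would select GPU 3. When the variable is unset, the choice falls back to the freest GPU overall.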