Hello! I have found what I believe to be a bug in `lmql serve` regarding the `--layout` option. Here is the scenario: I am on a shared host with 8 physical GPUs, of which I currently have access to 4. So `CUDA_VISIBLE_DEVICES` is set to `4,5,6,7`, and I am running `lmql serve` with `--layout 4x1`. My expectation was that the appropriate GPUs would be used; instead, `lmql serve` grabbed GPUs 0-3. I did a bit of digging in the source code and found that the LMTP layout code ignores `CUDA_VISIBLE_DEVICES` entirely and assumes that all GPUs on the system are available: on line 39 of `lmtp_layout.py`, it simply runs `nvidia-smi`, lists all GPUs, and goes from there.
Is this the intended behavior? Would a PR that first checks `CUDA_VISIBLE_DEVICES` and, if it is set, uses it to populate `gpu_ids` be welcome?
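For reference, the change I have in mind would look roughly like the sketch below. This is only an illustration of the intent, not the actual LMQL code: the function name `get_gpu_ids` and the exact `nvidia-smi` fallback invocation are my own assumptions.

```python
import os
import subprocess

def get_gpu_ids():
    """Return the GPU ids this process should use.

    Hypothetical sketch: prefer CUDA_VISIBLE_DEVICES when it is set,
    and only fall back to enumerating every GPU on the machine via
    nvidia-smi (roughly what the current layout code does) when it
    is not. Note: CUDA_VISIBLE_DEVICES may also contain GPU UUIDs;
    this sketch only handles the common integer-index form.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible:
        # e.g. "4,5,6,7" -> [4, 5, 6, 7]
        return [int(i) for i in visible.split(",") if i.strip()]
    # fallback: list all GPU indices on the system
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        text=True,
    )
    return [int(line) for line in out.splitlines() if line.strip()]
```

With `CUDA_VISIBLE_DEVICES=4,5,6,7` and `--layout 4x1`, this would yield `gpu_ids = [4, 5, 6, 7]` rather than `[0, 1, 2, 3]`.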