Hello! I have found what I believe to be a bug in `lmql serve` regarding the `--layout` option. Here is the scenario: I am on a shared host with 8 physical GPUs, of which I currently have access to 4. So `CUDA_VISIBLE_DEVICES` is set to `4,5,6,7`, and I am running `lmql serve` with `--layout 4x1`. My expectation was that the appropriate GPUs would be used; instead, `lmql serve` grabbed GPUs 0-3. I did a bit of digging in the source code and found that the LMTP layout code ignores `CUDA_VISIBLE_DEVICES` entirely and assumes that all GPUs on the system are available: on line 39 of `lmtp_layout.py`, it simply runs `nvidia-smi`, lists all GPUs, and goes from there.
Is this the intended behavior? Would a PR that first checks `CUDA_VISIBLE_DEVICES` and, if it is set, uses it to populate `gpu_ids` be welcome?
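For reference, the change I have in mind would look roughly like the sketch below. This is only an illustration of the intent, not the actual LMQL code: the function name `get_gpu_ids` and the exact `nvidia-smi` fallback invocation are my own assumptions.

```python
import os
import subprocess

def get_gpu_ids():
    """Return the GPU ids this process should use.

    Hypothetical sketch: prefer CUDA_VISIBLE_DEVICES when it is set,
    and only fall back to enumerating every GPU on the machine via
    nvidia-smi (roughly what the current layout code does) when it
    is not. Note: CUDA_VISIBLE_DEVICES may also contain GPU UUIDs;
    this sketch only handles the common integer-index form.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible:
        # e.g. "4,5,6,7" -> [4, 5, 6, 7]
        return [int(i) for i in visible.split(",") if i.strip()]
    # fallback: list all GPU indices on the system
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        text=True,
    )
    return [int(line) for line in out.splitlines() if line.strip()]
```

With `CUDA_VISIBLE_DEVICES=4,5,6,7` and `--layout 4x1`, this would yield `gpu_ids = [4, 5, 6, 7]` rather than `[0, 1, 2, 3]`.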