eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Use CUDA_VISIBLE_DEVICES in "lmql serve" layout calculation #335

Closed stevenbedrick closed 3 months ago

stevenbedrick commented 4 months ago

Fixes #333, a bug where lmql serve ignored CUDA_VISIBLE_DEVICES when computing GPU assignments from its --layout argument. I tested this on a machine where only devices 0 and 2 were available to me: without this fix, --layout 2x1 assigned the lmql server worker processes to devices 0 and 1, which caused problems because device 1 was in use by someone else. With this fix, the correct GPUs are used.
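
For context, the general idea is roughly the following (a minimal sketch, not the actual patch; `resolve_visible_devices` is a hypothetical helper name): when CUDA_VISIBLE_DEVICES is set, layout slots should be mapped onto the listed physical device ids rather than onto 0..n-1.

```python
import os

def resolve_visible_devices(num_required: int) -> list[int]:
    """Map layout slots onto the GPUs this process is actually allowed to use.

    If CUDA_VISIBLE_DEVICES is set (e.g. "0,2"), use those physical ids;
    otherwise fall back to 0..num_required-1. (Hypothetical helper, not the
    actual lmql code; assumes integer ids rather than GPU UUIDs.)
    """
    env = os.environ.get("CUDA_VISIBLE_DEVICES", "").strip()
    if env:
        visible = [int(d) for d in env.split(",") if d.strip()]
    else:
        visible = list(range(num_required))
    if len(visible) < num_required:
        raise ValueError(
            f"layout needs {num_required} GPU(s), but only {len(visible)} "
            f"are visible: {visible}"
        )
    return visible[:num_required]

# Example: with CUDA_VISIBLE_DEVICES=0,2 and --layout 2x1,
# resolve_visible_devices(2) returns [0, 2] instead of [0, 1],
# and each worker can then be pinned to its assigned physical GPU.
```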

lbeurerkellner commented 3 months ago

Thanks, this looks good. I was just able to test it on a multi-GPU system.