2. Issue or feature description
I've configured MPS on an NVIDIA L40S with 10 replicas.
According to the MPS daemon logs, a default memory limit of ~4 GB has been set.
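That figure looks consistent with the card's VRAM being split evenly across the replicas. A minimal sketch of the arithmetic, assuming a 48 GiB card and an even per-replica split (both are assumptions, not something the daemon logs confirm):

```python
def per_replica_mem_limit_gib(total_vram_gib: float, replicas: int) -> float:
    """Per-client default memory limit if VRAM is divided evenly
    across MPS replicas (an assumption about how the limit is derived)."""
    return total_vram_gib / replicas

# 48 GiB across 10 replicas -> 4.8 GiB, in the ballpark of the ~4 GB in the logs
print(per_replica_mem_limit_gib(48, 10))
```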
However, if I look at this from the point of view of a client:
Only the `set_default_active_thread_percentage` of 10 is respected. The `multi_processor_count` changes from 142 to 14.

Here's some additional info from the application pod:
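As an aside, the SM count change is consistent with a simple proportional split: 10% of 142 SMs, rounded down. A sketch of that arithmetic (the rounding behaviour is guessed from the observed 142 → 14, not taken from documentation):

```python
import math

def expected_sm_count(total_sms: int, active_thread_percentage: float) -> int:
    """SMs a client should see under MPS with the given active-thread
    percentage, assuming proportional partitioning rounded down
    (guessed from 142 SMs at 10% appearing as 14)."""
    return math.floor(total_sms * active_thread_percentage / 100)

print(expected_sm_count(142, 10))  # -> 14
```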
Why is `nvidia-cuda-mps-control` reporting one thing for memory and `pytorch` saying something else? This doesn't look right to me, but maybe I'm missing something. If I use MIG with an A100, the `total_memory` returned reflects the MIG instance as opposed to the total VRAM of the card.
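For anyone trying to reproduce the comparison, here is a hedged sketch of a client-side check. It assumes PyTorch is available; `summarize_device` is a hypothetical helper name, and it contrasts `torch.cuda.get_device_properties(...).total_memory` with the driver-level view from `torch.cuda.mem_get_info`, which can differ under partitioning:

```python
try:
    import torch
    HAVE_TORCH = True
except ImportError:  # allow the sketch to load without PyTorch installed
    HAVE_TORCH = False

def summarize_device(idx: int = 0) -> dict:
    """Report what a CUDA client actually observes for one device."""
    props = torch.cuda.get_device_properties(idx)
    free, total = torch.cuda.mem_get_info(idx)  # driver-level free/total
    return {
        "name": props.name,
        "multi_processor_count": props.multi_processor_count,  # 14 under MPS here
        "total_memory": props.total_memory,  # may still report full VRAM
        "mem_get_info_free": free,
        "mem_get_info_total": total,
    }

if HAVE_TORCH and torch.cuda.is_available():
    print(summarize_device())
```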