Closed: ndronen closed this issue 11 months ago.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Describe the bug
The instructions in the HuggingFace model card for launching the SteerLM eval server are incorrect.
Steps/Code to reproduce bug
Using LLAMA2-13B-SteerLM.nemo:

```shell
python megatron_gpt_eval.py \
    gpt_model_file=LLAMA2-13B-SteerLM.nemo \
    trainer.precision=16 \
    server=True \
    tensor_model_parallel_size=4 \
    trainer.devices=1 \
    pipeline_model_parallel_split_rank=0
```
Expected behavior
The expected behavior is that the eval server starts or, if the system resources are insufficient, an error occurs.
Ideally, the model card would explain how to run the eval server depending on the available GPU memory and number of GPUs. I'd like to be able to run this on a 4xV100 machine.
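For what it's worth, the command above requests tensor_model_parallel_size=4 while launching only trainer.devices=1, and my understanding is that the tensor-parallel degree must be covered by the launched devices. A 4xV100 invocation might therefore look like the sketch below; this is an assumption on my part, not a verified correction from the model card:

```shell
# Hypothetical 4-GPU invocation (assumption, not from the model card):
# trainer.devices is raised to 4 to match tensor_model_parallel_size=4,
# so each of the four tensor-parallel shards gets its own V100.
python megatron_gpt_eval.py \
    gpt_model_file=LLAMA2-13B-SteerLM.nemo \
    trainer.precision=16 \
    server=True \
    tensor_model_parallel_size=4 \
    trainer.devices=4 \
    pipeline_model_parallel_split_rank=0
```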
Instead, I see the following:
Environment overview
- Method of NeMo install: from source (`pip install -e .`), HEAD detached at v1.17.0
- If method of install is Docker, provide `docker pull` & `docker run` commands used

Environment details
Additional context
Example: 4xV100