llx-08 opened this issue 2 weeks ago
May I have your startup command? I suspect that your model argument is incorrect.
I have found the cause of this issue. The model weights need to be converted via converter.py under the downloader directory. However, the function download_and_convert_weights returns immediately when it encounters a local model path, without converting the weights, which is why the weights could not be loaded.
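For context, a minimal sketch of the suspected control flow; only the name download_and_convert_weights is confirmed by this thread, and the two helpers are hypothetical stand-ins:

import os

def download_weights(model_name: str) -> str:
    ...  # hypothetical: fetch the *.bin shards from the HF hub, return the local path

def convert_local_weights(path: str) -> None:
    ...  # hypothetical: run the converter on the downloaded shards

def download_and_convert_weights(model: str) -> str:
    if os.path.exists(model):
        # BUG described above: a local path is returned immediately,
        # so the converter never runs on local weights.
        return model
    path = download_weights(model)
    convert_local_weights(path)
    return path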
But how can I run converter.py with a local model dir? It seems the converter takes an --input parameter pointing at model.bin, but I have several *.bin files.
You can use globs like /data/weights/*.bin.
NOTE: in some shells you may need to wrap the pattern in quotation marks to keep it from being expanded by the shell.
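For example, assuming the shards live under /data/weights (paths here are placeholders):

python3 distserve/downloader/converter.py --input "/data/weights/*.bin" --output /data/converted --model llama

The quotes keep the shell from expanding *.bin into multiple arguments, so the pattern reaches Python's glob intact.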
I converted the model like this:
python distserve/downloader/converter.py --input /data/Llama-2-7b/pytorch_model.bin --output /data/Llama-2-7b --model llama
then ran the model server like this:
python distserve/api_server/distserve_api_server.py --model /data/llama-2-7B/
but it's still missing something:
(ParaWorker pid=715665) INFO 13:35:56 runtime peak memory: 12.922 GB
(ParaWorker pid=715665) INFO 13:35:56 total GPU memory: 39.392 GB
(ParaWorker pid=715665) INFO 13:35:56 kv cache size for one token: 0.50000 MB
(ParaWorker pid=715665) INFO 13:35:56 num_gpu_blocks: 2883
(ParaWorker pid=715665) INFO 13:35:56 num_cpu_blocks: 128
(ParaWorker pid=715665) Gpt::load() - /data/fx/cql/llama-2-7B/decoder.output_projection.weight.pt not found
Paths and file names are case-sensitive under Linux. Please correct your path when running the API server.
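For example, /data/Llama-2-7b and /data/llama-2-7B are two distinct paths on Linux. A quick, purely illustrative way to check what actually exists:

ls -d /data/*lama*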
Thanks, I corrected the spelling, but decoder.output_projection.weight.pt still seems to be missing:
Gpt<T>::load() - /data/fx/cql/Llama-2-7b-hf/decoder.output_projection.weight.pt not found
I carefully checked the conversion and its output files; they are of three types:
decoder.embed_tokens.weight.pt, decoder.layer_norm.weight.pt, decoder.layers.*
but there is no file named decoder.output_projection.*.
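One way to narrow this down is to check whether the source checkpoint actually contains the tensor that would become decoder.output_projection; for HF Llama-2 that source tensor is lm_head.weight (the exact mapping used by the converter is my assumption). A quick inspection sketch, with an illustrative path:

from glob import glob
import torch

# Print the head/embedding-related keys in each HF shard
for f in sorted(glob("/data/Llama-2-7b-hf/*.bin")):
    sd = torch.load(f, map_location="cpu")
    print(f, [k for k in sd if "lm_head" in k or "embed_tokens" in k])

If lm_head.weight never appears, the converter has nothing to turn into decoder.output_projection.weight.pt.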
I am still getting "decoder.output_projection.weight.pt not found", even though I've converted the weights.
Sorry, but I am unable to reproduce your issue. I have tried the following steps:
1. Download meta-llama/Llama-2-7b-hf to some directory.
2. Run python3 distserve/downloader/converter.py --input "PATH/TO/DOWNLOADED_WEIGHTS/*.bin" --output PATH/TO/OUTPUT --dtype float16 --model llama
After these steps, decoder.output_projection.weight.pt is generated in the output directory. Could you check all your commands again?
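If it helps, you can verify the converter's output with something like (PATH/TO/OUTPUT as above):

ls PATH/TO/OUTPUT | grep output_projection
# per this thread, the expected file is decoder.output_projection.weight.pt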
I have changed the convert_weights function like this:
import os
from glob import glob

import torch

def convert_weights(
    input: str,
    output: str,
    dtype: torch.dtype,
    model: str
) -> None:
    """Function used by `downloader.py` to convert weights"""
    os.makedirs(output, exist_ok=True)
    print(f"Converting {input} into torch.jit.script format")
    # Load the state dict (tensor_dict).
    # If the whole model is saved in a single file, load the state dict directly;
    # otherwise, load the shards separately and merge them into a single state dict.
    # Treat `input` as a directory; os.path.join works with or without a trailing slash.
    bin_files = glob(os.path.join(input, '*.bin'))
    if len(bin_files) == 0:
        raise ValueError(f"Input {input} does not match any files")
    # Load file(s)
    state_dict = {}
    for file in bin_files:
        print(f"Loading {file}")
        state_dict.update(torch.load(file, map_location=torch.device("cpu")))
    # Change dtype
    for key in state_dict:
        state_dict[key] = state_dict[key].to(dtype)
    # Preprocess (PREPROCESSOR, NAME_TRANSLATOR, and divideWeightAndSave are
    # defined elsewhere in converter.py)
    print("Preprocessing")
    preprocessor = PREPROCESSOR[model]
    tensor_dict, num_q_heads, head_dim = preprocessor(state_dict)
    # The final step: divide the weights and save them to files
    print("Resharding and saving weights")
    name_translator = NAME_TRANSLATOR[model]
    divideWeightAndSave(output, tensor_dict, name_translator, num_q_heads, head_dim)
and I manually convert the weights before running the server.
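For reference, a hypothetical invocation of the patched function, assuming /data/Llama-2-7b-hf holds the HF shards (paths are placeholders):

import torch

# model must be a key of PREPROCESSOR / NAME_TRANSLATOR in converter.py
convert_weights(
    input="/data/Llama-2-7b-hf",
    output="/data/Llama-2-7b-converted",
    dtype=torch.float16,
    model="llama",
)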
Hi, when I use distserve_api_server.py to start a server, it always raises an error while launching. When I shut it down, it outputs the following error:
The model is llama-2-13b-hf; I can run it in vLLM.