LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).
Apache License 2.0

decoder.embed_tokens.weight.pt not found #10

Open llx-08 opened 2 weeks ago

llx-08 commented 2 weeks ago

Hi, when I use distserve_api_server.py to start a server, it always raises an error while launching:

ray.exceptions.RayTaskError(RuntimeError): ray::ParaWorker.init_model() (pid=129376, ip=172.17.0.7, actor_id=27f5ee672314d5dfcee7384701000000, repr=<distserve.worker.ParaWorker object at 0x7efea016a110>)
  File "/share/share/DistServe/distserve/worker.py", line 100, in init_model
    self.model.load_weight(path)
RuntimeError

When I shut it down, it outputs the following error:

(ParaWorker pid=129375) Gpt<T>::load() - /llama-2-13b-hf/decoder.embed_tokens.weight.pt not found [repeated 5x across cluster]

The model is llama-2-13b-hf; I can run it in vLLM.

interestingLSY commented 2 weeks ago

May I have your startup command? I suspect that your model argument is incorrect.

llx-08 commented 2 weeks ago

I have found the cause of this issue. The model weights need to be converted via the converter under the downloader, but the function download_and_convert_weights returns immediately when it is given a local model path, without converting the weights. That is why the weights could not be loaded.
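
Roughly, the behaviour looks like the sketch below. This is only an illustration of the early return described above, not the actual DistServe source, and the real function takes more arguments:

import os

def download_and_convert_weights(model: str) -> str:
    # Hedged sketch: a local directory is returned as-is, so the converter
    # under distserve/downloader never runs and the .pt files the loader
    # expects (e.g. decoder.embed_tokens.weight.pt) are never produced.
    if os.path.isdir(model):
        return model
    ...  # otherwise: download from the hub, convert, and return the new path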

KylinC commented 2 weeks ago

But how can I run converter.py with a local model dir? The converter seems to take an '--input' parameter pointing to 'model.bin', but I have several *.bin files.

interestingLSY commented 2 weeks ago

You can use globs like /data/weights/*.bin

NOTE: In some shells you may need to wrap the glob in quotation marks to prevent the shell from expanding it.
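
For example (the flag names follow the converter invocation shown later in this thread; the paths are placeholders, and the quotes keep the shell from expanding the glob):

python distserve/downloader/converter.py --input '/data/weights/*.bin' --output /data/converted-weights --model llama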

KylinC commented 2 weeks ago

I convert the model like this:

python distserve/downloader/converter.py --input /data/Llama-2-7b/pytorch_model.bin --output /data/Llama-2-7b --model llama

then run the model_server like this:

python distserve/api_server/distserve_api_server.py --model /data/llama-2-7B/

but it's still missing something:

(ParaWorker pid=715665) INFO 13:35:56 runtime peak memory: 12.922 GB
(ParaWorker pid=715665) INFO 13:35:56 total GPU memory: 39.392 GB
(ParaWorker pid=715665) INFO 13:35:56 kv cache size for one token: 0.50000 MB
(ParaWorker pid=715665) INFO 13:35:56 num_gpu_blocks: 2883
(ParaWorker pid=715665) INFO 13:35:56 num_cpu_blocks: 128
(ParaWorker pid=715665) Gpt::load() - /data/fx/cql/llama-2-7B/decoder.output_projection.weight.pt not found

interestingLSY commented 2 weeks ago

Paths and file names are case-sensitive under Linux. Please correct your path when running the API server.
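
For instance, the conversion above wrote to /data/Llama-2-7b while the server was started with /data/llama-2-7B; on Linux these are two different directories. A quick check for which one actually exists and holds the converted .pt files:

ls -d /data/[Ll]lama-2-7*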

KylinC commented 2 weeks ago

Thanks, I corrected the path casing, but it still seems to be missing decoder.output_projection.weight.pt:

Gpt<T>::load() - /data/fx/cql/Llama-2-7b-hf/decoder.output_projection.weight.pt not found

I carefully checked the conversion and its output files; they fall into three types:

decoder.embed_tokens.weight.pt
decoder.layer_norm.weight.pt
decoder.layers.*

but there is no file named decoder.output_projection.weight.pt.
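
One hedged way to narrow this down is to list the tensors that the source .bin shards actually contain (the path is taken from the conversion command above), to see which entry, if any, should have been translated into decoder.output_projection:

import torch
from glob import glob

# Print every tensor name and shape in the original checkpoint shards.
for f in sorted(glob("/data/Llama-2-7b/*.bin")):
    sd = torch.load(f, map_location="cpu")
    print(f)
    for name, tensor in sd.items():
        print("  ", name, tuple(tensor.shape))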

KylinC commented 2 weeks ago

I still get decoder.output_projection.weight.pt not found, even though I have converted the weights as described above.

interestingLSY commented 2 weeks ago

Sorry but I am unable to reproduce your issue. I have tried the following steps:

Could you check all your commands again?

llx-08 commented 2 weeks ago

I have changed the convert_weights function like this:

def convert_weights(
    input: str, 
    output: str, 
    dtype: torch.dtype, 
    model: str
) -> None :
    """Function used by `downloader.py` to convert weights"""
    os.makedirs(output, exist_ok=True)
    print(f"Converting {input} into torch.jit.script format")

    # Load the state dict (tensor_dict)
    # If the whole model is saved in a single file, then load the state dict directly
    # otherwise, load them separately and merge them into a single state dict

    # NOTE: `input` is expected to be a directory path ending in '/';
    # os.path.join(input, '*.bin') would be a safer way to build the pattern.
    bin_files = glob(input + '*.bin')

    if len(bin_files) == 0:
        raise ValueError(f"Input {input} does not match any files")

    # Load file(s)
    state_dict = {}
    for file in bin_files:
        print(f"Loading {file}")
        state_dict.update(torch.load(file, torch.device("cpu")))

    # Change dtype
    for key in state_dict:
        state_dict[key] = state_dict[key].to(dtype)

    # Preprocess
    print("Preprocessing")
    preprocessor = PREPROCESSOR[model]
    tensor_dict, num_q_heads, head_dim = preprocessor(state_dict)

    # The final step: divide the weights and save them to files
    print("Resharding and saving weights")
    name_translator = NAME_TRANSLATOR[model]
    divideWeightAndSave(output, tensor_dict, name_translator, num_q_heads, head_dim)

and manually converted the weights before starting the server.
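
For reference, a direct call to the modified function could look like this (the dtype and paths are illustrative assumptions, not values taken from the thread):

import torch

convert_weights(
    input="/data/Llama-2-7b/",   # directory holding the *.bin shards (note the trailing '/')
    output="/data/Llama-2-7b/",
    dtype=torch.float16,         # assumed serving dtype
    model="llama",
)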