Open sitabulaixizawaluduo opened 1 week ago
For Llama 2, you do not need to download the weights yourself. Just launch the api_server with `--model meta-llama/Llama-2-7b-hf` (the name matches the official model name on Hugging Face), and DistServe will download and convert the weights for you.
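For reference, a launch command following that advice might look something like this. This is a sketch, not a verified command: the script path and the parallelism flags are taken from the reproduction steps later in this thread, and the port number is arbitrary.

```shell
# Hypothetical invocation: pass the Hugging Face repo id directly so DistServe
# fetches and converts the weights itself (no manual converter.py step).
python3 api_server/distserve_api_server.py \
    --port 6902 \
    --model meta-llama/Llama-2-7b-hf \
    --context-tensor-parallel-size 1 \
    --decoding-tensor-parallel-size 1
```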
Is there a difference between the two methods? The Llama model I used was also downloaded from Hugging Face.
You may refer to the downloader code to see if you missed some details during conversion.
model: Llama-2-7b-hf

Steps:
1. `python3 converter.py --input "Llama-2-7b-hf/*.bin" --output /datasets/distserve/llama-7b --dtype float16 --model llama`
2. `python3 api_server/distserve_api_server.py --port 6902 --model /datasets/distserve/llama-7b --context-tensor-parallel-size 1 --decoding-tensor-parallel-size 1`
3. `python3 evaluation/2-benchmark-serving/0-prepare-dataset.py --dataset-path Sharegpt`
4. `python3 evaluation/2-benchmark-serving/2-benchmark-serving.py --port 6902`
The error message:

```
SwiftTransformer/src/csrc/model/gpt/gpt.cc:278 'cudaMemcpy(ith_context_req_req_index.ptr, ith_context_req_req_index_cpu, sizeof(int64_t) * batch_size, cudaMemcpyHostToDevice)': (700) an illegal memory access was encountered
```
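Error 700 (`cudaErrorIllegalAddress`) is often raised by a later `cudaMemcpy` rather than by the kernel that actually corrupted memory, because kernel launches are asynchronous. A generic CUDA debugging sketch (standard CUDA tooling, not DistServe-specific; the server command is copied from the steps above) to localize the faulting kernel:

```shell
# Force synchronous kernel launches so the error is reported at the real
# call site instead of at the next cudaMemcpy.
CUDA_LAUNCH_BLOCKING=1 python3 api_server/distserve_api_server.py --port 6902 \
    --model /datasets/distserve/llama-7b \
    --context-tensor-parallel-size 1 --decoding-tensor-parallel-size 1

# Pinpoint the out-of-bounds access with NVIDIA's memory checker
# (much slower, but it names the offending kernel and address).
compute-sanitizer --tool memcheck \
    python3 api_server/distserve_api_server.py --port 6902 \
    --model /datasets/distserve/llama-7b \
    --context-tensor-parallel-size 1 --decoding-tensor-parallel-size 1
```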