Open HuXinjing opened 3 months ago
It seems I misunderstood the CUDA version maga_transformer needs: my nvcc version is CUDA 11.8, but the runtime is 12.2.
So, will `maga_transformer-0.1.9+cuda118-cp310-cp310-manylinux1_x86_64.whl` become available?
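For anyone comparing the same two numbers: nvcc reports the installed CUDA *toolkit* version, while nvidia-smi reports the highest CUDA version the *driver* supports, and they can legitimately differ. A quick way to see both (assuming both tools are on PATH):

```shell
# nvcc prints the CUDA toolkit version (what code is compiled against)
nvcc --version | grep release

# nvidia-smi prints the maximum CUDA version the driver supports;
# an 11.8 toolkit under a 12.2-capable driver is a normal combination
nvidia-smi | grep "CUDA Version"
```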
I have tried both the cuda11 and cuda12 images in Docker, but

```
sudo sh ./create_container.sh rtp registry.cn-hangzhou.aliyuncs.com/havenask/rtp_llm:deploy_image_cuda12
```

(and the same with the cuda11 tag) told me:

```
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
```

Frankly, do I need to install an NVIDIA image before performing this script?
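For reference, this daemon error usually means the host is missing the NVIDIA Container Toolkit rather than anything being wrong with the image. A sketch of the usual fix on Ubuntu (package and command names per NVIDIA's install guide; adjust for your distro):

```shell
# Install the NVIDIA Container Toolkit so Docker can satisfy the "gpu" capability
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with the Docker daemon and restart it
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: this should print the host GPU table from inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi
```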
I installed nvidia/cuda:12.4.1-runtime-ubuntu22.04, and nvidia-smi works there.
But I got:
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 82, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 76, in main
    return local_rank_start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 35, in local_rank_start
    raise e
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 32, in local_rank_start
    app.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_app.py", line 38, in start
    self.inference_server.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_server.py", line 59, in start
    self._inference_worker = InferenceWorker()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_worker.py", line 55, in __init__
    self.model = ModelFactory.create_from_env()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 173, in create_from_env
    model = ModelFactory.from_model_config(normal_model_config, sp_model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 71, in from_model_config
    model = ModelFactory._create_model(model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 50, in _create_model
    model = model_cls.from_config(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 169, in from_config
    return cls(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 137, in __init__
    self.load()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 192, in load
    self._load_weights(self.config.ref_model, device)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 205, in _load_weights
    database = CkptDatabase(self.config.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 196, in __init__
    ckpt.set_metadata(self._load_meta(ckpt.file_name))
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 237, in _load_meta
    with safe_open(file, framework="pt") as f_:
FileNotFoundError: No such file or directory: "/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/model.safetensors"
```
after performing

```
TOKENIZER_PATH=/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/ CHECKPOINT_PATH=/LLMs/Qwen/Qwen2-1.5B-Instruct-MLX/ MODEL_TYPE=qwen_2 FT_SERVER_TEST=1 python3 -m maga_transformer.start_server
```

where /LLMs/... is mounted from my host. The tokenizer and model.safetensors are symlinks; does that mean I need to copy all the weights into the container?
I found symlinks in my LLMs folders, so I mounted .cache/huggingface/ as well, and that solved it.
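That matches how the Hugging Face cache is laid out: files in a snapshot directory are symlinks into `~/.cache/huggingface/hub/.../blobs`, so inside the container the link *target* must be mounted too, or opening `model.safetensors` fails with exactly that `FileNotFoundError`. A tiny self-contained illustration (all paths here are made up for the demo):

```python
import os
import tempfile

# Build a fake "snapshot": model.safetensors is a symlink to a blob file,
# mimicking the layout of ~/.cache/huggingface/hub (paths are hypothetical).
cache = tempfile.mkdtemp()
blob = os.path.join(cache, "blob_0123abcd")
with open(blob, "w") as f:
    f.write("fake weights")

snapshot = tempfile.mkdtemp()
link = os.path.join(snapshot, "model.safetensors")
os.symlink(blob, link)

# The link resolves only while the blob directory is reachable...
print(os.path.exists(link))   # True

# ...remove (or fail to mount) the blob and the same path "disappears",
# which is what the container saw before the cache was mounted.
os.remove(blob)
print(os.path.exists(link))   # False (the dangling link itself still exists)
```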
But I got another one!

```
[root][07/21/2024 08:51:36][start_server.py:local_rank_start():34][ERROR] start server error: module 'torch' has no attribute 'uint32', trace:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 254, in _load_layer_weight
    raise e
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 241, in _load_layer_weight
    tensor = self._load_and_convert_tensor(weight, ref_model=ref_model, layer_id=layer_id)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 399, in _load_and_convert_tensor
    before_merge_tensors.append(ckpt_weight.merge_fun(self.load_tensor(ckpt_weight.tensor_name(layer_id), datatype)))
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 379, in load_tensor
    return self._database.load_tensor(name, datatype)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 252, in load_tensor
    tensors.append(self._load(name, ckpt_file, datatype))
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/database.py", line 275, in _load
    return f.get_tensor(name).to(datatype)
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1833, in __getattr__
    raise AttributeError(f"module '{name}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'uint32'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/start_server.py", line 32, in local_rank_start
    app.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_app.py", line 38, in start
    self.inference_server.start()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_server.py", line 59, in start
    self._inference_worker = InferenceWorker()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/server/inference_worker.py", line 55, in __init__
    self.model = ModelFactory.create_from_env()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 173, in create_from_env
    model = ModelFactory.from_model_config(normal_model_config, sp_model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 71, in from_model_config
    model = ModelFactory._create_model(model_config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/model_factory.py", line 50, in _create_model
    model = model_cls.from_config(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 169, in from_config
    return cls(config)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 137, in __init__
    self.load()
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 192, in load
    self._load_weights(self.config.ref_model, device)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/models/gpt.py", line 213, in _load_weights
    self.weight = model_weights_loader.load_weights_from_scratch(num_process=load_parallel_num)
  File "/usr/local/lib/python3.10/dist-packages/maga_transformer/utils/model_weights_loader.py", line 61, in load_weights_from_scratch
    all_results = pool.starmap(
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
AttributeError: module 'torch' has no attribute 'uint32'
```
I don't think I can figure this one out myself.
Could you please check where the compute type is set to uint32?
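A note for anyone hitting the same trace: `torch.uint32` only exists in newer PyTorch releases; on older builds, accessing it falls through torch's module-level `__getattr__` and raises exactly this `AttributeError`. A hedged sketch of a dtype guard, using a stand-in namespace rather than a real torch install (names here are illustrative, not maga_transformer's actual code):

```python
from types import SimpleNamespace

def pick_dtype(torch_mod, name, fallback):
    """Return torch_mod.<name> if this torch build defines it, else fallback.

    Older torch builds lack uint16/uint32/uint64; getattr() with a default
    absorbs the AttributeError that torch's __getattr__ would raise.
    """
    return getattr(torch_mod, name, fallback)

# Stand-in for an older torch module that lacks uint32 (hypothetical):
old_torch = SimpleNamespace(int32="torch.int32", uint8="torch.uint8")
print(pick_dtype(old_torch, "uint32", old_torch.int32))  # prints torch.int32
```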
I'm performing the instructions in the Startup example, and

```
pip3 install -r ./open_source/deps/requirements_torch_gpu_cuda12.txt
```

completed successfully, but when I installed `maga_transformer-0.2.0+cuda121-cp310-cp310-manylinux1_x86_64.whl` an error occurred:

```
Processing ./maga_transformer-0.2.0+cuda121-cp310-cp310-manylinux1_x86_64.whl
Requirement already satisfied: filelock==3.13.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.13.1)
Requirement already satisfied: jinja2 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.1.4)
Requirement already satisfied: sympy in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.13.1)
Requirement already satisfied: typing-extensions in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (4.12.2)
Requirement already satisfied: importlib_metadata in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (8.0.0)
Requirement already satisfied: transformers==4.39.3 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (4.39.3)
Requirement already satisfied: sentencepiece==0.1.99 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.1.99)
Requirement already satisfied: fastapi==0.108.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.108.0)
Requirement already satisfied: uvicorn==0.21.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.21.1)
Requirement already satisfied: dacite in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.8.1)
Requirement already satisfied: pynvml in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (11.5.3)
Requirement already satisfied: thrift in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.20.0)
Requirement already satisfied: numpy==1.24.1 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.24.1)
Requirement already satisfied: psutil in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (6.0.0)
Requirement already satisfied: tiktoken==0.7.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.7.0)
Requirement already satisfied: lru-dict in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.3.0)
Requirement already satisfied: py-spy in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.3.14)
Requirement already satisfied: safetensors in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.4.3)
Requirement already satisfied: cpm_kernels in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.0.11)
Requirement already satisfied: pyodps in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.11.6.1)
Requirement already satisfied: Pillow in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (10.4.0)
Requirement already satisfied: protobuf==3.20.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.20.0)
Collecting torchvision==0.16.0 (from maga-transformer==0.2.0+cuda121)
  Using cached torchvision-0.16.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.6 kB)
Requirement already satisfied: einops in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.8.0)
Requirement already satisfied: prettytable in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (3.10.2)
Requirement already satisfied: pydantic==2.5.3 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (2.5.3)
Requirement already satisfied: timm==0.9.12 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.9.12)
Requirement already satisfied: sentence-transformers==2.7.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (2.7.0)
Requirement already satisfied: grpcio==1.62.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (1.62.0)
Collecting xfastertransformer_devel_icx==1.6.0.0 (from maga-transformer==0.2.0+cuda121)
  Using cached xfastertransformer_devel_icx-1.6.0.0-py3-none-any.whl.metadata (16 kB)
Requirement already satisfied: decord==0.6.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.6.0)
Requirement already satisfied: accelerate==0.25.0 in /home/jason/anaconda3/envs/inferenceFrame/lib/python3.10/site-packages (from maga-transformer==0.2.0+cuda121) (0.25.0)
INFO: pip is looking at multiple versions of maga-transformer to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement torch==2.1.0+cu121 (from maga-transformer) (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1)
ERROR: No matching distribution found for torch==2.1.0+cu121
```
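In case it helps anyone else: `+cu121` local-version wheels are not published on PyPI, which is why pip only lists the plain versions in the error; the CUDA-tagged builds come from PyTorch's own wheel index. Something along these lines should let the dependency resolve (the index URL is PyTorch's standard cu121 wheel index):

```shell
# Install the CUDA 12.1 build of torch 2.1.0 from the PyTorch wheel index,
# then retry the maga_transformer wheel against the now-satisfied dependency
pip3 install torch==2.1.0+cu121 --extra-index-url https://download.pytorch.org/whl/cu121
pip3 install ./maga_transformer-0.2.0+cuda121-cp310-cp310-manylinux1_x86_64.whl
```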