SkunkworksAI / hydra-moe


Webui prototype #13

Closed fearnworks closed 1 year ago

fearnworks commented 1 year ago

image


pharaouk commented 1 year ago

I am hitting this error when building the Docker image with the webui:

hydra-moe-webui-1 | Traceback (most recent call last):
hydra-moe-webui-1 |   File "/hydra-moe/server.py", line 127, in <module>
hydra-moe-webui-1 |     moe.initialize_model()
hydra-moe-webui-1 |   File "/hydra-moe/moe.py", line 66, in initialize_model
hydra-moe-webui-1 |     model, tokenizer = get_inference_model(args, checkpoint_dirs)
hydra-moe-webui-1 |   File "/hydra-moe/moe_utils.py", line 52, in get_inference_model
hydra-moe-webui-1 |     model = AutoModelForCausalLM.from_pretrained(
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
hydra-moe-webui-1 |     return model_class.from_pretrained(
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
hydra-moe-webui-1 |     ) = cls._load_pretrained_model(
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3260, in _load_pretrained_model
hydra-moe-webui-1 |     new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 725, in _load_state_dict_into_meta_model
hydra-moe-webui-1 |     set_module_quantized_tensor_to_device(
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/transformers/utils/bitsandbytes.py", line 99, in set_module_quantized_tensor_to_device
hydra-moe-webui-1 |     new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py", line 178, in to
hydra-moe-webui-1 |     return self.cuda(device)
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/nn/modules.py", line 156, in cuda
hydra-moe-webui-1 |     w_4bit, quant_state = bnb.functional.quantize_4bit(w, blocksize=self.blocksize, compress_statistics=self.compress_statistics, quant_type=self.quant_type)
hydra-moe-webui-1 |   File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 799, in quantize_4bit
hydra-moe-webui-1 |     absmax = torch.zeros((blocks,), device=A.device)
hydra-moe-webui-1 | RuntimeError: CUDA error: no kernel image is available for execution on the device
hydra-moe-webui-1 | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
hydra-moe-webui-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
hydra-moe-webui-1 | Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
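
A common cause of the `RuntimeError` in the traceback above is a CUDA binary that was not compiled for the GPU's compute capability. That may or may not be what happened here, since the thread does not say which GPU or image was used. The following is a minimal diagnostic sketch, not part of hydra-moe: the helper names `parse_arch` and `build_supports` are hypothetical, and the only PyTorch calls used are `torch.cuda.is_available`, `torch.cuda.get_device_capability`, and `torch.cuda.get_arch_list`. It checks only the PyTorch build; bitsandbytes ships its own CUDA binaries and would need a separate check.

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_86' or 'compute_90' into (kind, major, minor).

    Returns None for anything else, including arch-specific tags like
    'sm_90a', which this sketch deliberately ignores.
    """
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def build_supports(capability, arch_list):
    """Return True if a build with `arch_list` can run on `capability`.

    Hypothetical helper. Rules applied:
      * sm_XY machine code runs on devices with the same major version
        and a minor version >= Y;
      * compute_XY PTX can be JIT-compiled for any device >= X.Y.
    """
    major, minor = capability
    for tag in arch_list:
        parsed = parse_arch(tag)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device is visible to PyTorch.")
        else:
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap)
            print("architectures in this PyTorch build:", archs)
            print("build covers this device:", build_supports(cap, archs))
```

Running this inside the webui container would show whether the installed PyTorch wheel includes kernels for the GPU. If it does not, a PyTorch build matching the GPU architecture would be the first thing to try.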