Returned exception: 'LoLLMsWebUI' object has no attribute 'socketio' AND torch.cuda.OutOfMemoryError

HorrorBest commented 7 months ago

Expected Behavior

Model being loaded

Current Behavior

Model Not loading

Steps to Reproduce

I don't know actually, I installed LOLLMs with the win installer bat file, and chose the option with CUDA.I Downloaded the HuggingFace Zoo Binding. and I tired to manually place a model and that didn't work (no need to worry about that now) So I downloaded one "https://huggingface.co/TheBloke/neural-chat-7B-v3-2-GPTQ" using the UI and I it didn't load and I got the following error in the terminal

Failed to load model. Returned exception: 'LoLLMsWebUI' object has no attribute 'socketio' Traceback (most recent call last): File "d:\ai\llm\lollms-webui\lollms-webui\lollms_core\lollms\app.py", line 259, in load_model model = ModelBuilder(self.binding).get_model() File "d:\ai\llm\lollms-webui\lollms-webui\lollms_core\lollms\binding.py", line 547, in init self.build_model() File "d:\ai\llm\lollms-webui\lollms-webui\lollms_core\lollms\binding.py", line 550, in build_model self.model = self.binding.build_model() File "D:\Ai\LLM\lollms-webui\lollms-webui\zoos\bindings_zoo\hugging_face__init__.py", line 228, in build_model self.model = AutoModelForCausalLM.from_pretrained(str(model_path), File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\transformers\models\auto\auto_factory.py", line 566, in from_pretrained return model_class.from_pretrained( File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\transformers\modeling_utils.py", line 3539, in from_pretrained dispatch_model(model, **device_map_kwargs) File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\big_modeling.py", line 396, in dispatch_model attach_align_device_hook_on_blocks( File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 547, in attach_align_device_hook_on_blocks attach_align_device_hook_on_blocks( File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 547, in attach_align_device_hook_on_blocks attach_align_device_hook_on_blocks( File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 547, in attach_align_device_hook_on_blocks attach_align_device_hook_on_blocks( File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 517, in attach_align_device_hook_on_blocks add_hook_to_module(module, hook) File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 156, in add_hook_to_module module = hook.init_hook(module) File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\hooks.py", line 254, in init_hook set_module_tensor_to_device(module, name, self.execution_device) File "D:\Ai\LLM\lollms-webui\installer_files\lollms_env\lib\site-packages\accelerate\utils\modeling.py", line 311, in set_module_tensor_to_device new_value = old_value.to(device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 0 bytes is free. Of the allocated memory 3.54 GiB is allocated by PyTorch, and 4.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "d:\ai\llm\lollms-webui\lollms-webui\lollms_core\lollms\app.py", line 102, in init self.model = self.load_model() File "d:\ai\llm\lollms-webui\lollms-webui\lollms_core\lollms\app.py", line 264, in load_model self.notify("Couldn't load model.", False) File "D:\Ai\LLM\lollms-webui\lollms-webui\api__init__.py", line 1489, in notify self.socketio.emit('notification', { AttributeError: 'LoLLMsWebUI' object has no attribute 'socketio'

Possible Solution

Reduce memory usage from the UI or config file ?? Im new so I don't know where everything should be

Context

I have 2 GPUs Nvidia Quadro M2000M with 4gb Vram and a Inter GPU and a Intel i7 6th gen I think

JustPlaneTastic commented 7 months ago

What does the report of nvidia-smi give you regarding VRAM usage?

oops, you said windows :)

It seems to me that the model you've chosen may need more VRAM than your system can provide. On my linux system the webui is using very little VRAM (My current total VRAM usage is sitting at 133MB for all applications with webui running but no work being done).

Are you linking the two GPUs together? Which specific version of the model you mentioned about are you using?

I'm not the developer, just someone poking around myself.

ParisNeo commented 7 months ago

Hi there and thanks for testing. If you want dynamic layers sharing between ram and vram then use ctransformers binding and set how many layers to put on gpu. Fusing two gpus requires triton and i think it is not supported on windows. (I'm not sure)

ParisNeo / lollms-webui