Brawlence / SD_api_pics

An extension for oobabooga's TextGen that allows you to receive pics generated by Automatic1111's SD API

AssertionError: Can't find models/None #7

Open ali0une opened 8 months ago

ali0une commented 8 months ago

Hi there!

I'd like to have this extension running on my Text generation web UI, but I get an error that I can't solve at the moment; I'm new to local LLaMA and have a limited understanding of Python. I tried using the script.py from this repository instead of the one in the oobabooga repo, but still no luck.

With the script.py of this repository, I get the following. On the Automatic1111 side:

Total progress: 100%|█████████████████████████████████████████████████████████████████████████████| 32/32 [00:07<00:00, 4.46it/s]

So the image is created.

But then, on the text-generation-webui side:

19:17:09-971808 INFO     Loading the extension "gallery"                                                                          
19:17:09-974700 INFO     Loading the extension "send_pictures"                                                                    
19:17:09-976998 INFO     Loading the extension "sd_api_pictures"                                                                  
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
19:17:21-754842 INFO     Loading TheBloke_Mixtral_7Bx2_MoE-GPTQ_gptq-4bit-32g-actorder_True                                       
19:17:24-143077 INFO     LOADER: ExLlamav2_HF                                                                                     
19:17:24-143625 INFO     TRUNCATION LENGTH: 16384                                                                                 
19:17:24-144021 INFO     INSTRUCTION TEMPLATE: Alpaca                                                                             
19:17:24-144359 INFO     Loaded the model in 2.39 seconds.                                                                        
Output generated in 5.19 seconds (22.74 tokens/s, 118 tokens, context 644, seed 1107870413)
Requesting Auto1111 to re-load last checkpoint used...
Prompting the image generator via the API on http://127.0.0.1:7861...
Requesting Auto1111 to vacate VRAM...
19:17:53-351211 INFO     Loading None                                                                                             
Traceback (most recent call last):
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/gradio/utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/whatever/text-generation-webui/modules/chat.py", line 364, in generate_chat_reply_wrapper
    for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
  File "/whatever/text-generation-webui/modules/chat.py", line 332, in generate_chat_reply
    for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
  File "/whatever/text-generation-webui/modules/chat.py", line 300, in chatbot_wrapper
    output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/modules/extensions.py", line 229, in apply_extensions
    return EXTENSION_MAP[typ](*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/modules/extensions.py", line 87, in _apply_string_extensions
    text = func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/extensions/sd_api_pictures/script.py", line 212, in output_modifier
    string = get_SD_pictures(string) + "\n" + text
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/extensions/sd_api_pictures/script.py", line 179, in get_SD_pictures
    give_VRAM_priority('LLM')
  File "/whatever/text-generation-webui/extensions/sd_api_pictures/script.py", line 54, in give_VRAM_priority
    reload_model()
  File "/whatever/text-generation-webui/modules/models.py", line 453, in reload_model
    shared.model, shared.tokenizer = load_model(shared.model_name)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/modules/models.py", line 389, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/whatever/text-generation-webui/modules/exllamav2_hf.py", line 162, in from_pretrained
    config.prepare()
  File "/whatever/text-generation-webui/venv/lib/python3.11/site-packages/exllamav2/config.py", line 69, in prepare
    assert os.path.exists(self.model_dir), "Can't find " + self.model_dir
AssertionError: Can't find models/None

And afterwards, the model that was in use is no longer loaded. Could someone explain why it says "AssertionError: Can't find models/None", please?

Thanks!
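For context on the error text: the assertion fires in exllamav2's `config.prepare()`, which checks that the model directory exists before loading, and the traceback shows `reload_model()` passing `shared.model_name` down to it. If that name is `None` by the time the extension asks for a reload, the assembled path becomes `models/None`. A minimal sketch of the failure mode (the function bodies are illustrative assumptions reconstructed from the traceback, not the actual text-generation-webui code):

```python
import os

MODELS_DIR = "models"

class shared:  # stand-in for textgen's modules.shared
    model_name = None  # no model name is tracked at reload time

def load_model(model_name):
    # exllamav2's config.prepare() ends in the assert shown in the traceback
    model_dir = os.path.join(MODELS_DIR, str(model_name))
    assert os.path.exists(model_dir), "Can't find " + model_dir

def reload_model():
    # modules/models.py reloads whatever name is currently tracked
    load_model(shared.model_name)

reload_model()  # AssertionError: Can't find models/None
```

So the error isn't really about a missing file on disk; the tracked model name appears to have been lost somewhere between the unload and the reload.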

Brawlence commented 8 months ago

Hello! This is probably a VRAM management issue on the part of the script in textgen-webui.

What are your GPU specs? How much VRAM do you use?

I see you're using the new Mixtral model; what's its size? Can you try disabling the 'Manage VRAM' checkbox?
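For reference, the 'Manage VRAM' handoff that produces the "Requesting Auto1111 to vacate VRAM..." and "Requesting Auto1111 to re-load last checkpoint used..." lines in the log works roughly like this. A simplified sketch, assuming A1111's `/sdapi/v1/unload-checkpoint` and `/sdapi/v1/reload-checkpoint` API endpoints and textgen's `unload_model`/`reload_model` helpers from the traceback; the real control flow in `give_VRAM_priority` may differ:

```python
import requests

from modules.models import reload_model, unload_model  # textgen's own helpers

SD_URL = "http://127.0.0.1:7861"  # A1111 API address from the log above

def give_VRAM_priority(actor):
    if actor == 'SD':
        # Image requested: drop the LLM, ask A1111 to bring its checkpoint back.
        unload_model()
        requests.post(url=f'{SD_URL}/sdapi/v1/reload-checkpoint', json='')
    elif actor == 'LLM':
        # Image done: ask A1111 to evict the checkpoint, reload the LLM.
        requests.post(url=f'{SD_URL}/sdapi/v1/unload-checkpoint', json='')
        reload_model()  # the call that fails with "Can't find models/None"
```

With 'Manage VRAM' unchecked, neither the unload nor the reload happens, which is why the error disappears, at the cost of keeping both models resident at once.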

ali0une commented 8 months ago

Thanks for the quick answer!

I'm using an NVIDIA RTX 3060 with 12 GB of VRAM. The model.safetensors file is 7.3 GB; once loaded, it takes about 11 GB of VRAM.

If I disable 'Manage VRAM', I just can't load both models with the limited VRAM I have: I get an OOM even with a 2 GB SD-1.5 model. Maybe I should try a tiny text-generation model, like a 7B?

Let me know if you need more information.
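A quick way to double-check those numbers from the Python side, assuming a CUDA build of PyTorch, is to query the device directly:

```python
import torch

# Device-wide free vs. total memory (includes other processes' usage).
free, total = torch.cuda.mem_get_info()
print(f"{free / 1024**3:.1f} GiB free of {total / 1024**3:.1f} GiB")
```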

ali0une commented 8 months ago

Hi!

Just tested with a smaller model, TheBloke_Wizard-Vicuna-7B-Uncensored-GPTQ_gptq-4bit-128g-actorder_True.

Automatic1111 doesn't seem to be the culprit, as I can load an SD-1.5 model just fine; it only takes 2 GB of the 12 GB of VRAM. Wizard-Vicuna-7B then takes 4 GB more, so I'm at 6 GB of 12 GB.

I can tell Text generation web UI to "send selfie" and it works, at about 9 GB of VRAM usage with a final peak at 12 GB (the VAE, I guess), but only if I uncheck the 'Manage VRAM' checkbox.

So you were right: it is probably a VRAM management issue.

Hope that helps.

Very nice extension, by the way. I hope you can fix this issue later, and then maybe I could use a 13B or 7Bx2 model with an SDXL model in SD.

Brawlence commented 8 months ago

Yeah, that helps immensely!

I'm trying to narrow the problem down, and it seems like the GGUF loader somehow can't correctly handle the unload-reload cycle (which is unavoidable when 'Manage VRAM' is on). I don't know if I can fix it on my end, but if I manage to, I'll let ooba know with a PR.
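One possible band-aid on the extension side, if it turns out `shared.model_name` gets clobbered across the unload, would be to snapshot the name before vacating VRAM and restore it before the reload. This is untested, and the `last_model_name` snapshot is hypothetical, not part of the current script:

```python
from modules import shared
from modules.models import reload_model, unload_model

last_model_name = None  # hypothetical module-level snapshot

def give_VRAM_priority(actor):
    global last_model_name
    if actor == 'SD':
        if shared.model_name not in (None, 'None'):
            last_model_name = shared.model_name  # remember what was loaded
        unload_model()
        # ... ask A1111 to reload its checkpoint ...
    elif actor == 'LLM':
        # ... ask A1111 to vacate VRAM ...
        if shared.model_name in (None, 'None') and last_model_name:
            shared.model_name = last_model_name  # restore before reloading
        reload_model()
```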