AILab-CVC / GPT4Tools

GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.
http://gpt4tools.github.io

Not meant to run locally #6

Open 3dluvr opened 1 year ago

3dluvr commented 1 year ago

Hi,

I think you have not made it clear at all that this code, as-is, cannot be run locally and that it relies on HF remote inference. I only realized that when I finally got it to run and it asked me to enter an HF API key.

I'm not sure what your goal is with this project, but if the intent is to get it into the hands of enthusiasts and attract more contributors, then we need something more in line with what the global community already uses (4-bit models, local GPU/CPU inference, etc.).

Right now it's out of reach for most people to even evaluate its capabilities.

Thanks.

Yangr116 commented 1 year ago

Hi! Your trouble is from Hugging Face diffusers; please refer to here.

Thanks for your advice! We will continue to update our project to make it better.

3dluvr commented 1 year ago

Hi,

It can't be that, because I already have the weights downloaded.

Granted, I am not logged into HF, nor do I want to use their API or inference services. I'm trying to have a purely local environment.

Yangr116 commented 1 year ago

Hi!

Referring to the diffusers docs and some discussions, there is a workaround.

You can replace all of the diffusers pipelines (runwayml/stable-diffusion-v1-5, timbrooks/instruct-pix2pix, and runwayml/stable-diffusion-inpainting) in gpt4tools.py with your local paths.
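
For illustration, a minimal sketch of pointing one of those pipelines at a local folder (the path is a placeholder; local_files_only keeps diffusers from contacting the Hub):

import torch
from diffusers import StableDiffusionInpaintPipeline

# Load the inpainting pipeline from a local snapshot instead of the Hub; replace
# the path with wherever runwayml/stable-diffusion-inpainting is stored on disk.
inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "/path/to/local/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    local_files_only=True,  # never hit the network
).to("cuda:0")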

3dluvr commented 1 year ago

Hi,

Alright, so I made a dirty dirty hack to load a vicuna-13B-1.1-GPTQ-4bit-128g model as safetensors.

Using the following command to load it:

python gpt4tools.py \
    --base_model "models/thebloke_vicuna-13b-1.1-gptq-4bit-128g" \
    --lora_model "loras" \
    --llm_device "cuda" \
    --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"

Maybe I'm not loading the LoRA properly. I have a loras/ folder and that's where I'm pointing it to. Inside it are the following files:

adapter_config.json
adapter_model.bin

Also, looking at my Task Manager it seems as if there's some network traffic happening despite all of my models being local. I'm not sure why that is, but I will look into it later.

Right now I'm wondering: is there some way to increase verbosity during inference, or in general?

I'm using a sample image of some tulips, and after loading it I send an instruction to replace the pink tulips with red roses. But it just sits there for a long time... then some more observations happen, etc. The whole process is very, very slow...

I'm running this on an RTX 3090 with 24 GB VRAM, all under CUDA (no CPU).

Example output from the console:

Processed ImageCaptioning, Input Image: image/9295ff30.png, Output Text: a bouquet of tuliplips

Processed run_image, Input image: image/9295ff30.png
Current state: [('![](file=image/9295ff30.png)*image/9295ff30.png*', 'Received.')]
Current Memory:
Human: Provide an image named image/9295ff30.png. The description is: a bouquet of tuliplips. Understand the image using tools.
AI: Received.

> Entering new AgentExecutor chain...

Thought: Do I need to use a tool? Yes
Action: Segment the Image
Action Input: image/9295ff30.png
Observation: image/b945daae.png
Thought:
: Do I need to use a tool? No
AI: I understand the image and the request to replace pink tulips with red roses.

> Finished chain.

Processed run_text, Input text:  image/9295ff30.png replace pink tulips with red roses
Current state: [('![](file=image/9295ff30.png)*image/9295ff30.png*', 'Received.'), (' image/9295ff30.png replace pink tulips with red roses', 'I understand the image and the request to replace pink tulips with red roses.')]
Current Memory: Human: Provide an image named image/9295ff30.png. The description is: a bouquet of tuliplips. Understand the image using tools.
AI: Received.
Human: image/9295ff30.png replace pink tulips with red roses
AI: I understand the image and the request to replace pink tulips with red roses.

Perhaps I need to make more adjustments somewhere else?

Then I instruct it to "isolate pink tulips" and I get this:

> Entering new AgentExecutor chain...

Thought: Do I need to use a tool? Yes
Action: Detect the Give Object
Action Input: image/9295ff30.pngTraceback (most recent call last):
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/user/Envs/gpt4tools_env/GPT4Tools/gpt4tools.py", line 1141, in run_text
    res = self.agent({"input": text.strip()})
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/chains/base.py", line 168, in __call__
    raise e
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/chains/base.py", line 165, in __call__
    outputs = self._call(inputs)
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/agent.py", line 503, in _call
    next_step_output = self._take_next_step(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
    observation = tool.run(
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/tools/base.py", line 71, in run
    raise e
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/tools/base.py", line 68, in run
    observation = self._run(tool_input)
  File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/tools.py", line 17, in _run
    return self.func(tool_input)
  File "/home/user/Envs/gpt4tools_env/GPT4Tools/gpt4tools.py", line 964, in inference
    image_path, det_prompt = inputs.split(",")
ValueError: not enough values to unpack (expected 2, got 1)

Any thoughts about the above?

Thanks!

PeterL-1111 commented 1 year ago

> Alright, so I made a dirty dirty hack to load a vicuna-13B-1.1-GPTQ-4bit-128g model as safetensors. [...]

Could you please let me know how to load the safetensors Vicuna? I tried the code, but it returned an error: No such file or directory: pytorch_model-00001-of-00003.bin. Apparently it is still searching for the .bin files.

3dluvr commented 1 year ago

> Could you please let me know how to load the safetensors Vicuna? I tried the code, but it returned an error: No such file or directory: pytorch_model-00001-of-00003.bin. Apparently it is still searching for the .bin files.

This is a very dirty hack and probably not the way to do it. I took bits and pieces from the GPTQ_loader.py module in oobabooga/text-generation-webui.

In llama.py, replace __init__ of the LlamaHuggingFace class with:

    def __init__(self,
                 base_model,
                 lora_model,
                 task='text-generation',
                 device='cpu',
                 max_new_tokens=512,
                 temperature=0.1,
                 top_p=0.75,
                 top_k=40,
                 num_beams=1):
        self.task = task
        self.device = device
        self.temperature = temperature
        self.max_new_tokens = max_new_tokens
        self.top_p = top_p
        self.top_k = top_k
        self.num_beams = num_beams

        t0 = time.time()
        print("Looking for model...")
        path_to_model = Path(f'{base_model}')
        checkpoint = str(list(path_to_model.glob("*.safetensors"))[0])
        print("Found: %s"  % checkpoint)

        print("Configuring model...")
        exclude_layers=['lm_head']
        config = AutoConfig.from_pretrained(str(path_to_model))
        def noop(*args, **kwargs):
            pass
        torch.nn.init.kaiming_uniform_ = noop
        torch.nn.init.uniform_ = noop
        torch.nn.init.normal_ = noop
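        # The three init functions above are monkey-patched to no-ops (and
        # transformers' _init_weights is disabled below) so that from_config
        # builds the architecture without spending time on random weight init;
        # the real quantized weights come from the checkpoint loaded further down.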

        torch.set_default_dtype(torch.half)
        transformers.modeling_utils._init_weights = False
        torch.set_default_dtype(torch.half)
        self.model = AutoModelForCausalLM.from_config(config)
        torch.set_default_dtype(torch.float)
        self.model = self.model.eval()
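        # find_layers and make_quant_linear come from the GPTQ-for-LLaMa code
        # (imported separately; see the last comment in this thread). The 4 and
        # 128 below match the 4-bit, 128-group-size quantization of this checkpoint.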
        layers = find_layers(self.model)
        for name in exclude_layers:
            if name in layers:
                del layers[name]
        make_quant_linear(self.model, layers, 4, 128)
        del layers

        print("Loading model...")
        if checkpoint.endswith('.safetensors'):
            from safetensors.torch import load_file as safe_load
            self.model.load_state_dict(safe_load(checkpoint), strict=False)
        else:
            self.model.load_state_dict(torch.load(checkpoint))

        self.model.seqlen = 2048

        print("Loading LoRA...")
        path_to_lora = Path(f'{lora_model}')
        params = {}
        if self.device != 'cpu':
            params['dtype'] = self.model.dtype
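            # NOTE: self.max_memory is not defined anywhere in this snippet; unless
            # it is set elsewhere in llama.py, the next line would raise an
            # AttributeError, which may be related to the LoRA loading trouble.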
            params['max_memory'] = self.max_memory
            if hasattr(self.model, "hf_device_map"):
                params['device_map'] = {"base_model.model." + k: v for k, v in self.model.hf_device_map.items()}

        print("LoRA params: {}" . format(params))

        self.model = PeftModel.from_pretrained(self.model, path_to_lora, **params)

        if self.device != 'cpu':
            self.model.half()
            if not hasattr(self.model, "hf_device_map"):
                if torch.has_mps:
                    device = torch.device('mps')
                    self.model = self.model.to(device)
                else:
                    self.model = self.model.cuda()
        else:
            self.model.float()

        print("Done with LoRA")

        print("Done.")

        print("Load the tokenizer...")
        self.tokenizer = LlamaTokenizer.from_pretrained(path_to_model, clean_up_tokenization_spaces=True)
        print("Done.")

        self.tokenizer.pad_token_id = 0
        self.model.config.pad_token_id = 0
        self.model.config.bos_token_id = 1
        self.model.config.eos_token_id = 2

        print(f"Loaded the model in {(time.time()-t0):.2f} seconds.")

Yangr116 commented 1 year ago

> Alright, so I made a dirty dirty hack to load a vicuna-13B-1.1-GPTQ-4bit-128g model as safetensors. [...]

Your prompt should have the format "image_path, your question". We will deal with this bug.
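
For illustration only (not the project's actual fix), a more forgiving parse of that tool input in gpt4tools.py might look like the sketch below; the helper name parse_tool_input and the fallback prompt are assumptions:

def parse_tool_input(inputs: str):
    """Split 'image_path, question' tool input, tolerating a missing comma."""
    if "," in inputs:
        image_path, det_prompt = inputs.split(",", 1)
    else:
        # No comma supplied: treat the whole string as the image path and fall
        # back to a generic detection prompt instead of raising ValueError.
        image_path, det_prompt = inputs, "all objects"
    return image_path.strip(), det_prompt.strip()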

3dluvr commented 1 year ago

> Your prompt should have the format "image_path, your question". We will deal with this bug.

Yes, please; the image_path eventually disappears and the system becomes confused.

I'm still not having much luck loading the LoRA. Any ideas?
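
For comparison, the minimal PEFT call to attach a local adapter folder like loras/ looks roughly like this (the base-model path is a placeholder, and this is only a sanity-check sketch, not the project's loading code):

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load a base model from a local directory (placeholder path), then attach the
# adapter stored in loras/ (adapter_config.json + adapter_model.bin) from disk.
base = AutoModelForCausalLM.from_pretrained("/path/to/local/base-model")
model = PeftModel.from_pretrained(base, "loras")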

PeterL-1111 commented 1 year ago

> This is a very dirty hack and probably not the way to do it. I took bits and pieces from the GPTQ_loader.py module in oobabooga/text-generation-webui. [...]

Thank you for the reply! I followed the instructions and imported the needed module, but I encountered this error; it looks like the llama validation is not passing. Have you seen the same error? [image]

3dluvr commented 1 year ago

I have forgotten to include a few more bits...

You need this before DEFAULT_REPO_ID at the top of llama.py:

sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa.triton")))
from quant import make_quant_linear

Then you need to make a repositories folder and clone https://github.com/oobabooga/GPTQ-for-LLaMa.git into it. Make sure you compile and install it (python setup_cuda.py install).