3dluvr opened this issue 1 year ago
Hi! Your issue comes from Hugging Face diffusers; please refer to here.
Thanks for your advice! We will continue to update our project to make it better.
Hi
It can't be that, because I already have the weights downloaded?
Granted, I am not logged into HF, nor do I want to be using their API or inference services. I'm trying to have a purely local environment.
Hi!
By referring to the diffusers docs and some discussions, there is a workaround.
You can replace all of the diffusers pipeline IDs (runwayml/stable-diffusion-v1-5, timbrooks/instruct-pix2pix, and runwayml/stable-diffusion-inpainting) in gpt4tools.py with your local paths.
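For example, the from_pretrained calls could be pointed at local folders instead of repo IDs. A rough sketch (the local paths below are placeholders, and the exact construction in gpt4tools.py may differ slightly):

import torch
from diffusers import (StableDiffusionPipeline,
                       StableDiffusionInstructPix2PixPipeline,
                       StableDiffusionInpaintPipeline)

# Each local folder must contain the full diffusers layout
# (model_index.json, unet/, vae/, text_encoder/, ...).
text2img = StableDiffusionPipeline.from_pretrained(
    "/models/stable-diffusion-v1-5", torch_dtype=torch.float16)
pix2pix = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "/models/instruct-pix2pix", torch_dtype=torch.float16)
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "/models/stable-diffusion-inpainting", torch_dtype=torch.float16)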
Hi,
Alright, so I made a dirty dirty hack to load a vicuna-13B-1.1-GPTQ-4bit-128g model as safetensors.
Using the following command to load it:
python gpt4tools.py \
--base_model "models/thebloke_vicuna-13b-1.1-gptq-4bit-128g" \
--lora_model "loras" \
--llm_device "cuda" \
--load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"
Maybe I'm not loading the LoRA properly; I have a loras/ folder and that's where I'm pointing it to (a quick sanity check is sketched just after the file list below). Inside it are the following files:
adapter_config.json
adapter_model.bin
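Here is the quick sanity check I mean for the adapter folder on its own (just a sketch, assuming a plain fp16 base model for the test; the base-model path is a placeholder):

from transformers import LlamaForCausalLM
from peft import PeftConfig, PeftModel

# loras/ only needs adapter_config.json and adapter_model.bin.
cfg = PeftConfig.from_pretrained("loras")
print("Adapter was trained against:", cfg.base_model_name_or_path)

base = LlamaForCausalLM.from_pretrained("models/vicuna-13b-fp16")  # placeholder path
model = PeftModel.from_pretrained(base, "loras")
print("LoRA adapter loaded OK")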
Also, looking at my Task Manager, it seems as if there's some network traffic happening despite all of my models being local. I'm not sure why that is, but I will look into it later.
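If the traffic turns out to be the Hugging Face hub doing its usual version/etag checks, one thing I may try is forcing offline mode via the standard environment variables (a sketch, assuming transformers/diffusers/huggingface_hub respect them in this setup):

import os
# Must be set before transformers/diffusers are imported
# (or exported in the shell before launching gpt4tools.py).
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"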
Right now I'm wondering: is there some way to increase verbosity during inference, or in general?
I'm using a sample image of some tulips, and after loading it I send an instruction to replace the pink tulips with red roses. But it just sits there for a long time... then some more observations happen, etc. The whole process is very, very slow...
Running this on an RTX 3090 with 24GB VRAM, all under CUDA (no CPU).
Example output from the console:
Processed ImageCaptioning, Input Image: image/9295ff30.png, Output Text: a bouquet of tuliplips
Processed run_image, Input image: image/9295ff30.png
Current state: [('![](file=image/9295ff30.png)*image/9295ff30.png*', 'Received.')]
Current Memory:
Human: Provide an image named image/9295ff30.png. The description is: a bouquet of tuliplips. Understand the image using tools.
AI: Received.
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Segment the Image
Action Input: image/9295ff30.png
Observation: image/b945daae.png
Thought:
: Do I need to use a tool? No
AI: I understand the image and the request to replace pink tulips with red roses.
> Finished chain.
Processed run_text, Input text: image/9295ff30.png replace pink tulips with red roses
Current state: [('![](file=image/9295ff30.png)*image/9295ff30.png*', 'Received.'), (' image/9295ff30.png replace pink tulips with red roses', 'I understand the image and the request to replace pink tulips with red roses.')]
Current Memory: Human: Provide an image named image/9295ff30.png. The description is: a bouquet of tuliplips. Understand the image using tools.
AI: Received.
Human: image/9295ff30.png replace pink tulips with red roses
AI: I understand the image and the request to replace pink tulips with red roses.
Perhaps I need to make more adjustments somewhere else?
Then I instruct it to "isolate pink tulips" and I get this:
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: Detect the Give Object
Action Input: image/9295ff30.png
Traceback (most recent call last):
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/user/Envs/gpt4tools_env/GPT4Tools/gpt4tools.py", line 1141, in run_text
res = self.agent({"input": text.strip()})
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/chains/base.py", line 168, in __call__
raise e
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/chains/base.py", line 165, in __call__
outputs = self._call(inputs)
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/agent.py", line 503, in _call
next_step_output = self._take_next_step(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/agent.py", line 420, in _take_next_step
observation = tool.run(
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/tools/base.py", line 71, in run
raise e
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/tools/base.py", line 68, in run
observation = self._run(tool_input)
File "/home/user/Envs/gpt4tools_env/lib/python3.10/site-packages/langchain/agents/tools.py", line 17, in _run
return self.func(tool_input)
File "/home/user/Envs/gpt4tools_env/GPT4Tools/gpt4tools.py", line 964, in inference
image_path, det_prompt = inputs.split(",")
ValueError: not enough values to unpack (expected 2, got 1)
Any thoughts about the above?
Thanks!
Could you please let me know how to load the safetensors Vicuna? I tried the code but it returned an error, No such file or directory: pytorch_model-00001-of-00003.bin; apparently it is still searching for the .bin files.
This is a very dirty hack and probably not the way to do it. I took bits and pieces from the GPTQ_loader.py module in oobabooga/text-generation-webui.
In llama.py, replace __init__ of the LlamaHuggingFace class with:
def __init__(self,
             base_model,
             lora_model,
             task='text-generation',
             device='cpu',
             max_new_tokens=512,
             temperature=0.1,
             top_p=0.75,
             top_k=40,
             num_beams=1):
    self.task = task
    self.device = device
    self.temperature = temperature
    self.max_new_tokens = max_new_tokens
    self.top_p = top_p
    self.top_k = top_k
    self.num_beams = num_beams

    t0 = time.time()
    print("Looking for model...")
    path_to_model = Path(f'{base_model}')
    checkpoint = str(list(path_to_model.glob("*.safetensors"))[0])
    print("Found: %s" % checkpoint)

    print("Configuring model...")
    exclude_layers = ['lm_head']
    config = AutoConfig.from_pretrained(str(path_to_model))

    # Skip the usual (slow) weight initialization; the quantized checkpoint
    # is loaded over the empty model below.
    def noop(*args, **kwargs):
        pass
    torch.nn.init.kaiming_uniform_ = noop
    torch.nn.init.uniform_ = noop
    torch.nn.init.normal_ = noop

    torch.set_default_dtype(torch.half)
    transformers.modeling_utils._init_weights = False
    self.model = AutoModelForCausalLM.from_config(config)
    torch.set_default_dtype(torch.float)
    self.model = self.model.eval()

    # Swap every linear layer (except lm_head) for a 4-bit quantized one
    # with group size 128, matching the GPTQ checkpoint.
    layers = find_layers(self.model)
    for name in exclude_layers:
        if name in layers:
            del layers[name]
    make_quant_linear(self.model, layers, 4, 128)
    del layers

    print("Loading model...")
    if checkpoint.endswith('.safetensors'):
        from safetensors.torch import load_file as safe_load
        self.model.load_state_dict(safe_load(checkpoint), strict=False)
    else:
        self.model.load_state_dict(torch.load(checkpoint))
    self.model.seqlen = 2048

    print("Loading LoRA...")
    path_to_lora = Path(f'{lora_model}')
    params = {}
    if self.device != 'cpu':
        params['dtype'] = self.model.dtype
        params['max_memory'] = self.max_memory  # assumes self.max_memory is set elsewhere in the class
        if hasattr(self.model, "hf_device_map"):
            params['device_map'] = {"base_model.model." + k: v for k, v in self.model.hf_device_map.items()}
    print("LoRA params: {}".format(params))
    self.model = PeftModel.from_pretrained(self.model, path_to_lora, **params)
    if self.device != 'cpu':
        self.model.half()
        if not hasattr(self.model, "hf_device_map"):
            if torch.has_mps:
                device = torch.device('mps')
                self.model = self.model.to(device)
            else:
                self.model = self.model.cuda()
    else:
        self.model.float()
    print("Done with LoRA")
    print("Done.")

    print("Load the tokenizer...")
    self.tokenizer = LlamaTokenizer.from_pretrained(path_to_model, clean_up_tokenization_spaces=True)
    print("Done.")
    self.tokenizer.pad_token_id = 0
    self.model.config.pad_token_id = 0
    self.model.config.bos_token_id = 1
    self.model.config.eos_token_id = 2
    print(f"Loaded the model in {(time.time()-t0):.2f} seconds.")
Your prompt should have the format "image_path, your question". We will deal with this bug.
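One way the parsing could be made more forgiving (just a sketch of the idea, not the actual fix; parse_tool_input is a hypothetical helper modeled on the inputs.split(",") line in the traceback above):

def parse_tool_input(inputs: str):
    # Expected format: "image_path, your question".
    # Split only on the first comma so the question itself may contain commas.
    parts = inputs.split(",", 1)
    if len(parts) != 2:
        raise ValueError('Tool input must look like "image_path, your question", '
                         f'got: {inputs!r}')
    image_path, det_prompt = (p.strip() for p in parts)
    return image_path, det_prompt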
Yes, please, the image_path eventually disappears and the system becomes confused.
I'm still not having much luck with loading the LoRA, any ideas?
Thank you for the reply! I followed the instructions and imported the needed module, but I encountered this error; it looks like the llama validation is not passing. Have you seen the same error?
I had forgotten to include a few more bits...
You need this before DEFAULT_REPO_ID at the top of llama.py:
sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa.triton")))
from quant import make_quant_linear
Then you need to create a repositories folder and clone https://github.com/oobabooga/GPTQ-for-LLaMa.git into it. Make sure you compile and install it (python setup_cuda.py install).
Hi,
I think you have not made it clear at all that this code, as-is, cannot be run locally and that it relies on HF remote inference. I only realized that when I finally got it to run and it asked me to enter an HF API key.
I'm not sure what your goal is with this project, but if the intent is to get it into the hands of enthusiasts and attract more contributors, then we need something more in line with what the broader community already uses (4-bit models, local GPU/CPU inference, etc.).
Right now it's out of reach for most people to even evaluate its capabilities.
Thanks.