OpenInterpreter / open-interpreter

A natural language interface for computers
http://openinterpreter.com/
GNU Affero General Public License v3.0

unable to use interpreter --local --vision #1467

Open drhouse opened 1 month ago

drhouse commented 1 month ago

Issue

I'm not sure about the proper workflow for using Interpreter's vision support after reading this. For the record, I separately installed moondream and it ran great, including its gradio demo. As a side note regarding vision, I am curious whether Open Interpreter will be able to use Llama 3.2's vision capability.

Platform

I am running Windows 10 x64 on modern PC hardware, using Windows Terminal > PowerShell. Do I need to be using WSL?

Attempts

When trying to use 'interpreter --local --vision' with Llama 3.2, it doesn't seem to be able to use moondream to view anything. I've tried prompts like 'what do you see?' and 'take a screenshot and describe it', but it doesn't understand that it has moondream available.

I have also tried 'interpreter --local --vision --os' with Llama 3.2 and get a bit further; it will:

Error

after which I get this error:

Traceback (most recent call last):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Scripts\interpreter.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 610, in main
    start_terminal_interface(interpreter)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 576, in start_terminal_interface
    interpreter.chat()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 191, in chat
    for _ in self._streaming_chat(message=message, display=display):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 223, in _streaming_chat
    yield from terminal_interface(self, message)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\terminal_interface.py", line 157, in terminal_interface
    for chunk in interpreter.chat(message, display=False, stream=True):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 259, in _streaming_chat
    yield from self._respond_and_store()
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 318, in _respond_and_store
    for chunk in respond(self):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\respond.py", line 86, in respond
    for chunk in interpreter.llm.run(messages_for_llm):
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\llm\llm.py", line 180, in run
    image_description = self.vision_renderer(lmc=img_msg)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\computer\vision\vision.py", line 171, in query
    answer = self.model.answer_question(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 93, in answer_question
    answer = self.generate(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 77, in generate
    output_ids = self.text_model.generate(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2024, in generate
    result = self._sample(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 1104, in forward
    outputs = self.transformer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 959, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 763, in forward
    attn_outputs, self_attn_weights, present_key_value = self.mixer(
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 382, in forward
    query_rot, key_rot = apply_rotary_pos_emb(
  File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 214, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
IndexError: index is out of bounds for dimension with size 0
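For context, the final frame is the classic failure mode of indexing into a tensor whose position dimension has collapsed to size 0: the rotary-embedding cos cache is empty, yet the model still looks up a position id. A minimal plain-Python sketch of the same error class (lists standing in for torch tensors; the names here are illustrative, not from the moondream source):

```python
# Sketch of the failure in apply_rotary_pos_emb: the cos cache has
# size 0 along the position dimension, so any position-id lookup fails.
cos = []            # stands in for a cos cache with an empty dim 0
position_ids = [0]  # the model still requests position 0

try:
    _ = [cos[i] for i in position_ids]  # analogous to cos[position_ids]
except IndexError as exc:
    message = str(exc)

print(message)  # plain lists say "list index out of range";
                # torch reports the size-0 dimension instead
```

The usual causes are a mismatch between the transformers version and the remote moondream code (cache handling changed across transformers releases), so pinning transformers to a version the pinned moondream revision supports is a reasonable first thing to try.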

Screenshot

(screenshot: 2024-09-26 23_12_03-Greenshot)

Open Interpreter version

0.3.13

Python version

3.10.11

Operating System name and version

Windows 10

Manamama commented 2 weeks ago

A tip: even without --vision as an argument, the "i" model is clever enough to use tesseract on its own.
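For reference, that kind of fallback amounts to shelling out to the tesseract CLI to read text out of a screenshot. A minimal sketch, assuming the `tesseract` binary may or may not be on PATH (the function name is hypothetical, not Open Interpreter's API):

```python
import shutil
import subprocess

def ocr_image(image_path: str) -> str:
    """Hypothetical OCR fallback: run tesseract on an image and return
    the recognized text, or "" if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return ""  # no tesseract binary on PATH; nothing to OCR with
    result = subprocess.run(
        ["tesseract", image_path, "stdout"],  # "stdout" prints the text
        capture_output=True,
        text=True,
    )
    return result.stdout
```

This only recovers on-screen text, of course; it is not a substitute for a vision model describing arbitrary image content.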