Issue
I'm not sure what the proper workflow is for using interpreter's vision support after reading this. For the record, I installed moondream separately and it ran great, including its gradio demo. As a side note regarding vision, I am curious whether Open Interpreter will be able to use llama 3.2's vision capability.
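For reference, my standalone test followed the moondream2 model card, roughly like this (a sketch; the image path is a placeholder, and the card recommends pinning a specific model revision, which I have omitted here):

# Standalone moondream2 sanity check, roughly per the model card
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

image = Image.open("test.jpg")  # placeholder path to any local image
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "Describe this image.", tokenizer))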
Platform
I am running Windows 10 x64 on modern PC hardware, using Windows Terminal > PowerShell.
Do I need to be using WSL?
Attempts
When I try 'interpreter --local --vision' with llama 3.2, it doesn't seem to be able to use moondream to view anything. I've tried prompts like 'what do you see?' and 'take a screenshot and describe it', but it doesn't understand that it has moondream available.
I have also tried 'interpreter --local --vision --os' with llama 3.2 and get a bit further; it will:
take a screenshot
save it in 'C:/Windows/Temp'
open it in FSViewer (my associated photo program)
try to use computer.view() (a standalone check of this same step is sketched below)
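To help rule out my moondream install itself, this is the kind of direct check I can run outside Open Interpreter, exercising the same screenshot-then-describe step (a sketch; this is not exactly what interpreter does internally):

# Screenshot -> moondream describe, outside Open Interpreter (sketch)
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import ImageGrab

model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

screenshot = ImageGrab.grab()  # full-screen capture; works natively on Windows
enc = model.encode_image(screenshot)
print(model.answer_question(enc, "Describe this screenshot.", tokenizer))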
Error
After the computer.view() attempt, I get this error:
Traceback (most recent call last):
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\Scripts\interpreter.exe\__main__.py", line 7, in <module>
sys.exit(main())
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 610, in main
start_terminal_interface(interpreter)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\start_terminal_interface.py", line 576, in start_terminal_interface
interpreter.chat()
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 191, in chat
for _ in self._streaming_chat(message=message, display=display):
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 223, in _streaming_chat
yield from terminal_interface(self, message)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\terminal_interface\terminal_interface.py", line 157, in terminal_interface
for chunk in interpreter.chat(message, display=False, stream=True):
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 259, in _streaming_chat
yield from self._respond_and_store()
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\core.py", line 318, in _respond_and_store
for chunk in respond(self):
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\respond.py", line 86, in respond
for chunk in interpreter.llm.run(messages_for_llm):
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\llm\llm.py", line 180, in run
image_description = self.vision_renderer(lmc=img_msg)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\interpreter\core\computer\vision\vision.py", line 171, in query
answer = self.model.answer_question(
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 93, in answer_question
answer = self.generate(
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\moondream.py", line 77, in generate
output_ids = self.text_model.generate(
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2024, in generate
result = self._sample(
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
outputs = self(**model_inputs, return_dict=True)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 1104, in forward
outputs = self.transformer(
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 959, in forward
layer_outputs = decoder_layer(
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 763, in forward
attn_outputs, self_attn_weights, present_key_value = self.mixer(
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Users\DEBASER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 382, in forward
query_rot, key_rot = apply_rotary_pos_emb(
File "C:\Users\DEBASER\.cache\huggingface\modules\transformers_modules\vikhyatk\moondream2\9ba2958f5a886de83fa18a235d651295a05b4d13\modeling_phi.py", line 214, in apply_rotary_pos_emb
cos = cos[position_ids].unsqueeze(unsqueeze_dim)
IndexError: index is out of bounds for dimension with size 0
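The failing frame is inside moondream's remote code (apply_rotary_pos_emb in modeling_phi.py), reached through transformers' generate(), so in case it helps triage, here is a quick way to collect the relevant package versions alongside the Open Interpreter version below:

# Report the package versions involved in the failing call chain
import importlib.metadata as md

for pkg in ("open-interpreter", "transformers", "torch", "pillow"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")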
Open Interpreter version
0.3.13
Python version
3.10.11
Operating System name and version
Windows 10