Closed: Luke2791 closed this issue 1 month ago
Hi @Luke2791
Given that the example would trigger a context lookup (it contains a self-referential pronoun), the round-trip response would be expected to take more than 2X as long as one without a context lookup.
This is because two requests are made to the LLM (see this explanation of HyDE to learn more about why).
Notably, the second request will contain a lot of context, so it is like running a request with a really long (pages-long) prompt. This is probably the cause of the noticeable increase in your computer's resource usage.
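To make the two-request flow concrete, here is a minimal sketch of a HyDE-style round trip against a local Ollama server. This is not the plugin's actual code: the model name and the `retrieveNotes()` helper are placeholders I'm assuming for illustration.

```ts
// Sketch of a HyDE-style round trip: two LLM calls instead of one.
// Assumes a local Ollama server on its default port; "llama3.1" and
// retrieveNotes() are placeholders, not the plugin's real API.

async function ollamaGenerate(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false means the full completion arrives in one JSON body
    body: JSON.stringify({ model: "llama3.1", prompt, stream: false }),
  });
  const data = await res.json();
  return data.response;
}

async function answerWithContext(question: string): Promise<string> {
  // Request 1: generate a hypothetical answer used only for retrieval (HyDE).
  const hypothetical = await ollamaGenerate(question);

  // Embed the hypothetical answer and look up the nearest notes in the vault.
  // retrieveNotes() stands in for the plugin's own embedding/search step.
  const context = await retrieveNotes(hypothetical);

  // Request 2: the real completion, now carrying pages of retrieved context,
  // which is why it is much slower than the bare CLI prompt.
  return ollamaGenerate(`Context:\n${context}\n\nQuestion: ${question}`);
}

// Placeholder for the vault search; the real plugin uses its own embeddings.
declare function retrieveNotes(query: string): Promise<string>;
```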
Perception is also part of the issue, since Ollama cannot stream results to Obsidian due to a CORS issue. This means the timing should be compared at the last token rather than the first, since Smart Chat has to wait for all tokens to be generated before it receives the result.
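As a rough way to see that perception gap, something like the sketch below measures time-to-first-token versus time-to-last-token from a streaming Ollama request. The CLI effectively shows you the first number, while a non-streaming client like Smart Chat can only act after the second. The default port is Ollama's; the model name is a placeholder.

```ts
// Rough timing sketch: streaming gives a fast first token, but a client that
// cannot stream (like Smart Chat behind the CORS issue) only sees total time.

async function timeTokens(prompt: string): Promise<void> {
  const start = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3.1", prompt, stream: true }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let firstTokenAt: number | null = null;

  // Ollama streams newline-delimited JSON chunks as tokens are produced.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (firstTokenAt === null && decoder.decode(value).trim().length > 0) {
      firstTokenAt = Date.now();
    }
  }

  console.log(`first token after ${(firstTokenAt ?? Date.now()) - start} ms`);
  console.log(`last token after  ${Date.now() - start} ms`);
}
```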
I hope that helps clear some things up. 🌴
Great response and very helpful! Thank you, @brianpetro! Also - I see somewhere else someone mentioned including the option for LM Studio in a future release - that would be excellent! LM Studio also has an option for CORS that might help resolve the issue that Ollama is facing (as opposed to using Llama3.1 via LM Studio with CORS turned on).
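For reference, here is a hedged sketch of what a chat completion against LM Studio's OpenAI-compatible local server might look like. The port below is LM Studio's default and the model name is a placeholder; CORS would need to be enabled in LM Studio's server settings, and this is not something the plugin exposes today.

```ts
// Sketch of a chat completion against LM Studio's OpenAI-compatible server.
// Assumes the local server is running on its default port (1234) with CORS
// enabled in LM Studio's server settings.

async function lmStudioChat(question: string): Promise<string> {
  const res = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.1-8b-instruct", // placeholder; LM Studio serves whatever model is loaded
      messages: [{ role: "user", content: question }],
      stream: false,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```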
Context for the issue:
ISSUE:
Why is this? Is it simply that my vault is being loaded into the model and/or being used as part of the "context"? Or is there a "double" loading of the LLM (I know - probably a dumb question)? I.e., is the computer loading it once in the command line and another time in Obsidian, bogging down the computer? Further, I do not fully understand why answers in the Obsidian chat are so slow compared to asking identical prompts in the command line. E.g., 'Tell me about Isaac Newton' may take a second or two in the command line, but 15 to 30+ seconds when asking the identical question in Obsidian. Thank you all for your help in this! GREAT plugin :)