brianpetro / obsidian-smart-connections

Chat with your notes & see links to related content with AI embeddings. Use local models or 100+ via APIs like Claude, Gemini, ChatGPT & Llama 3
https://smartconnections.app
GNU General Public License v3.0

Thoughts & Feedback on (NEW MODELS) in Beta #516

Open bbecausereasonss opened 8 months ago

bbecausereasonss commented 8 months ago

First off, thanks so much for getting out a Beta release for us to test the other models. I have been using them now since release and have compiled my thoughts as well as ideas below.

  1. ChatGPT is still the most comprehensive at analysis, especially for takeaways, bullets, and subpoints in long articles.
  2. Gemini is VERY good at 'human' writing, followed by Claude; by far the worst is ChatGPT. For instance, Gemini will pass most AI detectors, while Claude Opus passes maybe 40% of the time and ChatGPT 0% of the time. Even when you prompt ChatGPT to write more human-sounding text by explaining the dynamics, it still fails.
  3. Claude is VERY good at creativity.
  4. Gemini is VERY good at synthesis.
  5. Inference speed is WAY higher in OpenAI models; both Claude and Gemini take a long time to respond (they seem to take much longer to parse the embedding data).

Something I thought of that would be absolutely next level is the ability to chain multiple models together, almost like a broad mixture of experts.

Example:

In your prompt, you could ask ChatGPT, Gemini, and Claude to look at a specific issue and improve the work through argument, reduction, and error catching.

There would be a way to use the same embeddings, but to ping pong between the 3 models based on the prompt. Of course there are many 'agent' frameworks out there that could make this even more interesting.

I could develop this idea further if you're interested, but each model has very distinct strengths and weaknesses.

Aside: The in-chat context conditioning is working much better than before. The models seem to query all the notes and refer to the chat window simultaneously; before, this was impossible and the model would get tripped up and confused.

One workflow could be:

a) ChatGPT creates structure and analysis on various concepts and pieces of content, and produces the outline.
b) Claude takes the outline and turns it into content (contextually).
c) Gemini double-checks for relevance and rewrites in a much more human way, especially if we have a writing sample to match.
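To make the chain concrete, here is a rough sketch of what steps a) through c) could look like in code. The `complete` helper and the model identifiers are purely illustrative; nothing here is an actual Smart Connections API.

```typescript
// Hypothetical helper: sends a prompt to the named model and returns its reply.
// A real implementation would wrap the OpenAI/Anthropic/Google APIs.
type Complete = (model: string, system: string, prompt: string) => Promise<string>;

async function chainedDraft(complete: Complete, sourceNotes: string): Promise<string> {
  // a) ChatGPT builds the structure/outline from the source material.
  const outline = await complete(
    "gpt-4",
    "You are an analyst. Produce a structured outline with takeaways and subpoints.",
    sourceNotes,
  );
  // b) Claude expands the outline into full content.
  const draft = await complete(
    "claude-3-opus",
    "You are a creative writer. Expand this outline into flowing prose.",
    outline,
  );
  // c) Gemini checks relevance and rewrites in a more human voice.
  return complete(
    "gemini-pro",
    "Check the draft against the outline for relevance, then rewrite it in a natural, human style.",
    `Outline:\n${outline}\n\nDraft:\n${draft}`,
  );
}
```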

brianpetro commented 8 months ago

@bbecausereasonss, thanks for sharing this! With all that I'm building, it's hard to find the time to experiment like this myself. Great insights!

ChatGPT is still the most comprehensive at analysis, especially for takeaways, bullets, and subpoints in long articles. Gemini is VERY good at 'human' writing, followed by Claude; by far the worst is ChatGPT. For instance, Gemini will pass most AI detectors, while Claude Opus passes maybe 40% of the time and ChatGPT 0% of the time. Even when you prompt ChatGPT to write more human-sounding text by explaining the dynamics, it still fails. Claude is VERY good at creativity. Gemini is VERY good at synthesis.

This is very good to know! Is there anything you feel comfortable sharing about your use-case? It might be helpful to have a little more context.

Inference speed is WAY higher in OpenAI models; both Claude and Gemini take a long time to respond (they seem to take much longer to parse the embedding data).

I think it would be useful to add a tokens/second calculation because, currently, it's hard to compare given the differences between the APIs. Primarily, OpenAI has a far superior streaming implementation, and Anthropic/Claude doesn't support streaming when using the API in an environment like Obsidian (hopefully, this will change soon!). Google's streaming, to me, seems faked: each chunk is a sentence or more, while OpenAI's chunks are more like a word each, and Google's output arrives seemingly all at once after ~80% of the request time has elapsed.
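For illustration, here's a minimal sketch of how a tokens/second measurement could work while consuming a stream. The chunk iterable and the whitespace-based word count are simplifying assumptions, not the plugin's actual code.

```typescript
// Measure approximate tokens/second while consuming a streamed completion.
// `chunks` stands in for whatever async iterable a provider's SDK yields;
// counting whitespace-separated words is a rough proxy for real tokenization.
async function measureTokensPerSecond(chunks: AsyncIterable<string>): Promise<number> {
  const start = Date.now();
  let tokenCount = 0;
  for await (const chunk of chunks) {
    tokenCount += chunk.split(/\s+/).filter(Boolean).length;
  }
  const elapsedSeconds = (Date.now() - start) / 1000;
  return elapsedSeconds > 0 ? tokenCount / elapsedSeconds : tokenCount;
}
```

A word-per-chunk stream (OpenAI) and an all-at-the-end stream (what Google's feels like) can produce the same tokens/second, which is exactly why the number would be a fairer comparison than perceived latency.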

The context retrieval could take longer if more context is retrieved than for the OpenAI models, but this should be almost negligible. The other difference is the intermediate HyDE completion: a slower model will feel even slower while it's waiting on the HyDE completion. But the streaming probably accounts for most of the difference.
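For anyone unfamiliar with HyDE (Hypothetical Document Embeddings): the chat model first writes a hypothetical answer, that answer is embedded, and the embedding drives the similarity search. Here's a simplified sketch; the helper signatures are assumed for illustration and are not the plugin's internals.

```typescript
// Simplified HyDE retrieval flow; the Retriever methods are illustrative stand-ins.
type Embedding = number[];

interface Retriever {
  complete(prompt: string): Promise<string>;                // chat-model completion
  embed(text: string): Promise<Embedding>;                  // embedding-model call
  nearest(query: Embedding, k: number): Promise<string[]>;  // vector search over notes
}

async function hydeRetrieve(r: Retriever, userQuery: string, k = 10): Promise<string[]> {
  // 1) The intermediate completion: the extra round-trip that makes a slow model feel slower.
  const hypothetical = await r.complete(
    `Write a short passage that would answer: ${userQuery}`,
  );
  // 2) Embed the hypothetical answer instead of the raw query.
  const queryEmbedding = await r.embed(hypothetical);
  // 3) Retrieve the k most similar note chunks.
  return r.nearest(queryEmbedding, k);
}
```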

Something I thought of that would be absolutely next level is the ability to chain multiple models together, almost like a broad mixture of experts.

This is a really interesting idea. I haven't put much thought into it, but I think it would be a great future addition to allow simultaneous responses from multiple models.

There would be a way to use the same embeddings, but to ping pong between the 3 models based on the prompt. Of course there are many 'agent' frameworks out there that could make this even more interesting.

While not really an agent framework, I've been looking at the "Cannoli" Obsidian plugin and think it would be great to integrate with Smart Connections. I messaged the creator just the other day to ask about enabling Smart Connections embeddings to be used in Cannoli. Besides that, to build the workflow you described, it would just need to be able to access the various models, which could be accomplished through a similar Smart Connections integration.

I could develop this idea further if you're interested, but each model has very distinct strengths and weaknesses.

If you have any more thoughts, please do share.

Aside: The in-chat context conditioning is working much better than before. The models seem to query all the notes and refer to the chat window simultaneously; before, this was impossible and the model would get tripped up and confused.

😊 I'm happy you noticed this. Both the way the Smart Chat utilizes chat history and how it manages the available context window were improved in v2.1.

I'm also still working towards improving this further. v2.1 is a re-write, but it implements essentially the same features. One way I plan on improving the retrieval is by making the retrieval logic more accessible. Currently, it's just a "bunch of code," but that code can be modularized so that the "retrieval strategy" reads more like a "list of easy-to-understand functions." Eventually, this should make it easy to change the retrieval strategy in the chat and make contributing new and novel strategies as easy as re-arranging the order of these functions.
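To illustrate the direction (all names below are invented for the example), a retrieval strategy expressed as a list of functions might look like this:

```typescript
// Illustrative only: a retrieval strategy as an ordered list of small steps.
type Context = { query: string; candidates: string[] };
type RetrievalStep = (ctx: Context) => Promise<Context>;

// Placeholder implementations; real steps would call embeddings, filters, rerankers, etc.
const expandQueryWithHyDE: RetrievalStep = async (ctx) => ctx;
const searchEmbeddings: RetrievalStep = async (ctx) => ctx;
const rerankByRecency: RetrievalStep = async (ctx) => ctx;

// Changing the strategy becomes re-ordering (or swapping out) entries in this array.
const strategy: RetrievalStep[] = [expandQueryWithHyDE, searchEmbeddings, rerankByRecency];

async function retrieve(query: string): Promise<string[]> {
  let ctx: Context = { query, candidates: [] };
  for (const step of strategy) ctx = await step(ctx);
  return ctx.candidates;
}
```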

Also, looking ahead, I expect to implement actions in the Smart Chat v2.2 similar to those provided to ChatGPT via the Smart Connect software. I like the concept of the AI utilizing actions (aka "tools" or "function calls") because it effectively implements agent-type abilities directly in the chat.

💡 Based on your idea about switching models, I'm making a note (...) to add an action that triggers a model switch. That way, implementing the a/b/c workflow you described above could be as easy as creating a system prompt that describes when to switch models. Thanks for the idea!
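As a rough sketch, such an action could be exposed to the model in the OpenAI-style function-calling format. The `switch_model` name and its parameters below are hypothetical, not a committed design.

```typescript
// Hypothetical tool definition in the OpenAI function-calling format;
// the name, enum values, and parameters are illustrative, not actual plugin code.
const switchModelTool = {
  type: "function",
  function: {
    name: "switch_model",
    description:
      "Switch the chat to a different model, e.g. when the system prompt assigns a step to another model.",
    parameters: {
      type: "object",
      properties: {
        model: {
          type: "string",
          enum: ["gpt-4", "claude-3-opus", "gemini-pro"],
          description: "Identifier of the model to hand the conversation to.",
        },
        reason: { type: "string", description: "Why the switch is happening." },
      },
      required: ["model"],
    },
  },
};
```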

That reminds me of a new feature I haven't previously mentioned: you can now use @ in the chat input to select a system prompt. In the v2.1 settings, you can specify a folder for system prompts. This is a very early/experimental feature, but it should insert the specified system prompt before the user message. Due to the aforementioned time constraints, I've hardly had a chance to experiment with it, so I'd be interested to know if/how you manage to utilize it.
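In message terms, the behavior amounts to something like the following sketch (names are illustrative):

```typescript
// Sketch of the described behavior: the selected system prompt is inserted
// just before the user's message in the outgoing message array.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function withSystemPrompt(
  history: ChatMessage[],
  systemPrompt: string,
  userInput: string,
): ChatMessage[] {
  return [
    ...history,
    { role: "system", content: systemPrompt }, // inserted via the @ selection
    { role: "user", content: userInput },
  ];
}
```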

It was great to read these insights; I'm sure others will find them valuable, too. Thanks for your continued contributions to the community!

🌴

bbecausereasonss commented 8 months ago

Just wanted to update that, after more testing, I find myself using Claude 60% of the time (switching between the three model sizes depending on how much power I need), ChatGPT 30% of the time to fill in gaps and be reliable, and Gemini 10% of the time (I know Gemini is powerful, but for whatever reason it keeps giving me very short and boring replies and analysis on my notes).