c1505 commented 1 year ago

details

16,000 notes in the vault.
The entire vault folder is 705 MB, and 500 MB of that is the Smart Connections folder. It doesn't seem right to me that the embeddings would be more than 2x as large as the rest of the vault, so I am guessing there is a problem there.
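
For a rough sense of scale, the arithmetic below estimates how large embeddings become when stored as JSON text. It assumes 1536-dimensional vectors (the size of OpenAI's text-embedding-ada-002; whether this vault used that model is an assumption) and exactly one embedding per note. `json_bytes_per_vector` is a hypothetical helper, not part of the plugin.

```python
import json
import random


def json_bytes_per_vector(dim: int, seed: int = 0) -> int:
    """Serialize one random float vector as JSON and return its size in bytes."""
    rng = random.Random(seed)
    vector = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return len(json.dumps(vector).encode("utf-8"))


# Assumptions: 1536 dimensions, one embedding per note (no per-section embeddings).
per_vector = json_bytes_per_vector(1536)
notes = 16_000
total_mb = per_vector * notes / 1_000_000
print(f"~{per_vector / 1000:.0f} KB per vector, ~{total_mb:.0f} MB for {notes} notes")
```

Under these assumptions, several hundred megabytes is plausible even before per-section embeddings are counted, because each float costs roughly 20 characters as text.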

brianpetro commented 1 year ago

Hey @c1505

That vault size may be large enough to break some of the internals. You'll have to check the developer console logs to confirm, but I have a hunch that the issue might be related to this https://github.com/brianpetro/obsidian-smart-connections/issues/123#issuecomment-1490883188

Until Smart Connections better handles that size vault, you have two options:

1) There is a setting skip_sections that should solve the issue, but restricts embeddings to one-per-note.
2) Use the exclusion settings to exclude folders with a lot of notes.

I know neither of those is ideal, but they should at least get you up and running.

Thanks for trying Smart Connections 🤓

🌴

c1505 commented 1 year ago

Yeah I have the same or similar JSON.stringify error so I assume that particular part of it is the same problem. I am not sure if that is the main part of it freezing though.

I commented out the calls to await this.get_embeddings_batch(req_batch); so I could still use the embeddings that had already been created.
It didn't completely crash after that, but after opening a new note, it would freeze for what seemed like 2 seconds.
I found that it was spending ~20 seconds on clean_up_embeddings(files).
After also commenting out the calls to that, I was able to use the plugin. It doesn't update embeddings anymore, though.

It seems like there are a few ways to modify the saving so that it could handle larger vault sizes. I imagine that the larger the file gets, the worse the rest of the performance gets without some other optimizations though.
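
One such modification could be to stop serializing every embedding into a single giant string. The following is a minimal sketch in Python (not the plugin's JavaScript, and not its actual storage format) that appends one JSON record per line. The file name and the `key`/`vec` field names are assumptions made for illustration.

```python
import json
import os
import tempfile


def append_embedding(path: str, key: str, vector: list[float]) -> None:
    """Append one record as a single JSON line; earlier records are never re-serialized."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "vec": vector}) + "\n")


def load_embeddings(path: str) -> dict[str, list[float]]:
    """Read records line by line; a later line for the same key overrides an earlier one."""
    embeddings: dict[str, list[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                embeddings[record["key"]] = record["vec"]
    return embeddings


path = os.path.join(tempfile.mkdtemp(), "embeddings.jsonl")
append_embedding(path, "notes/a.md", [0.1, 0.2])
append_embedding(path, "notes/b.md", [0.3, 0.4])
append_embedding(path, "notes/a.md", [0.5, 0.6])  # updated embedding for the same note
loaded = load_embeddings(path)
```

With this layout, the cost of a save depends on the size of the record being written rather than on the size of the whole store, at the price of occasionally compacting the file to drop superseded lines.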

brianpetro commented 1 year ago

@c1505 the brief pause after opening a note happens because it's searching through all the embeddings to compare to the current note in a non-optimized way.
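
To illustrate why that pause grows with vault size, here is a generic brute-force nearest-neighbor search in Python. It is a sketch of the general technique, not the plugin's code; the function names and sample vectors are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], embeddings: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Score every stored embedding against the query, then keep the best top_k keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]


embeddings = {"a.md": [1.0, 0.0], "b.md": [0.0, 1.0], "c.md": [0.7, 0.7]}
results = nearest([1.0, 0.1], embeddings, top_k=2)
```

Every lookup touches every stored vector, so the work scales with the number of embeddings times their dimension. Fewer embeddings means a shorter pause.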

Excluding folders and the skip_sections option will also help with that issue since it should reduce the number of embeddings searched.

Ps- you might also have to use the "force refresh" button in the settings to make the changes take effect.

🌴

minthemiddle commented 9 months ago

@brianpetro Which issue/topic should I follow for performance improvements for very large vaults? I have ~15k notes, and Smart Connections slowed Obsidian down so much that I uninstalled it again. Right now, I consider my use case beyond the scope of Smart Connections, but I would be happy to learn if new strategies (e.g. a vector database for storage) are implemented.

brianpetro commented 9 months ago

@minthemiddle are you using V2? If so, you can set the Block model to None, and Smart Connections should handle 15k embeddings no problem.

If you're trying to embed them with a local model, that will take some time and make Obsidian unusable during the process, but you can pause it at any time.

As for handling blocks at that size, too, it's not out of the scope of my focus at all. Right now, the implementation is designed to instantly scan every embedding, every time. But, that isn't necessary with clever optimizations. And there are many more layers of embeddings that can be added to do interesting and useful things, so handling more embeddings is a must.

So I'd start with trying to embed notes only, see if that works.

And I'll continue thinking about what I can do to improve storage of large amounts of embeddings. PS- I know "vector dbs" sound like an easy solution, but the trade-off is giving away future flexibility to implement advanced calculations for short-term ability to run a simple calculation at scale, and I see much more value in the former.

🌴

jonathandgall commented 7 months ago

Instead of creating a new issue, I will add my case here, as the symptoms are similar to @c1505's, even though embedding was initially completed in my case.

My vault currently contains 19302 files totaling 549.1 MB. It represents slightly more than four years of notes and journaling. I moved it from Roam Research to Obsidian a few months ago.

I have five files in .smart-connections:

smart_notes-bge-micro-v2.ajson - 306.8 MB
smart_blocks-bge-micro-v2.ajson - 97 MB
smart_notes-bge-micro-v2.failed-1712598416346.ajson - 8.4 MB
smart_notes-bge-micro-v2.failed-1712600234238.ajson - 8.4 MB
smart_blocks-None.ajson

I use a Mac Mini i7 with 32 GB of RAM. I do not use Smart Chat right now, but I plan to reintroduce it once I have addressed the performance issues.

Smart Connections was slowing my Obsidian experience to a crawl. The latency between keystrokes and their appearance on screen prompted me to use Zed as my text editor. And this is despite only using BGE-micro-v2.

I followed the advice in this thread and excluded all folders, making Obsidian more usable. I can still see (and hear) my CPU spiking when I type, but it is not as bad as it was before, when it would spike continuously as long as Obsidian was open. I didn't figure out how to access the skip_sections setting and would appreciate guidance on this.

I would like to add one thing: @brianpetro, it is a rare treat to see someone who communicates so well in their GitHub issues. Despite my lackluster experience, seeing you so active here motivates me to persevere and find a way to make this work.

What would you recommend at that point to get the most out from my vault?

brianpetro commented 7 months ago

Hey @jonathandgall

Thanks for the insights and kind words 😊

The slowing of Obsidian is due to the local embeddings running within the same process as Obsidian.

One way I'm circumventing that is with the Smart Connect software (currently in testing with supporters), which will be made freely available for Smart Connections users to improve local embedding performance. By offloading the local embeddings to Smart Connect, embedding is both faster and doesn't block other processes within Obsidian.

It's also possible that this might be achieved by initiating a side-process from within Obsidian, removing the requirement for additional software, but so far my attempts at this have failed, and it simply may not be possible depending on the restrictions Obsidian places on the environment.
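
The general idea of moving embedding work out of the main process can be sketched as follows. This is a Python illustration only, not how Smart Connect or Obsidian work internally. The worker's "embedding" is a stand-in computation, and all names here are hypothetical.

```python
import json
import subprocess
import sys

# Source for the child process: reads all texts from stdin, prints one fake "embedding" per line.
WORKER_SOURCE = """
import json, sys
texts = json.loads(sys.stdin.read())
for text in texts:
    # Stand-in for a real model call: [character count, word count].
    print(json.dumps([len(text), len(text.split())]))
"""


def start_worker(texts: list[str]) -> subprocess.Popen:
    """Start the child process, hand over the input, and return without waiting for results."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdin.write(json.dumps(texts))
    proc.stdin.close()
    return proc


def collect(proc: subprocess.Popen) -> list[list[int]]:
    """Block until the child finishes, then parse one vector per output line."""
    output = proc.stdout.read()
    proc.wait()
    return [json.loads(line) for line in output.splitlines()]


worker = start_worker(["hello world", "smart connections plugin"])
# The main process is free to do other work here while the child computes.
vectors = collect(worker)
```

The heavy computation runs in a separate operating-system process, so it does not compete with the caller's own event loop. Whether a plugin is allowed to spawn such a process depends on the host environment.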

In the meantime, it might be worth setting the Block-level embed model to "None". This will cut the total number of embeddings by more than half but will still enable all features. This is the same as the skip_sections setting. I appreciate you searching the GitHub and finding that recommendation, though!

I hope that makes sense. And thanks for trying Smart Connections!

🌴

wowpala commented 6 months ago

I have configured Jina local models for embedding purposes within my settings. However, I've noticed that Smart Connect keeps attempting to establish connections with public IP addresses. If these servers are inaccessible, it results in a halt of the embedding workflow. How can I ensure that Smart Connect utilizes the local models exclusively and avoids unnecessary connections to public IPs?

brianpetro commented 6 months ago

@wowpala thanks for bringing that to my attention.

Besides initially downloading the model from huggingface, there is no reason Smart Connect should be connecting to any external server other than connect.smartconnections.app.

So those persistent requests make no sense to me. I'm concerned that those requests might be something specific to that Jina model.

I recommend switching to another model immediately, preferably BGE-micro as it has been the most reliable, to see if the issue persists.

Let me know how it goes 🌴

brianpetro commented 6 months ago

@wowpala I've been digging into my network traffic because of your comment.

Unknown requests are a dealbreaker. So I couldn't ignore this.

Here's what I found:

Given that all of the requests in your screenshot happened simultaneously and that HuggingFace uses CloudFront (based on my testing), it's safe to say that those requests are the initial download of the model from HuggingFace. This should only happen once per model, as the models are cached. So, you do have to be online the first time the model is used. Subsequent usage should not re-download the model.

Additionally, I tested that Smart Connect works to process embeddings when disconnected from the internet. It works! Of course, this will only work if you already have the model downloaded, requiring an internet connection on the first use.
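
The download-once behavior described above follows a common cache-first pattern, sketched here in Python. This is not Smart Connect's or transformers.js's actual code. `load_model`, `fake_download`, and the model name `example/model` are hypothetical and exist only for illustration.

```python
import os
import tempfile


def load_model(name: str, cache_dir: str, download) -> bytes:
    """Return model bytes from the local cache, calling `download` only on a cache miss."""
    path = os.path.join(cache_dir, name.replace("/", "__"))
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()  # cache hit: no network access needed
    data = download(name)  # cache miss: requires network access on first use
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data


download_calls = []


def fake_download(name: str) -> bytes:
    """Stand-in for a real network download; records each call."""
    download_calls.append(name)
    return b"model-weights-for-" + name.encode("utf-8")


cache_dir = tempfile.mkdtemp()
first = load_model("example/model", cache_dir, fake_download)
second = load_model("example/model", cache_dir, fake_download)
```

In this sketch the second call never reaches the downloader, which is the property that makes offline use possible after the first run.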

In conclusion, I believe it's safe to say there is no cause for concern related to that network traffic.

On a related note: this is why I am adamant about minimizing the use of external dependencies in sensitive applications or anywhere users believe their data is private, like Obsidian.

For the record, Smart Connect(ions) relies on transformers.js, an open-source module, supported by the very reputable HuggingFace, for running AI models.

As a principle, I will continue to avoid using third-party dependencies so that you can be confident in the software you are using.

🌴

wowpala commented 6 months ago

I did some more testing, as follows:

Using BGE-micro;
If I keep the 'Smart Connect' program running, it indeed won't access the public network server;
However, if I restart it, it will connect to huggingface.co, and if the connection fails, the embedding work won't continue.

It seems that it's not related to whether the model is BGE or Jina, but rather that every time 'Smart Connect' starts, it needs to connect to huggingface.

brianpetro commented 6 months ago

@wowpala, thanks for following up and letting me know about that. I will have to look into this further to make sure it works when offline.

🌴

matlemai commented 5 months ago

I am also having issues with the plugin. Obsidian is unusable when it is activated, and I can't change any of the parameters. How can I force pause/interrupt the process to link it to Smart Connect?

brianpetro commented 5 months ago

Hi @matlemai

If Smart Connect is open prior to starting Obsidian, it should automatically connect before starting any embedding processing.

🌴

matlemai commented 5 months ago

Thanks - my Smart Connect is connected; when I start Obsidian (1.6.3) with Smart Connect (1.1.55) already active in the background ("Connected to https://connect.smartconnections.app"), it does not change anything. It still always stalls in the same spot.

[Smart Connections] Making Smart Connections... Progress: 2110 / 5512 files

... and I must remove the plugin entirely after quitting Obsidian to be able to use the app again. Any idea what the problem may be?

Thanks

brianpetro / obsidian-smart-connections

plugin is slow and freezes. embedding has not completed after a few hours of letting it run #356
