Hi @dvl00 and thanks for the report!
So that entire 471 MB of embeddings is currently loaded into memory, and that is likely what's causing your issue. Anything that increases your available RAM should help prevent the crash.
I do have some strategies/plans to improve this process so that all 471 MB isn't constantly in memory.
The good news is that I have some other ideas that will increase the number of embeddings per note, so this issue is likely to become more common, even among users with less massive vaults. That works in your favor: the more people are affected, the higher the priority, so the strategies/plans above should be implemented sooner rather than later.
I hope that clears things up for you.
Thanks again for contributing this report!
Brian 🌴
Thank you @brianpetro !
Really appreciate the prompt response.
I will remove the extension for now but will keep my embedding file because it was very expensive to produce! I will anxiously wait for the updated version. I got to try it for a little bit as my embedding file was being created and oh my goodness, it was very useful!! Thank you for your attention and diligence on this. For many of us Obsidian has become part of our day-to-day life, and people like you, who develop this sort of innovative plugin, are very much appreciated!
Thanks again~~
@brianpetro Sorry to bother you again, do you have a rough timeline for these releases? I just want to make sure to install the extension once it's suited for my needs.
Thank you!
@dvl00 no ETA as of yet.
I have some suggestions for optimizing storage.
Have you considered using CSV instead of JSON? As each file's information is a line of similar data, JSON requires storing data in the form of key-value objects, resulting in many redundant keys. Converting objects to arrays can optimize storage and memory, especially for the "vec" object in the embeddings-2.json file.
Furthermore, I have added the storage of embeddings to the Git repository. If we use the CSV format, it would be clear which files have updated or added embeddings, making it easier to track changes than modifying the entire embeddings-2.json file.
I want to express my sincere gratitude for your great work on this project. This plugin has become a vital part of my life, and I appreciate the effort and dedication you put into it.
I think the CSV idea might be good, but it might not be enough for mobile devices. They have limited storage and processing power, so we might need a more robust solution. Maybe we could use a cloud service to store and access the data more efficiently. But this is just my opinion - brianpetro is the genius behind this plugin and he knows best what works for his project.
@yekingyan I have thought about CSV. Even though they're redundant, the JSON keys make up <1% of the file. This is because each embedding contains a vector (array of decimals) that's >15,000 characters. On top of that, JSON makes the stored data both easier to work with and more flexible. Those are the reasons I decided to stick with it as opposed to CSV.
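A rough way to sanity-check the key-overhead estimate is to serialize a few synthetic records and count the characters spent on keys. This is only a sketch, not the plugin's code: the field names (`mtime`, `tokens`, `vec`) and the 1536-dimension vector are assumptions chosen for illustration, and real embedding files may be laid out differently.

```python
import json
import random


def key_overhead_ratio(records):
    """Fraction of the compact JSON output spent on per-record keys.

    Each key is counted with its quotes plus the colon that follows it.
    """
    total = len(json.dumps(records, separators=(",", ":")))
    key_chars = sum(len(json.dumps(k)) + 1 for rec in records.values() for k in rec)
    return key_chars / total


random.seed(0)
# Hypothetical record layout: field names and vector length are assumptions.
records = {
    f"notes/note-{i}.md": {
        "mtime": 1700000000 + i,
        "tokens": 512,
        "vec": [random.uniform(-1, 1) for _ in range(1536)],
    }
    for i in range(20)
}

ratio = key_overhead_ratio(records)
print(f"per-record keys take up {ratio:.2%} of the serialized file")
```

With vectors this long, almost all of the file is the numbers themselves, so dropping the keys (as CSV or plain arrays would) saves very little.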
@dvl00 I tend to want to stay away from adding additional cloud services, but, I do see eventual integration as likely because that would allow people to reuse their embeddings in other applications. In the meantime, I think the biggest performance gains will come from strategically splitting the embeddings.json file into more parts. This way, only some embeddings can be loaded into memory at a time, and syncing the embeddings won't require re-downloading all of the embeddings every time.
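To illustrate the splitting idea, here is a minimal sketch under stated assumptions. It is not how the plugin does or will do it: the shard count, the `embeddings-<n>.json` file names, and the hash-based assignment are all hypothetical choices made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

NUM_SHARDS = 8  # assumption: picked for illustration only


def shard_index(note_path: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a note path to a shard number that is stable across runs."""
    digest = hashlib.sha1(note_path.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def write_shards(embeddings: dict, out_dir: Path, num_shards: int = NUM_SHARDS) -> None:
    """Split one large mapping into num_shards smaller JSON files."""
    shards = [{} for _ in range(num_shards)]
    for note_path, record in embeddings.items():
        shards[shard_index(note_path, num_shards)][note_path] = record
    for i, shard in enumerate(shards):
        # Hypothetical file naming scheme.
        (out_dir / f"embeddings-{i}.json").write_text(json.dumps(shard), encoding="utf-8")


def load_record(note_path: str, out_dir: Path, num_shards: int = NUM_SHARDS):
    """Parse only the shard that can contain note_path; the rest stay on disk."""
    shard_file = out_dir / f"embeddings-{shard_index(note_path, num_shards)}.json"
    return json.loads(shard_file.read_text(encoding="utf-8")).get(note_path)


# Small demo with made-up two-element vectors.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    demo = {f"notes/{i}.md": {"vec": [i, i + 1]} for i in range(100)}
    write_shards(demo, out)
    print(load_record("notes/42.md", out))
```

Because a note always maps to the same shard, updating one note's embedding rewrites one small file instead of the whole thing, which is also what keeps syncing cheap.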
Thanks both of you for your thoughts on this, Brian🌴
Closing this in favor of new issues being created in the JS Brains Smart Collections project repo. If you want an additional storage method, please request an adapter for Smart Collections in that project. My full response to this issue will be published on YouTube and I will link to the video from my next response to this issue 🌴
Recorded response: https://youtu.be/5b2yd-4EZaI
Hi there! First of all, thank you for this amazing extension. My vault, which is about 18k notes, has an embedding file of 471 MB. Every time I start Obsidian now with the created embedding file, it crashes and turns into a black screen. Help! I really would like to use this extension as my vault is growing so much! Thank youuuu