Closed arminta7 closed 1 year ago
The 429 happens when you get rate limited: the code embeds notes in batches of 10 at a time, and sometimes OpenAI returns a 429 at that rate. That should be OK because it should re-embed only those notes the next time it runs, which happens every time you open a new note or run the "Find Connections" command.
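To make the batching behaviour described above concrete, here is a minimal sketch, not the plugin's actual code. The names `toBatches`, `embedRun`, `EmbedFn`, and `EmbedResult` are hypothetical, and the embedding call is passed in as a function so that nothing about the real API client is assumed.

```typescript
// Hypothetical types and names for illustration; not the plugin's real code.
type EmbedResult =
  | { ok: true; vectors: number[][] }
  | { ok: false; status: number };
type EmbedFn = (texts: string[]) => Promise<EmbedResult>;

// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed notes batch by batch. A batch that fails (for example with a 429)
// is not retried here; its notes are returned as `pending` so that a later
// run can pick them up again.
async function embedRun(
  notes: string[],
  embed: EmbedFn,
  batchSize = 10,
): Promise<{ embedded: string[]; pending: string[] }> {
  const embedded: string[] = [];
  const pending: string[] = [];
  for (const batch of toBatches(notes, batchSize)) {
    const result = await embed(batch);
    if (result.ok) {
      embedded.push(...batch);
    } else {
      pending.push(...batch);
    }
  }
  return { embedded, pending };
}
```

In this sketch a rate-limited batch costs nothing beyond the failed request: its notes simply stay pending until the next run.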
The embeddings shouldn't re-run unless the file.mtime changes, which happens when you update the note.
Since you have 20K notes, the code could still be catching up to cover all the rate-limited notes with embeddings. The full process takes much less than a minute with my 1,500 notes. Aside from your very first run of the embeddings, which may take up to 15 minutes, it should always finish processing in less than a minute.
The reason is that once the embeddings process is complete across all of the notes, it should only scan the notes for changes in file.mtime and for deleted notes, and update the embeddings for those.
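The scan described above can be sketched roughly as follows. This is an illustration under assumptions: the `NoteFile` and `EmbeddingIndex` shapes and the `planRun` function are hypothetical, and the real layout of the embeddings file may differ.

```typescript
// Hypothetical shapes; the plugin's real embeddings file layout may differ.
interface NoteFile {
  path: string;
  mtime: number; // last-modified time in milliseconds
}
type EmbeddingIndex = Record<string, { mtime: number; vec: number[] }>;

// Decide what a run has to do: embed notes that are new or whose mtime
// differs from the stored one, and drop entries for notes that no longer exist.
function planRun(
  files: NoteFile[],
  index: EmbeddingIndex,
): { toEmbed: string[]; toRemove: string[] } {
  const currentPaths = new Set(files.map((f) => f.path));
  const toEmbed = files
    .filter((f) => {
      const entry = index[f.path];
      return entry === undefined || entry.mtime !== f.mtime;
    })
    .map((f) => f.path);
  const toRemove = Object.keys(index).filter((p) => !currentPaths.has(p));
  return { toEmbed, toRemove };
}
```

With a check like this, anything that touches a note's mtime, including another plugin working in the background, puts that note back on the list to embed even if its text did not change.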
One thing to be aware of: if you have any other active plugins that may be updating the file.mtime in the background, that could be triggering unnecessary additional embeddings. This is unlikely but possible, so I wanted to mention it.
Any additional notes you can provide about the performance, like after which specific actions it feels most laggy, would be a big help as I turn my eye to the performance updates.
Thanks!
I turned off all other plugins and left it for a bit and it seemed to have completed. But has now started repeating some notes again. Notes that I'm pretty sure aren't being modified. Here are some other errors I found going through the console:
Thanks for sharing your console logs.
I improved the logging. The new thing to pay attention to is the files_embedded log. Every "run" will display the number of files embedded in the console. This will help us determine how many files are being re-embedded each run, so we can better diagnose the performance issues from there.
I attached a screenshot of what my console typically looks like after this latest update.
I don't see that at all.
I have some of these:
It was getting 429 over and over, probably hundreds of times, and wouldn't stop when I turned off the plugin. I had to force reload Obsidian.
I wonder if your .smart-connections/embeddings.json is getting cleared somehow.
The 429 errors may need to be addressed by slowing down bulk embeds. But that error is otherwise harmless.
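One common way to slow down bulk requests is to retry with exponential backoff when a 429 comes back. The sketch below assumes that approach; it is not necessarily what the plugin does. `backoffDelayMs` and `withRetryOn429` are hypothetical names, and the request and the sleep function are passed in, so no particular HTTP client is assumed.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Run `request`, and while it answers 429, wait and try again, up to
// `maxRetries` extra attempts. Any other status is returned immediately.
// ASSUMPTION: the caller wraps its HTTP call so that it resolves with a status code.
async function withRetryOn429<T>(
  request: () => Promise<{ status: number; body?: T }>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number; body?: T }> {
  let response = await request();
  for (let attempt = 0; attempt < maxRetries && response.status === 429; attempt++) {
    await sleep(backoffDelayMs(attempt));
    response = await request();
  }
  return response;
}
```

Because the number of retries is bounded, a loop like this gives up after a few attempts instead of repeating the same 429 indefinitely.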
I also added the erroneous embed log to see which notes are triggering non-429 (400) errors. The OpenAI API doesn't like something in that note. It's hard to say what, but if you could share that entire note with me, I could try to reproduce it.
I got that error (erroneous embed) quite a lot. Here's the content of one of them:
The exact timing for bedtime and naps varies from baby to baby. But when it comes to sleep schedules, your 10-month-old has probably fallen into a fairly predictable pattern.
Babies this age usually wake on the early side, take a morning and an afternoon nap, and go to bed between 7 p.m. and 8 p.m., getting 10 to 12 hours of sleep during the night. At 11 and 12 months, she will likely follow a similar schedule.
Here’s an example of what your 10-month-old's sleep schedule might look like:
Should I maybe try to uninstall, wipe the embeddings file and start over?
Don't wipe the file. Best to never delete it.
Copy & pasting that file worked fine for me.
Are the same files repeatedly ending up in the erroneous embed input log?
If yes, you can try uploading one of the files and I can try to reproduce using that.
If the same files aren't repeatedly appearing, then they may work after a second/third try and eventually stop appearing altogether 🤞.
Here's one file I keep seeing. I am getting actual link suggestions in the sidebar now on some pages, but when selecting the note below I got this error. With the new logging it's very difficult to troubleshoot because the note contents are usually quite long and trying to scroll through and see what's happening or what might be repeating is tough.
DarkHorse Podcast with Daniel Schmachtenberger & Bret Weinstein.md
As I'm clicking through it seems to be resolving some and I'm seeing more of this like you showed above:
I feel like the issue is maybe with whatever related links it's trying to pull up? Because navigating to specific notes always causes an issue and others are fine.
I just pushed an update that cleans up the logs and improves the batching, so you should see fewer 429s.
It's good to see that "files embedded: 0" because that means it at least thinks it's up-to-date on the embeddings.
Also, I was able to reproduce the error with that file you uploaded. Still looking into it...
From the note you shared I was able to identify an edge case where long headings and sections were exceeding the 8,000 tokens limit. I placed a restriction to prevent that from happening in the latest update.
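As a rough illustration of that kind of restriction (not the fix that was actually shipped), an input can be capped before it is sent. Counting tokens exactly requires a tokenizer, so this sketch assumes a crude average of 4 characters per token; that ratio and the names `estimateTokens` and `capForEmbedding` are assumptions for illustration.

```typescript
const MAX_TOKENS = 8000; // the limit mentioned in this thread
const CHARS_PER_TOKEN = 4; // ASSUMPTION: rough average, not a real tokenizer

// Crude token estimate based only on character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Truncate text so that its estimated token count stays within the limit.
function capForEmbedding(text: string, maxTokens: number = MAX_TOKENS): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```

A character-based cap can still overshoot on text that tokenizes densely, so a real implementation would want a proper tokenizer or a more conservative ratio.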
I'm not sure if this is related, so I'm putting it here:
1. The 429 errors are just an artifact of such a large vault. The plugin makes a request for every note and for every section each note contains, and that volume runs up against OpenAI's rate limits. This doesn't affect the functionality of the plugin, other than making the initial run take longer.
2. Once all your embeddings are completed, usage will be reduced to just the new and modified files. That will change depending on your usage, but it's just been pennies per day for me, though I am pretty restrictive about what I keep in my notes. OpenAI advertises something like embedding "3,000 pages for $1", so it's hard to imagine making that many note changes, but there are always outliers. Large notes will also be less cost-efficient.
$30 for an initial run makes sense to me based on your vault size. The OpenAI billing day ends at midnight UTC+0 (7 pm EST), so the processing last night was probably split across both days. Have you looked at the hourly usage? I would expect that the bulk of tokens was used last night and split across both days, as opposed to persistent usage throughout today.
Unless large swaths of your files are being updated programmatically (based on file.mtime change), that usage amount should be a one-time occurrence.
The only other reason it would re-embed all the files is if the embeddings.json file got deleted.
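As a back-of-the-envelope check on the numbers in this thread, the arithmetic can be sketched as below. It takes the "3,000 pages for $1" figure at face value, which is only an approximation, and `pagesForCost` is a hypothetical helper name.

```typescript
// ASSUMPTION: the approximate advertised rate quoted above.
const PAGES_PER_DOLLAR = 3000;

// How many "pages" of text a given spend would cover at that rate.
function pagesForCost(dollars: number, pagesPerDollar: number = PAGES_PER_DOLLAR): number {
  return dollars * pagesPerDollar;
}

const initialRunPages = pagesForCost(30); // 90,000 pages for a $30 run
const pagesPerNote = initialRunPages / 20000; // 4.5 pages per note across 20K notes
console.log(initialRunPages, pagesPerNote);
```

At that rate, $30 corresponds to roughly 4.5 pages embedded per note for a 20K-note vault, which is plausible given that each note's sections are embedded as well as the note itself.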
@arminta7 I'm closing this issue in favor of starting new error-specific issues as they arise. Feel free to schedule a call with me. It would be great for me to have a chance to chat with a power user like yourself. Thanks for all your help getting the plugin up to speed!
Plugin has been embedding for a long time. Not sure if it's doing something productive or running into an issue. I'm seeing an error 429 frequently:
Obsidian is also just generally laggy while this is going on.