Performance Questions - Githubissues

unphased commented 3 months ago

I'm testing this tool out and I saw some behavior that I did not expect:

I manually imported around 200k entries from one of my custom history files. Two of these entries correspond to each actual command history item, but that's alright, it makes for a bigger test case anyway.
Performance seems troublesome: after doing some typing and then removing my query under the ctrl+r interface, all the CPU threads that were spawned continue to chew up CPU. I don't think I will use this because of this. I'd certainly like to note that piping a plain text file containing 200k or 1M lines into fzf and interactively fuzzy searching it is a lot faster and more interactive than what I saw with hiSHtory.
After doing the import I saw some process that might have been uploading the data to the net. I get that it's encrypted, but I don't like that this is the default behavior.
After that, I ran a simple hishtory query command. It eventually returned, but not before downloading what I estimate was 50MB from the internet. I want to understand why it would redownload my whole dataset when I had just imported it, since it should presumably be cached?

I don't want to give up just yet, so I'm curious if anyone can contextualize the behavior I observed.

ddworken commented 3 months ago

Thank you for the feedback!

> Performance seems troublesome: after doing some typing and then removing my query under the ctrl+r interface, all the CPU threads that were spawned continue to chew up CPU. I don't think I will use this because of this. I'd certainly like to note that piping a plain text file containing 200k or 1M lines into fzf and interactively fuzzy searching it is a lot faster and more interactive than what I saw with hiSHtory.

Thanks for raising this. It's something that I personally haven't run into, since my total history size is relatively small (~30k commands). I'll spend some time setting up benchmarks for this and will see if I can improve performance here.

> After doing the import I saw some process that might have been uploading the data to the net. I get that it's encrypted, but I don't like that this is the default behavior.

Yeah, it is all encrypted so it is impossible for anything else to read it. But if you'd rather not have this, see the "Offline Install Without Syncing" section in the readme. This way you can install it in a 100% offline mode without any syncing support whatsoever.

> After that, I ran a simple hishtory query command. It eventually returned, but not before downloading what I estimate was 50MB from the internet. I want to understand why it would redownload my whole dataset when I had just imported it, since it should presumably be cached?

Hmm, interesting. This is unexpected, so I'll also plan on taking a look at this.

ddworken commented 3 months ago

Looping back on this, I'm happy to say that:

Searching performance should be significantly improved by ba21e1c. This will technically only improve performance in cases where there are many results (so it will still be slow if you're searching through 200k entries for only 1 matching result), but this should significantly improve the UX. I'm also planning on experimenting with sqlite's FTS/trigram support to see if we can improve this more.
The issue of re-downloading entries that came from the given device is fixed by #204.
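
To illustrate why a search fix can help queries with many results while leaving single-match queries slow, here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from hishtory's code: the `history` table, the `search` helper, and the `LIMIT`-based query are assumptions made purely for illustration, and the actual change in ba21e1c may work differently.

```python
import sqlite3
import time

# Hypothetical schema for illustration; hishtory's real tables and queries may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, command TEXT)")

# 200k common entries plus a single rare one.
rows = [(f"git commit -m 'change {i}'",) for i in range(200_000)]
rows.append(("rare-needle-command --flag",))
conn.executemany("INSERT INTO history (command) VALUES (?)", rows)


def search(term, limit=25):
    """Return up to `limit` of the newest matching commands and the elapsed seconds."""
    start = time.perf_counter()
    found = conn.execute(
        "SELECT command FROM history WHERE command LIKE ? "
        "ORDER BY id DESC LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()
    return found, time.perf_counter() - start


# Many matches: the scan can stop as soon as `limit` rows have been found.
common, common_secs = search("git")

# One match: the scan has to visit every row looking for more matches.
rare, rare_secs = search("rare-needle")

print(f"common: {len(common)} rows in {common_secs:.4f}s")
print(f"rare:   {len(rare)} rows in {rare_secs:.4f}s")
```

In this sketch the substring `LIKE` filter cannot use an index, so the only way to finish early is to hit the result limit. That is the kind of case where a full-text or trigram index could help, since it would narrow the candidate rows before any scanning happens.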

unphased commented 3 months ago

That's awesome! Thank you.

hongyi-zhao commented 3 months ago

PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

ddworken commented 3 months ago

> PostgreSQL offers advanced features, scalability, and performance, making it ideal for complex applications. So, why not switch to it for implementing more advanced features?

Since hishtory runs entirely on the client-side and is end-to-end encrypted, postgres isn't a great fit. Postgres is generally meant to be run on a server where it contains all the data, so it isn't a good fit for the hishtory use case where the server only stores encrypted blobs and has no visibility into the data.
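
The point about the server having no visibility into the data can be sketched roughly as follows. This is a toy model, not hishtory's actual protocol or cryptography: the `encrypt` and `decrypt` helpers are hypothetical and use a one-time pad with a fresh key per entry only so the example runs with the standard library, whereas a real client would use a proper authenticated cipher with a shared secret key.

```python
import secrets


def encrypt(plaintext):
    """Toy one-time pad, a stand-in for real encryption. Returns (key, ciphertext)."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key, ciphertext):
    """Reverse the toy one-time pad."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))


client_keys = []   # stays on the client and is never uploaded
server_blobs = []  # all the server ever stores: opaque bytes

for command in ["ls -la", "git status", "docker ps"]:
    key, blob = encrypt(command.encode())
    client_keys.append(key)
    server_blobs.append(blob)

# The server cannot evaluate something like WHERE command LIKE '%git%' because
# it only holds ciphertext. The client downloads the blobs, decrypts them, and
# searches locally.
local_history = [decrypt(k, b).decode() for k, b in zip(client_keys, server_blobs)]
matches = [c for c in local_history if "git" in c]
print(matches)
```

Under this model a server-side database engine has nothing to query beyond blob storage, so its query features would go unused, and the searchable copy of the history has to live in a client-side store.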

ddworken / hishtory

Performance Questions #202