Closed: poonai closed this 2 years ago
Thanks for the PR :) what do the benchmarks look like before and after? (Assuming a data set of at least 5GB)
Regarding the benchmark:
If we do a sorted insert, we won't see much of a gain, because Pebble itself is optimized enough to skip through sorted keys.
We have to run it on a realistic workload to see whether this optimization makes sense.
I can create a dataset with randomized keys for benchmarking, or do we have an existing dataset I could use to run the benchmark?
We have another app consuming the data set, but I think writing a simple tool to produce random data just for bond testing is a good idea.
Thanks, I’ll come up with the tool :)
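For reference, a minimal random-workload generator could look like the sketch below. This is a hypothetical illustration, not the actual tool: the names `randomKey` and `randomValue`, the key width, and the value-size range are all assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	mrand "math/rand"
)

// randomKey returns a fixed-length random hex key, so inserts arrive in
// effectively unsorted order (unlike sequential IDs, which Pebble can
// skip through cheaply).
func randomKey(nBytes int) string {
	b := make([]byte, nBytes)
	rand.Read(b) // crypto/rand; error ignored for brevity in this sketch
	return hex.EncodeToString(b)
}

// randomValue returns a payload of random length between min and max
// bytes, loosely mimicking a YCSB-style record.
func randomValue(min, max int) []byte {
	b := make([]byte, min+mrand.Intn(max-min+1))
	rand.Read(b)
	return b
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Printf("key=%s valueLen=%d\n", randomKey(16), len(randomValue(64, 256)))
	}
}
```

Feeding keys like these into bond's insert path should exercise the non-sorted case the thread is discussing.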
notify: @pkieltyka
I've implemented a benchmarking tool for long-running jobs as per your suggestion (I tried to mimic YCSB).
Here are the preliminary results:
master
Total time taken to insert 19m40.066199548s
size of database 753 MB
Total time taken to insert 19m49.239178586s
size of database 763 MB
poonai:poonai/insert_batch_seperation
Total time taken to insert 17m12.718217997s
size of database 752 MB
Total time taken to insert 17m8.408880599s
size of database 746 MB
> Assuming a data set of at least 5GB
I didn't run it for 5 GB, since that takes a lot of time on my small machine.
But I'm curious to stress the machine myself. Will update here once it's done 😄
nicely done, so a ~13% improvement :)
do you think sorting the indexes before we write them to the index batch would make a difference..? or should pebble already be sorting keys during the batch write?
pebble will do the sorting internally.
@poonai Hi, Nice to meet you.
I really like the benchmarking tool that you have prepared. I didn't expect that big a gain, which is why I ran your benchmark too.
master
Total time taken to insert 17m48.307964933s
size of database 756 MB
Total time taken to insert 17m28.721275263s
size of database 755 MB
Total time taken to insert 17m38.046745363s
size of database 763 MB
poonai:poonai/insert_batch_seperation
Total time taken to insert 17m22.238643583s
size of database 750 MB
Total time taken to insert 17m24.658270013s
size of database 757 MB
Total time taken to insert 17m29.262544851s
size of database 759 MB
It's still better; however, the difference isn't as big anymore. I made sure that nothing else was running on my OS at the time. Benchmarking can sometimes be tricky.
We would like to keep this change. A couple of requests:
- To keep the indexing logic consistent across the board, could you apply the same change to `Update`, `Upsert`, and `Delete`?
- In addition, please move the benchmarking tool to cmd/tools/<name_of_the_tool>
Thank you :)
@marino39
Peter mentioned that you are the core contributor to Bond. It's a pleasure meeting you!
Yes, benchmarking is tricky. That's why I ran it multiple times; it turned out that my machine punished me :(
I've updated the pattern for `Update` and `Upsert`, and also found that it's not required for `Delete`.
Thanks for accepting the changes.
What changes were made?
This PR tracks all the index keys in a separate batch and then applies them at a later stage.

Rationale for the change
Index keys are not required for checking record duplication.

Note: `indexKeyBatch` instances are lightweight, since they're returned to a `Pool` on close.