@prabhu So I've been looking into this. My observations mean either that the batch size is relatively inconsequential or that I don't know what I'm doing. Probably the latter.
@cerrussell This is interesting! Do you have any idea where the time is being spent? Could we try a profiler?
@prabhu I've been using cProfile and playing around with different decorators, and also looking at how other projects set things up. It's all very new to me, but I certainly welcome the opportunity to close this knowledge gap. I'm working on a GitHub Actions workflow that takes test projects and runs depscan with profiling enabled. I'll let you know when I've come up with anything worth looking at!
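For illustration, the decorator experiments look roughly like this (a minimal sketch; the `profiled` name and the stats settings are placeholders, not depscan's actual setup):

```python
import cProfile
import functools
import pstats


def profiled(func):
    """Profile a single call and print the hottest functions."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall profiles exactly one invocation of the wrapped function
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Sort by cumulative time so expensive call paths float to the top
            pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)

    return wrapper
```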
Thanks, @cerrussell, for your help. Please join the Discord channel as well, in case you haven't. I would love to know more about your workflows.
Hey @prabhu, I joined the Discord.
Just an update: I've been testing out batch_write_size values. I'm still working out which value yields a net improvement rather than simply redistributing where the time is spent.
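The comparisons are roughly along these lines (a sketch only; `store_fn` and its `batch_write_size` keyword stand in for the real storage call):

```python
import time


def benchmark_batch_sizes(records, store_fn, sizes=(100, 1000, 5000, 10000)):
    """Time store_fn over the same records at several candidate batch sizes."""
    for size in sizes:
        start = time.perf_counter()
        store_fn(records, batch_write_size=size)  # hypothetical signature
        elapsed = time.perf_counter() - start
        print(f"batch_write_size={size}: {elapsed:.3f}s")
```

The search side would need the same treatment, since a size that speeds up writes could just shift the cost into lookups.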
@cerrussell, could you kindly say hi on Discord or share your Discord ID? Looks like a few people have joined.
@prabhu My ID is lemonym#2286, display name caroline.
v5 gave the needed performance boost for searches. However, one thing I am not happy with is the hardcoded batch size used when storing a group of records:
https://github.com/AppThreat/vulnerability-db/blob/master/vdb/lib/storage.py#L9
I picked this number out of thin air. Bad hotel internet meant I could never experiment with various batch sizes to measure the impact on storage vs. search performance.
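One possibility might be to make it configurable instead, along these lines (a sketch; the `VDB_BATCH_WRITE_SIZE` variable name and the fallback default are made up, not what storage.py actually uses):

```python
import os

# Hypothetical override: read the batch size from the environment,
# falling back to a hardcoded default when unset.
BATCH_WRITE_SIZE = int(os.getenv("VDB_BATCH_WRITE_SIZE", "100"))


def batched(records, size=BATCH_WRITE_SIZE):
    """Yield successive chunks of records, each at most `size` items long."""
    for i in range(0, len(records), size):
        yield records[i : i + size]
```

That would at least let different batch sizes be tried without code changes.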