AppThreat / vulnerability-db

Vulnerability database and package search for sources such as Linux, OSV, NVD, GitHub and npm. Powered by sqlite, CVE 5.0, purl, and vers.
MIT License

Spend more time on batch size for storage #44

Closed. prabhu closed this issue 7 months ago.

prabhu commented 1 year ago

v5 gave searches the performance boost they needed. However, one thing I am not happy with is the hardcoded batch size used when storing a group of records.

https://github.com/AppThreat/vulnerability-db/blob/master/vdb/lib/storage.py#L9

I picked this number out of thin air. Bad hotel internet meant I never got to experiment with different batch sizes to measure their impact on storage vs. search performance.
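Making the batch size a parameter instead of a constant could look roughly like the sketch below. It assumes a plain `sqlite3` connection; the `records` table and its columns, `store_in_batches`, and `DEFAULT_BATCH_SIZE` are hypothetical names, not vdb's actual storage code. Each batch is written with one `executemany` call and committed as its own transaction, which is usually where batch size matters: fewer, larger transactions mean fewer commits.

```python
import sqlite3
from typing import Iterable, List, Sequence

# Hypothetical default; the real constant lives in vdb/lib/storage.py.
DEFAULT_BATCH_SIZE = 1000

INSERT_SQL = "INSERT INTO records (id, purl, data) VALUES (?, ?, ?)"


def store_in_batches(
    conn: sqlite3.Connection,
    rows: Iterable[Sequence[object]],
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> int:
    """Insert rows in chunks of `batch_size`, committing once per chunk.

    Returns the total number of rows written.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    batch: List[Sequence[object]] = []
    written = 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            conn.executemany(INSERT_SQL, batch)
            conn.commit()  # one transaction per full batch
            written += len(batch)
            batch.clear()
    if batch:  # flush the last, partially filled batch
        conn.executemany(INSERT_SQL, batch)
        conn.commit()
        written += len(batch)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id TEXT, purl TEXT, data TEXT)")
    rows = ((f"CVE-{i}", f"pkg:npm/p{i}", "{}") for i in range(2_500))
    print(store_in_batches(conn, rows, batch_size=1_000))  # 2500
```

Because the function accepts any iterable, callers can stream records from a generator instead of materializing one large list before writing.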

cerrussell commented 1 year ago

@prabhu So I've been looking into this. My observations suggest either that the batch size is relatively inconsequential or that I don't know what I'm doing. Probably the latter.

prabhu commented 1 year ago

@cerrussell This is interesting! Do you have any idea where the time is being spent? Could we try a profiler?

cerrussell commented 1 year ago

@prabhu I've been using cProfile and playing around with different decorators, and also looking at how other projects set things up. It's all very new to me, but I welcome the opportunity to close this knowledge gap. I'm working on setting up a GitHub Actions (GA) workflow that can take test projects and run depscan with profiling. I'll let you know when I've come up with anything worth looking at!
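A profiling decorator of the sort mentioned here might look like the sketch below. It is an illustration only: `profiled`, its `last_report` attribute, and the `store_rows` workload are hypothetical and not part of vdb or depscan, while `cProfile.Profile`, `pstats.Stats`, `sort_stats`, and `print_stats` are standard library. For a whole script, `python -m cProfile -s cumulative script.py` produces a similar report without code changes.

```python
import cProfile
import functools
import io
import pstats
import sqlite3


def profiled(top: int = 10):
    """Decorator factory: profile each call and keep a text report.

    The report (top `top` entries by cumulative time) is stored on the
    wrapper as `last_report`, so the caller decides where to print it.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                out = io.StringIO()
                stats = pstats.Stats(profiler, stream=out)
                stats.sort_stats("cumulative").print_stats(top)
                wrapper.last_report = out.getvalue()

        wrapper.last_report = ""
        return wrapper

    return decorator


@profiled(top=10)
def store_rows(n_rows: int, batch_size: int) -> None:
    """Hypothetical stand-in workload: batched inserts into in-memory SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id TEXT, data TEXT)")
    rows = [(f"CVE-{i}", "{}") for i in range(n_rows)]
    for i in range(0, n_rows, batch_size):
        conn.executemany("INSERT INTO records VALUES (?, ?)", rows[i : i + batch_size])
        conn.commit()
    conn.close()


if __name__ == "__main__":
    store_rows(20_000, batch_size=500)
    print(store_rows.last_report)
```

Sorting by cumulative time surfaces the calls that dominate end to end; `sort_stats("tottime")` instead highlights the time spent inside each function itself.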

prabhu commented 1 year ago

Thanks for your help, @cerrussell. Please join the Discord channel as well, if you haven't already. I would love to know more about your workflows.

cerrussell commented 1 year ago

Hey @prabhu, I joined the Discord.

Just an update: I've been testing out batch_write_size values. I'm still working out which value yields a net improvement rather than simply shifting where the time is spent.
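One way to tell a net improvement from time that merely moves elsewhere is to time both storage and search for each candidate size and compare the totals. The following is a rough benchmark sketch under stated assumptions: the synthetic rows, the `records` table, the `purl` index, `N_PKGS`, and the helper names are all hypothetical. It uses a temporary on-disk database so per-commit disk syncs are counted; real vdb data would need to be substituted for the numbers to mean anything.

```python
import os
import sqlite3
import tempfile
import time
from typing import Dict, Iterable, Tuple

N_PKGS = 500  # number of distinct synthetic purls (hypothetical)


def time_store_and_search(batch_size: int, n_rows: int = 5_000) -> Tuple[float, float]:
    """Return (store_seconds, search_seconds) for one synthetic run."""
    rows = [(f"CVE-{i}", f"pkg:npm/p{i % N_PKGS}", "{}") for i in range(n_rows)]
    with tempfile.TemporaryDirectory() as tmp:
        # File-backed DB so each commit pays the real disk-sync cost.
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        conn.execute("CREATE TABLE records (id TEXT, purl TEXT, data TEXT)")
        conn.execute("CREATE INDEX idx_purl ON records (purl)")

        start = time.perf_counter()
        for i in range(0, n_rows, batch_size):
            conn.executemany(
                "INSERT INTO records (id, purl, data) VALUES (?, ?, ?)",
                rows[i : i + batch_size],
            )
            conn.commit()  # one transaction per batch
        store_s = time.perf_counter() - start

        start = time.perf_counter()
        for p in range(N_PKGS):
            conn.execute(
                "SELECT id FROM records WHERE purl = ?", (f"pkg:npm/p{p}",)
            ).fetchall()
        search_s = time.perf_counter() - start
        conn.close()  # close before the temp directory is removed
    return store_s, search_s


def sweep(
    batch_sizes: Iterable[int] = (50, 500, 5_000),
    n_rows: int = 5_000,
    repeats: int = 3,
) -> Dict[int, Tuple[float, float]]:
    """Best-of-`repeats` (store, search) timings for each batch size."""
    results = {}
    for bs in batch_sizes:
        runs = [time_store_and_search(bs, n_rows) for _ in range(repeats)]
        results[bs] = (min(r[0] for r in runs), min(r[1] for r in runs))
    return results


if __name__ == "__main__":
    for bs, (store_s, search_s) in sweep().items():
        print(
            f"batch_size={bs:>5}  store={store_s:.3f}s  "
            f"search={search_s:.3f}s  total={store_s + search_s:.3f}s"
        )
```

Taking the minimum of several runs per phase reduces noise from other processes; if the totals barely move while the store and search columns trade places, the batch size is redistributing time rather than saving it.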

prabhu commented 1 year ago

@cerrussell, could you say hi on Discord or share your Discord ID? It looks like a few people have joined.

cerrussell commented 1 year ago

@prabhu My ID is lemonym#2286, display name caroline.