clj-holmes / clj-watson

clojure deps SCA
Eclipse Public License 2.0
84 stars 9 forks source link

Consider a more cache-friendly db location #65

Closed lread closed 3 months ago

lread commented 3 months ago

Currently

The default nvd database location is /tmp/db/cache

But...

This isn't terribly CI service-friendly when it comes to caching.

There are big advantages to caching the nvd database so that it can be updated with new entries rather than entirely re-downloaded. Updating:

Options

Some options I've thought of:

Option 1: Do nothing

Not interested in addressing this issue.

Option 2: Document only

Document a way for the user to change the database location. I do see the potential configuration entry that might be overridden.

I suppose the user would do this via --dependency-check-properties cmd line arg. (Would have to verify)

Option 3: Allow user to easily override

  1. Perhaps there is merit in promoting this option to its own command line arg?
  2. Or at least letting it be overridden via --clj-watson-properties

Option 4: Default to the default dependency check location

The default location is under the ~/.m2 repo alongside dependency check dep. Example: /home/lee/.m2/repository/org/owasp/dependency-check-utils/10.0.3/data/9.0 (So, in other words, don't override the default for clj-watson).

Proposal

I like option 4. This could be combined with Option 3.2 for relatively easy overriding. I personally don't think /tmp is a great spot for this db in general - and especially for CI.

Next Steps

I am also happy to explore further. If you agree we should take action, I volunteer to create a PR.

seancorfield commented 3 months ago

I've often thought it's a bit odd that Watson completely overrides the DependencyCheck defaults instead of doing so selectively. I know it needs to override the ${pom.*} values but there's a maintenance overhead in trying to keep dependency-check.properties basically in sync with the dependencycheck.properties file in DependencyCheck itself.

I think I'd like to dig deeper into this and use the core lib's properties by default and only override things that Watson specifically needs to override and then, yes, option 4 could easily fall out of that and also 3.2 would just work automatically (clj-watson.properties is already intended to be the way to override any of the defaults -- but it is poorly-documented).

I think it would also make life easier to add an option that can specify individual property overrides (multi-option producing a vector).

lread commented 3 months ago

Ya, agreed. If I understand nvd-clojure correctly it concerns itself only with dependency-check settings it is specifically interested in.

lread commented 3 months ago

While testing, I was surprised that the entire nvd db was being downloaded again. Then I remembered that an OS is free to clear out /tmp/. My Linux box does so on startup. So, another vote against /tmp/... being a default location; this one is not CI-specific.

seancorfield commented 3 months ago

I think this will fall out of the work in #66 and then it'll just need a readme update regarding the default location of the db and how to override it?

lread commented 3 months ago

I think the fact that clj-watson is overriding a sensible DependencyCheck default for data.directory is a mistake.

I think we should document where the db is saved and how to cache it sensibly on CI. I've worked this out for pomegranate and clj-yaml, so can help with this.

We could make a short mention of how to override the default db location (via system property or cljwatson.properties), but we could recommend against doing so. Taking inspiration from DependencyCheck docs on the configuration of the data directory:

... This should generally not be changed.

seancorfield commented 3 months ago

Agreed. I notice right now that DependencyCheck has:

data.directory=[JAR]/data/9.0

So that has stayed the same across 9.x and 10.x -- and we need to decide whether to stop overriding it (my preference, since it would then become CI cache-friendly as part of .m2) or keep it as /tmp/db/ and emphasize that it isn't CI cache-friendly but here's how to change it...?

I guess the thing that needs to be verified is whether DependencyCheck will still correctly expand [JAR] if that property is passed in from a caller like clj-watson, as opposed to only handling it within its own dependencycheck.properties file?

lread commented 3 months ago

I like your preference of clj-watson no longer overriding data.directory. We could make a very brief mention of this property in the README.

[JAR] expands to more than I had originally guessed. By default when using dependency-check 10.0.3.

[JAR] becomes:

[JAR]/data/9.0 becomes:

So I don't think the 9.0 (whatever it means) is terribly important.

Example of override via system property:

clojure '-J-Ddata.directory=[JAR]/foobar' -J-Dnvd.api.key=<key here> -M:clj-watson scan -p deps.edn

And db is written to:

seancorfield commented 3 months ago

Ah, very nice! Shall we leave this open as a reminder for the documentation, after #66 is dealt with?

lread commented 3 months ago

We could deal with it directly and close it. I'd have to verify but, probably:

  1. Delete override in dependency-check.properties
  2. Document via changelog and readme

Then #66 could do the fuller and more general work.

Happy to take a crack at it.

seancorfield commented 3 months ago

PR #105 (really PR #106) fixes this I believe since it removes the override of data.directory and documents it.

seancorfield commented 3 months ago

PR #106 has been merged.