chainguard-dev / malcontent

#supply #chain #attack #detection
Apache License 2.0

Refresh sample test data via new `refresh` command #634

Closed by egibs 1 day ago

egibs commented 1 day ago

Running `make refresh-sample-testdata` has been taking longer and longer as we ramp up the number of samples we have test data for, and it can now take at least five minutes.

This PR removes the `refresh-testdata.sh` script and adds a new `refresh` command to `mal` that uses Malcontent as a library to refresh the test data instead.

In my testing, it was at least three times faster, and the output is 1:1 with the original script's (which took a bit of doing with the `TrimPrefixes` field and making sure each path string was accounted for).
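For readers unfamiliar with the approach, here is a minimal sketch of the general idea, not the code in this PR: scan samples in-process with a bounded number of goroutines and trim machine-specific path prefixes so the output stays stable across checkouts. The names `scanFunc`, `refreshAll`, and `trimPrefixes` are hypothetical, and the scan itself is a stand-in rather than Malcontent's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// trimPrefixes strips the first matching prefix from path so that generated
// output does not embed machine-specific directories. (Hypothetical helper.)
func trimPrefixes(path string, prefixes []string) string {
	for _, p := range prefixes {
		if strings.HasPrefix(path, p) {
			return strings.TrimPrefix(path, p)
		}
	}
	return path
}

// scanFunc is a hypothetical stand-in for an in-process scan call.
type scanFunc func(path string) (string, error)

// refreshAll scans every sample using at most `workers` goroutines and
// returns each report keyed by its trimmed path, plus the first error seen.
func refreshAll(samples, prefixes []string, workers int, scan scanFunc) (map[string]string, error) {
	if workers < 1 {
		workers = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
		results  = make(map[string]string, len(samples))
		sem      = make(chan struct{}, workers) // bounds concurrency
	)
	for _, s := range samples {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` scans are in flight
		go func(sample string) {
			defer wg.Done()
			defer func() { <-sem }()
			out, err := scan(sample)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = fmt.Errorf("scan %s: %w", sample, err)
				}
				return
			}
			results[trimPrefixes(sample, prefixes)] = out
		}(s)
	}
	wg.Wait()
	return results, firstErr
}

func main() {
	samples := []string{"/tmp/samples/linux/a.elf", "/tmp/samples/macOS/b.macho"}
	// A fake scanner so the sketch runs without any external dependency.
	fake := func(path string) (string, error) { return "report for " + path, nil }
	res, err := refreshAll(samples, []string{"/tmp/samples/"}, 4, fake)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(res))
	for k := range res {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

Avoiding a new process per sample and keeping every core busy is one plausible reason an in-process refresh would beat a shell script; the real command may be structured differently.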

Anecdotal timing from my Framework:

$ time make refresh-sample-testdata
mkdir -p out
go build -o out/mal ./cmd/mal
./out/mal refresh
Sample data refreshed: 307/307 (progress/total)
Successfully refreshed test data for 307 samples

________________________________________________________
Executed in  123.80 secs    fish           external
   usr time   18.13 mins    0.00 millis   18.13 mins
   sys time    0.69 mins    4.74 millis    0.69 mins

My M1 Pro MBP took ~100 seconds and an i9-14900K took ~65 seconds.

tstromberg commented 1 day ago

Love it!

Future idea for improvement: accept path names as arguments so you can refresh the test data for just a single path.
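One way the path-argument idea might look, as a rough sketch only: filter the full sample list down to the entries that equal, or live under, any path given on the command line, and fall back to everything when no path is given. `selectSamples` is a hypothetical helper, not something in this PR.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// selectSamples returns the samples that equal, or live under, one of the
// requested paths. With no requested paths, every sample is returned.
func selectSamples(all, requested []string) []string {
	if len(requested) == 0 {
		return all
	}
	var picked []string
	for _, sample := range all {
		s := filepath.Clean(sample)
		for _, r := range requested {
			r = filepath.Clean(r)
			// Match the path itself or anything beneath it as a directory.
			if s == r || strings.HasPrefix(s, r+string(filepath.Separator)) {
				picked = append(picked, sample)
				break
			}
		}
	}
	return picked
}

func main() {
	all := []string{"linux/a.elf", "linux/b.elf", "macOS/c.macho"}
	fmt.Println(selectSamples(all, []string{"linux"})) // only the linux samples
	fmt.Println(selectSamples(all, nil))               // no arguments: refresh everything
}
```

Matching on a directory boundary rather than a raw string prefix keeps an argument like `linux` from also selecting a sibling such as `linux-arm/`.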