LLNL / Surfactant

Modular framework for file information extraction and dependency analysis to generate accurate SBOMs
MIT License

Adding basic framework for the CLI, no changes to the old CLI. #261

Open shaynakapadia opened 1 month ago

shaynakapadia commented 1 month ago

Summary

This PR adds the basic setup for the new non-interactive Surfactant CLI.

If merged, this pull request will add the framework for the new non-interactive CLI described below.

Proposed changes

The changes here will migrate the existing CLI interface to the new structure. The proposed workflow is below:

```
surfactant cli load sbom.json               # Load the SBOM into Surfactant; it is saved in ~/.surfactant in serialized form
surfactant cli find --containerPath=^123*   # Load from the serialized form and find the subset that matches the args
surfactant cli add --installPath 123/ /bin/ # Add a new install path based on the container path
surfactant cli merge                        # Merge changes to the subset from find back into the main SBOM
surfactant cli find --uuid 123              # Find one entry to edit based on UUID
surfactant cli edit --components="IsAGRAF"  # Edit an array by picking the element; this edits a specific component in this entry
Current Value: {"name": "IsAGRAF", "Vendor": "Rockwell Collins Automation"}
New Value: {"name": "IsAGRAF", "Vendor": ["Rockwell Collins Automation"], "version": "1.2.3"}
surfactant cli edit --name                  # Edit a string value
Current Value: oldname.out
New Value: 1.2.3.CPO.out
surfactant cli merge                        # Merge changes back into the rest of the SBOM
surfactant cli save new_sbom.json           # Save the edited SBOM to a new file
```
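The subcommand layout proposed above could be sketched roughly as follows. This is a stdlib `argparse` sketch for illustration only, not the actual implementation; the command names come from the workflow above, but the option destinations and help strings are assumptions.

```python
# Illustrative argparse sketch of the proposed "surfactant cli" subcommands.
# Not the real implementation; option handling here is an assumption.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="surfactant cli")
    sub = parser.add_subparsers(dest="command", required=True)

    load = sub.add_parser("load", help="deserialize an SBOM into ~/.surfactant")
    load.add_argument("sbom_file")

    find = sub.add_parser("find", help="select the subset of entries matching the filters")
    find.add_argument("--containerPath", dest="container_path")
    find.add_argument("--uuid")

    add = sub.add_parser("add", help="add fields to the selected entries")
    add.add_argument("--installPath", dest="install_path", nargs=2)

    sub.add_parser("merge", help="merge the edited subset back into the main SBOM")

    save = sub.add_parser("save", help="write the edited SBOM to a file")
    save.add_argument("out_file")
    return parser


# Example: parse the "find" step from the workflow above.
args = build_parser().parse_args(["find", "--containerPath", "^123*"])
```

One design consequence of this shape: because each invocation is a separate process, the selected subset has to survive between commands, which is why `load` persists state under `~/.surfactant`.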
shaynakapadia commented 4 weeks ago

> It will be interesting to see how the performance for (de)serializing larger SBOMs is -- it looks like that will need to happen for every command that gets run?

Ran some timing on the `surfactant cli load` command, which both serializes and deserializes. Not sure why the 134 KB and 11.5 MB files are slower than the next larger ones, but it could be the nesting or something. Right now the serialization isn't really serialization; it's just writing JSON to a file. I was running into issues with Python pickle, so I am working on figuring out a workaround.

| SBOM Size | Avg Time |
|-----------|----------|
| 134 KB    | 0.720 sec |
| 783 KB    | 0.708 sec |
| 5.8 MB    | 1.356 sec |
| 11.5 MB   | 3.641 sec |
| 72.7 MB   | 2.806 sec |
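Averages like those above can be collected with a small harness along these lines. The round trip (parse the file, then re-serialize) mirrors what `load` does, but the helper name, run count, and synthetic payload below are assumptions, not the benchmark actually used:

```python
# Illustrative timing harness for a load-style round trip: read and parse
# a JSON SBOM, then re-serialize it, averaged over several runs.
import json
import os
import tempfile
import time


def time_load_roundtrip(path: str, runs: int = 5) -> float:
    """Average wall-clock time for one JSON parse + re-serialize cycle."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        with open(path) as f:
            sbom = json.load(f)        # deserialize from disk
        serialized = json.dumps(sbom)  # re-serialize, as `load` currently does
        total += time.perf_counter() - start
    return total / runs


# Synthetic stand-in for an SBOM file; swap in a real path to reproduce
# numbers like the table above.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"software": [{"UUID": str(i)} for i in range(1000)]}, f)
    path = f.name

avg = time_load_roundtrip(path)
os.unlink(path)
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and has higher resolution for short intervals.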

Update: Compared msgpack to JSON, and msgpack is a bit faster, mostly on the packing, but also on the other operations.

All times in seconds:

| size     | json_pack | msgpack | json_unpack | msgunpack | json_write | msg_write | json_read | msg_read |
|----------|-----------|---------|-------------|-----------|------------|-----------|-----------|----------|
| 725.7 MB | 14.833    | 12.3212 | 10.531      | 10.486    | 0.1092     | 0.081     | 0.0894    | 0.0704   |
| 72.7 MB  | 1.4576    | 1.1906  | 0.9776      | 0.9380    | 0.0102     | 0.0080    | 0.0084    | 0.0060   |
| 11.5 MB  | 0.8108    | 0.6802  | 2.4992      | 2.4526    | 0.0022     | 0.0012    | 0.0010    | 0.0010   |
| 783 KB   | 0.0560    | 0.0446  | 0.1458      | 0.1420    | 0.0000     | 0.0000    | 0.0008    | 0.0004   |
| 134 KB   | 0.0090    | 0.0072  | 0.0198      | 0.0198    | 0.0000     | 0.0000    | 0.0000    | 0.0000   |
| 237 B    | 0.0000    | 0.0000  | 0.0000      | 0.0000    | 0.0000     | 0.0000    | 0.0000    | 0.0000   |
shaynakapadia commented 2 weeks ago

So pickling is in fact significantly faster if I pickle the class directly. I can't do this unless I take care of the mappingproxy type first, but with some very minimal pre- and post-processing it works:

```python
import pickle
from types import MappingProxyType


def serialize(sbom):
    # Field.metadata is a read-only mappingproxy, which pickle rejects;
    # temporarily swap in a plain dict before pickling.
    for k, v in sbom.__dataclass_fields__.items():
        v.metadata = {}
    return pickle.dumps(sbom)


def deserialize(data):
    sbom = pickle.loads(data)
    # Restore the read-only metadata mapping after unpickling.
    for k, v in sbom.__dataclass_fields__.items():
        v.metadata = MappingProxyType({})
    return sbom
```
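For context on why the metadata swap is needed: `dataclasses` exposes each field's `metadata` as a read-only `types.MappingProxyType`, and pickle refuses that type outright. A minimal demonstration (the `Entry` dataclass here is hypothetical, not from Surfactant):

```python
# Shows why the pre/post-processing above is needed: mappingproxy objects
# are not picklable, so any pickle that reaches one raises TypeError.
import pickle
from dataclasses import dataclass, field, fields
from types import MappingProxyType


@dataclass
class Entry:
    name: str = field(metadata={"source": "example"})


# dataclasses wraps field metadata in a read-only mappingproxy...
assert isinstance(fields(Entry)[0].metadata, MappingProxyType)

# ...and pickling that proxy fails with TypeError.
try:
    pickle.dumps(fields(Entry)[0].metadata)
    pickling_failed = False
except TypeError:
    pickling_failed = True

# Plain instances still pickle fine, since field objects are class-level.
restored = pickle.loads(pickle.dumps(Entry("x")))
```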
Timing results here (seconds):

| Size     | json_dumps | msgpack | pickle_dumps | json_loads | msgunpack | pickle_loads |
|----------|------------|---------|--------------|------------|-----------|--------------|
| 725.7 MB | 15.6412    | 12.4518 | 1.3130       | 10.6634    | 10.5556   | 0.7808       |
| 72.7 MB  | 1.5228     | 1.2606  | 0.0946       | 1.0528     | 1.0048    | 0.0794       |
| 11.5 MB  | 0.8552     | 0.7242  | 0.0304       | 2.6202     | 2.6132    | 0.0502       |
| 5.8 MB   | 0.3380     | 0.2650  | 0.0092       | 0.5446     | 0.5404    | 0.0112       |
| 783 KB   | 0.0560     | 0.0480  | 0.0012       | 0.1524     | 0.1472    | 0.0020       |
| 134 KB   | 0.0090     | 0.0070  | 0.0000       | 0.0202     | 0.0200    | 0.0000       |
| 237 B    | 0.0000     | 0.0000  | 0.0000       | 0.0000     | 0.0000    | 0.0000       |