Cisco-Talos / Mussels

Apache License 2.0

Download and setup of tools? #16

Open bburgin opened 4 years ago

bburgin commented 4 years ago
  1. Can a tool yaml file have a download section?
     a. I want to treat tools as full-fledged dependencies, i.e., automatically download and use them.
     b. Can the download section also specify a download location, i.e., where on a local drive to download the tool?
  2. Similarly, can a tool yaml have an install section? To do (1) above, some install action is needed after download. Install use cases:
     a. none (for tools that can be run directly from the download location)
     b. an extract script: extract the tool to a temp folder for running (for zero-install tools)
     c. an install script: run the installer for a tool, or custom install steps (for tools requiring install)
     These could all be handled by making the install section a script.
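As a rough illustration of the three install use cases above, here is a minimal Python sketch. Nothing like this exists in Mussels: install_tool, the mode names "none", "extract" and "script", and the TOOL_DOWNLOAD environment variable are all hypothetical, and the sketch assumes the tool has already been downloaded to download_path.

```python
import os
import shutil
import subprocess
import tempfile


def install_tool(mode: str, download_path: str, install_script: str = "") -> str:
    """Return the directory to run a downloaded tool from (hypothetical helper).

    mode selects one of the three install use cases:
      "none"    - run the tool straight from its download location
      "extract" - unpack a zero-install archive into a fresh temp folder
      "script"  - run an installer or custom install steps in a temp folder
    """
    if mode == "none":
        return os.path.dirname(os.path.abspath(download_path))
    if mode == "extract":
        target = tempfile.mkdtemp(prefix="tool-")
        # unpack_archive picks the format (zip, tar.gz, ...) from the file name.
        shutil.unpack_archive(download_path, target)
        return target
    if mode == "script":
        if not install_script:
            raise ValueError("mode 'script' requires an install_script")
        target = tempfile.mkdtemp(prefix="tool-")
        # Hypothetical convention: tell the script where the download lives.
        env = {**os.environ, "TOOL_DOWNLOAD": os.path.abspath(download_path)}
        # check=True raises CalledProcessError if the install script fails.
        subprocess.run(install_script, shell=True, check=True, cwd=target, env=env)
        return target
    raise ValueError(f"unknown install mode: {mode!r}")
```

Modeling the mode as data rather than code keeps the yaml simple, while the "script" mode still covers anything the other two cannot.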

Or am I instead supposed to make a recipe yaml for a tool? And use the recipe yaml along with the tool yaml, to do the use cases? If so, then the recipe and tool information would be in separate files, and possibly separate folders, making the tool info hard to maintain.

Thoughts?

micahsnyder commented 4 years ago

Those aren't options at present. It might be possible to create a recipe that installs a tool if it doesn't exist and use that as a recipe dependency.

I haven't tried it, but I have a feeling that if you define both a recipe & tool to find/download+install the tool, the tool would fail the dependency check at the beginning and it would never continue to install the missing tool using the recipe.

I think your ideas (1) and (2) are very reasonable. I imagine they would only run if the tool isn't found by one of the tool checks.

The install scripts would have to be defined for each host platform instead of host/target. I don't know if it makes sense to include the configure and make script names for the tools. Probably just install?

The scripts in the recipe yaml files provide a handful of variables that evaluate to the recipe work and install directories (e.g. ~/.mussels/work/... and ~/.mussels/install). I imagine tool scripts would provide something similar, but should probably use different directories. After a successful tool install, Mussels could re-run the tool checks. The file_checks check adds the directory where the tool is found to the PATH, so adding something like {install}/bin/<tool name> would make the tool available in the PATH for use in the build.
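The check, install, and re-check flow described above could look roughly like this. It is a hypothetical sketch, not Mussels code: find_tool stands in for the tool checks, install stands in for a tool install script that receives the {install} directory, and the PATH update only mimics what file_checks is described as doing.

```python
import os
from typing import Callable, Optional


def ensure_tool(
    find_tool: Callable[[], Optional[str]],
    install: Callable[[str], None],
    install_dir: str,
) -> str:
    """Return the directory containing the tool, installing it only if needed.

    find_tool returns the directory where the tool was found, or None.
    install receives the {install} directory and should place the tool there.
    """
    found = find_tool()
    if found is None:
        install(install_dir)   # runs only when every check failed
        found = find_tool()    # re-run the checks after a successful install
        if found is None:
            raise RuntimeError("tool still not found after install")
    # Mirror the file_checks behavior: put the tool's directory on PATH.
    current = os.environ.get("PATH", "")
    if found not in current.split(os.pathsep):
        os.environ["PATH"] = found + os.pathsep + current
    return found
```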

What do you think?

bburgin commented 4 years ago

I like your approach, some more questions:

  1. What if I want to build a tool and then use it? With your description above, it seems that I would need a recipe yaml that builds the tool and a separate tool yaml that allows consumption of the tool.
     a. Would these two files live in the same folder or in separate folders?
     b. Would it help to have them be the same file, to allow easier maintenance?
     c. But then I can see other cases where the user may want them to be in different files: say the user wants multiple flavors of tool retrieval for the same tool, one flavor that builds the tool and uses the built binaries, and another that just downloads an official build of the tool and uses it. Would these be separate tool files? Perhaps we want a single tool file that consumes binaries prepared during earlier recipe runs? That way, the recipe could know whether to build or download. Or should there be two recipe files, one for building and another for downloading?
  2. Is Mussels a sequencer? Or is it up to the user to sequence recipes? Are there example sequences? It would at least help to have some example sequences, so a user knows the best practice for stringing together recipes.
  3. How does a user pass data to a recipe or tool? Does a recipe or tool get passed command line args?
  4. Does Mussels allow for user defined variables?
  5. How would a user pass data between recipes?
  6. What about allowing the user to customize the /work, /install, etc. folder locations?
  7. What about allowing the user to not add a tool to the PATH? Some scripts and tools can be picky about PATH pollution.

micahsnyder commented 4 years ago

Brian,

  1. Sorry, no. I was thinking of one yaml file for the tool that includes both the checks and the URL and install script to use in case the checks fail. Maybe something like this?
name: template_tool
version: ""
mussels_version: "0.1"
type: tool
platforms:
  Posix:
    path_checks:
      - template_tool
    command_checks:
      - command: "template_tool --version"
        output_has: "template_tool 1.2"
    file_checks:
      - /usr/local/bin/template_tool
      - /usr/bin/template_tool
    install_url: https://www.sample.com/download/template_tool
    install_script: |
      ./template_tool --prefix={install}
  Windows:
    file_checks:
      - C:/Program Files/template_tool/bin/template_tool.exe
    install_url: https://www.sample.com/download/template_tool.exe
    install_script: |
      .\template_tool.exe /DIR={install}

I think you're right: multiple flavors would mean multiple tool yaml files, each with a unique tool name.
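Because the proposed schema keys everything by host platform, a runner would first need to pick the matching section. Here is a minimal sketch using a plain dict that mirrors the template_tool yaml above, so no YAML parser is needed. The lookup order, including the "Darwin" and "Linux" keys and the fallback to "Posix", is an assumption for illustration and not taken from Mussels.

```python
import platform
from typing import Optional

# Hypothetical lookup order: most specific platform name first, then the
# broader family. Only "Posix" and "Windows" appear in the example yaml.
PLATFORM_KEYS = {
    "Windows": ["Windows"],
    "Darwin": ["Darwin", "Posix"],
    "Linux": ["Linux", "Posix"],
}


def platform_section(tool: dict, system: Optional[str] = None) -> dict:
    """Return the tool definition's section for the given (or current) host OS."""
    system = system or platform.system()
    sections = tool.get("platforms", {})
    for key in PLATFORM_KEYS.get(system, ["Posix"]):
        if key in sections:
            return sections[key]
    raise KeyError(f"{tool.get('name')} has no section for host {system!r}")


# Trimmed-down copy of the template_tool definition above.
template_tool = {
    "name": "template_tool",
    "type": "tool",
    "platforms": {
        "Posix": {
            "path_checks": ["template_tool"],
            "install_url": "https://www.sample.com/download/template_tool",
        },
        "Windows": {
            "install_url": "https://www.sample.com/download/template_tool.exe",
        },
    },
}
```

For example, platform_section(template_tool, "Windows")["install_url"] picks the .exe installer, while a Linux host falls back to the Posix section.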

  2. I'm not entirely sure what you mean by a sequencer. Mussels will build recipe dependencies in the correct order. Check out the ClamAV Mussels cookbook, which has a few recipes with common dependencies. Mussels will assemble a dependency chain to build everything in the correct order, and it will fail if there is a circular dependency. For an example, see: https://github.com/Cisco-Talos/clamav-mussels-cookbook/blob/master/recipes/libcurl-7.yaml

I would also like to put together some example recipes in the scrapbook repository to demonstrate dependency chains. If you have any ideas for library builds that would make for good examples, I'm all ears.
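For readers wondering how a dependency chain with cycle detection can be assembled, here is a generic depth-first topological sort in Python. It is a sketch of the technique, not Mussels' implementation, and the recipe names used with it are made up.

```python
from typing import Dict, List


def build_order(deps: Dict[str, List[str]], targets: List[str]) -> List[str]:
    """Return recipes in an order where every dependency precedes its dependents.

    deps maps a recipe name to the names it depends on.
    Raises ValueError on a circular dependency.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    state: Dict[str, int] = {}
    order: List[str] = []

    def visit(name: str, path: List[str]) -> None:
        s = state.get(name, WHITE)
        if s == BLACK:
            return
        if s == GREY:
            # name is already on the current path, so we found a cycle.
            cycle = path[path.index(name):] + [name]
            raise ValueError("circular dependency: " + " -> ".join(cycle))
        state[name] = GREY
        for dep in deps.get(name, []):
            visit(dep, path + [name])
        state[name] = BLACK
        order.append(name)  # appended only after all of its dependencies

    for target in targets:
        visit(target, [])
    return order
```

With the hypothetical graph {"app": ["liba", "libb"], "liba": ["base"], "libb": ["base"]}, build_order(..., ["app"]) yields ["base", "liba", "libb", "app"], and a loop such as x -> y -> x raises an error instead of building.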

  3. The idea of passing command line args to a recipe is something I hadn't considered, but it could be really interesting. Do you have any specific ideas of how this might work?

  4. I'd like to add user-defined and/or tool-defined variables. This should definitely go on the to-do list.

  5. I don't quite follow what you mean by passing data between recipes. Can you elaborate?

  6. The install location can be customized with the build -i / --install option, but I hadn't considered customizing the work folder location. If there's a compelling use case for it, I'm up for it.

  7. The tool definitions currently have three options for detecting whether a tool is installed. Only the file_checks option will add the tool to the PATH.

    • path_checks: Check whether an executable already exists in the $PATH.
    • command_checks: Run a script; if it returns 0, the tool is considered installed.
    • file_checks: Check an exact file path to see if the file exists. If so, add the file's directory to the $PATH.

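The three check types can be approximated with the Python standard library. The function names below are hypothetical and only mirror the yaml keys. command_check also honors the output_has field from the template_tool example, in addition to the exit-status rule described in the list above.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def path_check(names: List[str]) -> bool:
    """path_checks: is any of these executables already on $PATH?"""
    return any(shutil.which(name) is not None for name in names)


def command_check(command: str, output_has: Optional[str] = None) -> bool:
    """command_checks: does the command exit with 0 (and print the expected text)?"""
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
    except OSError:
        return False
    if result.returncode != 0:
        return False
    return output_has is None or output_has in result.stdout


def file_check(paths: List[str]) -> bool:
    """file_checks: does an exact file exist? If so, prepend its directory to $PATH."""
    for path in paths:
        if os.path.isfile(path):
            directory = os.path.dirname(os.path.abspath(path))
            os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
            return True
    return False
```

Only file_check touches PATH, matching the point that path_checks and command_checks merely report whether the tool is usable as-is.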
bburgin commented 4 years ago

Micah:

  1. Thanks for the example. If you want all the steps in one yaml file, I guess that for your template_tool example above, users may also want a configure: field and a make: field. This would be for the case where they don't want to download the tool, but instead want to build it from source and use it immediately.
     a. But then isn't a tool yaml just a superset of the recipe yaml?
     b. Why not collapse them into one file schema?
  2. Mussels invokes one leaf recipe at a time, while first invoking (in the proper order) all the deps for that one leaf recipe? I guess by sequencer I was referring to sequencing of multiple leaf recipes. I'm guessing this is outside the scope of Mussels?
  3. For how to pass data into recipes (leaf and non-leaf) from my external sequencer, env variables or command line args would be possible. GNU make does it with env variables: make target FOO=bar
  4. For passing data between recipes, here are some basic cases:
     a. A compiler tool may output its path to downstream recipes, for the downstream recipes to call the tool.
     b. A build recipe may output some paths to downstream recipes, for the downstream recipes to use.
     c. A build recipe may output some scripts to downstream recipes, for the downstream recipes to parameterize and call.
     d. A test recipe may run a test and output the path to a performance data file. Then, in my external sequencer, I could run Mussels again, passing that path in to a graph render recipe.
  5. Customizing the work folder could be useful for running multiple Mussels instances in parallel, since in the parallel case I think the work folder would need to be unique. Users may be able to get by with the same install folder, though I am unclear whether Mussels would do the right thing there. If not, the user could specify separate install folders for this case.

micahsnyder commented 4 years ago

> Micah:
>
> 1. Thanks for the example. If you want all the steps in one yaml file, I guess that for your template_tool
>    example above, users may also want a configure: field and a make: field.
>    This would be for the case where they don't want to download the tool, but instead want to
>    build it from source and use it immediately.
>    a. But then isn't a tool yaml just a superset of the recipe yaml?
>    b. Why not collapse them into one file schema?

The recipe and tool yaml schemas differ enough that neither could be a superset of the other. My suggested update to the tool schema would require a separate URL per host OS, in case the file isn't a source archive but an installer application. For a recipe, you only ever need one URL for the source archive.

> 2. Mussels invokes one leaf recipe at a time? While first invoking (in the proper order) all
>    the deps for that one leaf recipe? I guess by sequencer I was referring to sequencing of multiple leaf
>    recipes. I'm guessing this is outside the scope of Mussels?

Using the libcurl recipe from the clamav cookbook again as an example, the build-order for recipes would be the following:

2019-12-10 15:12:31 Mussels INFO Build-order of requested recipes:
2019-12-10 15:12:31 Mussels INFO     1 [0: 0]: clamav:libz-1.2.11
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:cmake-3.14
2019-12-10 15:12:31 Mussels DEBUG         clamav:make
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang
2019-12-10 15:12:31 Mussels INFO     2 [1: 0]: clamav:libxml2-2.9.10
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:make
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang
2019-12-10 15:12:31 Mussels INFO     3 [1: 1]: clamav:libopenssl-1.1.1d
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:make
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang
2019-12-10 15:12:31 Mussels INFO     4 [2: 0]: clamav:libssh2-1.9.0
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:cmake-3.14
2019-12-10 15:12:31 Mussels DEBUG         clamav:make
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang
2019-12-10 15:12:31 Mussels INFO     5 [2: 1]: clamav:libnghttp2-1.40.0
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:cmake-3.14
2019-12-10 15:12:31 Mussels DEBUG         clamav:make
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang
2019-12-10 15:12:31 Mussels INFO     6 [3: 0]: clamav:libcurl-7.67.0
2019-12-10 15:12:31 Mussels DEBUG       Tool(s):
2019-12-10 15:12:31 Mussels DEBUG         clamav:cmake-3.14
2019-12-10 15:12:31 Mussels DEBUG         clamav:clang

It will just build them serially, in the order listed above.

The numbers in brackets, like [0: 0], indicate the stage and the position within that stage. libz-1.2.11 must be built first, as it's the only recipe in stage 0. libxml2-2.9.10 and libopenssl-1.1.1d are both in stage 1 and could, in theory, be built concurrently. The same is true in stage 2 for libssh2-1.9.0 and libnghttp2-1.40.0. Adding the ability to build recipes in the same stage concurrently might be a nice optimization. For now, though, the recipes in the clamav cookbook could use simpler optimizations, such as make -j$(nproc), or using Ninja instead of Make for the CMake recipes.
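To make the stage idea concrete, here is a sketch that groups recipes into stages layer by layer (Kahn's algorithm) and builds each stage's recipes concurrently with a thread pool. It assumes a recipe's stage is set by its longest chain of dependencies, which may differ from how Mussels actually computes stages; the build callback and the recipe names in the test are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def stages(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group recipes so each stage depends only on earlier stages."""
    names = set(deps) | {d for ds in deps.values() for d in ds}
    remaining: Dict[str, Set[str]] = {n: set(deps.get(n, [])) for n in names}
    done: Set[str] = set()
    result: List[List[str]] = []
    while remaining:
        # Every recipe whose dependencies are all built forms the next stage.
        ready = sorted(n for n, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("circular dependency among: " + ", ".join(sorted(remaining)))
        result.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return result


def build_in_stages(deps: Dict[str, List[str]], build: Callable[[str], None],
                    max_workers: int = 4) -> List[List[str]]:
    """Build stages one after another; recipes within a stage run concurrently."""
    plan = stages(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in plan:
            # list() waits for the whole stage and re-raises any build error.
            list(pool.map(build, stage))
    return plan
```

Stages still run strictly in sequence, so a recipe never starts before everything it depends on has finished.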

> 3. For how to pass data into recipes (leaf and non-leaf) from my external sequencer,
>    env variables or command line args would be possible.
>    GNU make does it with env variables:
>    make target FOO=bar

> 4. For passing data between recipes, here are some basic cases:
>    a. A compiler tool may output its path to downstream recipes, for the downstream recipes to
>    call the tool
>    b. A build recipe may output some paths to downstream recipes, for the downstream recipes to
>    use
>    c. A build recipe may output some scripts to downstream recipes, for the downstream recipes to
>    parameterize and call
>    d. A test recipe may run a test and output the path to a performance data file.
>    Then in my external sequencer, I could run Mussels again passing in the path to a graph
>    render recipe.

WRT (3) & (4): Hm, yes, this could be handy. At present I suspect you could use environment variables with the build scripts, but there's no way yet for one recipe to define a new variable and pass it downstream. It's a good idea that probably deserves a separate GitHub issue.
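One possible shape for this, sketched in Python: values from an external sequencer (like FOO=bar for make) seed the environment, each build script may append KEY=VALUE lines to an outputs file, and the runner exports those values to every later script. The RECIPE_OUTPUTS variable, the outputs-file convention, and run_chain are all hypothetical, and the scripts are assumed to run under a POSIX shell.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def parse_outputs(path: str) -> Dict[str, str]:
    """Read the KEY=VALUE lines a build script appended to its outputs file."""
    outputs: Dict[str, str] = {}
    if not os.path.isfile(path):
        return outputs  # the script chose not to export anything
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                outputs[key.strip()] = value.strip()
    return outputs


def run_chain(recipes: List[Tuple[str, str]], work_dir: str,
              user_vars: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Run (name, script) pairs in build order, passing exported values downstream.

    user_vars come from an external sequencer; values exported by earlier
    recipes are layered on top of them.
    """
    shared: Dict[str, str] = dict(user_vars or {})
    for name, script in recipes:
        outputs_file = os.path.join(work_dir, f"{name}.outputs")
        env = {**os.environ, **shared, "RECIPE_OUTPUTS": outputs_file}
        # check=True stops the chain if any build script fails.
        subprocess.run(script, shell=True, check=True, cwd=work_dir, env=env)
        shared.update(parse_outputs(outputs_file))
    return shared
```

A compiler recipe could then export CC_PATH and a later recipe simply read $CC_PATH, which covers cases (a) through (d) without any new yaml fields.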

Btw, love the idea of test recipes. Don't know why I didn't think of this use case.

> 5. Customizing the work folder could be useful for running multiple Mussels instances in parallel.
>    Since for the parallel case, I think the work folder would need to be unique.
>    Users may be able to get by with the same install folder. I am unclear if Mussels would do the
>    right thing there. If anything, the user could specify separate install folders for this case.

That's a really good use case. Let's make a new GitHub issue for customizing the work folder as well.
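For the parallel case, the key requirement is that each instance gets its own work folder. A minimal sketch using tempfile.mkdtemp, which creates a new, uniquely named directory; make_work_dir and the "mussels-work-" prefix are hypothetical, since per this thread Mussels has no work-folder option yet.

```python
import os
import tempfile
from typing import Optional


def make_work_dir(base: Optional[str] = None) -> str:
    """Create a work folder unique to this run so parallel instances never share one."""
    base = base or tempfile.gettempdir()
    os.makedirs(base, exist_ok=True)
    # mkdtemp atomically creates a new, uniquely named directory, so two
    # processes racing to create a work folder cannot end up with the same one.
    return tempfile.mkdtemp(prefix=f"mussels-work-{os.getpid()}-", dir=base)
```

Including the process ID in the prefix is not needed for uniqueness, but it makes it easy to tell which instance a leftover folder belonged to.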

bburgin commented 4 years ago

Sounds good. How would I define my own list of stages? Or does Mussels just generate the stage list from the deps for a leaf recipe?

micahsnyder commented 4 years ago

As you suspected, Mussels generates the stage list based on the dependencies of each recipe. There's no way to manually define the build order.

Thanks for putting in the two other issues!