chainguard-dev / bincapz

detect malicious program behaviors
Apache License 2.0

Add support for Ruby Gem files #205

Closed: egibs closed this 2 months ago

egibs commented 2 months ago

Closes: https://github.com/chainguard-dev/bincapz/issues/202

This PR adds support for .gem files, which are tricky to work with, as noted in the issue. When a .gem file is scanned, it is extracted and its nested archives are then extracted recursively until only the original files remain in the temporary directory.

I added two more extraction functions: one for .gz archives (needed for checksums.yaml.gz and metadata.gz) and one for nested archives. For now, the latter is only called when processing .gem archives, but it may make sense to enable nested extraction for all archive types sooner rather than later. TBD, though.

Examples using the file from the issue:

❯ go run . ~/Downloads/cocoapods-fixbugs-plugin-0.1.0.gem
/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/cocoapods-fixbugs-plugin-0.1.0.gem2435123227/bin/console [✅ LOW]
----------------------------------------------------------------------
RISK  KEY               DESCRIPTION                     EVIDENCE
----------------------------------------------------------------------
LOW   ref/path/usr/bin  path reference within /usr/bin  /usr/bin/env
LOW   ref/words/plugin  references a 'plugin'           plugin
----------------------------------------------------------------------

/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/cocoapods-fixbugs-plugin-0.1.0.gem2435123227/bin/setup [⚠️ MEDIUM]
---------------------------------------------
RISK  KEY         DESCRIPTION     EVIDENCE
---------------------------------------------
MED   shell/exec  executes shell  /bin/bash
---------------------------------------------

/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/cocoapods-fixbugs-plugin-0.1.0.gem2435123227/lib/cocoapods/fixbugs/plugin/version.rb [✅ LOW]
--------------------------------------------------------------
RISK  KEY               DESCRIPTION            EVIDENCE
--------------------------------------------------------------
LOW   ref/words/plugin  references a 'plugin'  module Plugin
--------------------------------------------------------------

/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/cocoapods-fixbugs-plugin-0.1.0.gem2435123227/lib/cocoapods/fixbugs/plugin.rb [✅ LOW]
-------------------------------------------------------------
RISK  KEY                 DESCRIPTION              EVIDENCE
-------------------------------------------------------------
LOW   fs/symlink/resolve  resolves symbolic links  realpath
LOW   ref/words/plugin    references a 'plugin'    plugin
-------------------------------------------------------------

❯ ls /var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/cocoapods-fixbugs-plugin-0.1.0.gem2435123227
Permissions Size User  Date Modified Name
drwxr-xr-x@    - egibs  8 May 21:15  bin
.rw-r--r--@  418 egibs  8 May 21:15  checksums.yaml
.rw-r--r--@  991 egibs  8 May 21:15  cocoapods-fixbugs-plugin.gemspec
.rw-r--r--@  14k egibs  8 May 21:15  data.tar
.rw-r--r--@  109 egibs  8 May 21:15  Gemfile
drwxr-xr-x@    - egibs  8 May 21:15  lib
.rw-r--r--@ 1.1k egibs  8 May 21:15  LICENSE.txt
.rw-r--r--@ 2.2k egibs  8 May 21:15  metadata
.rw-r--r--@   28 egibs  8 May 21:15  Rakefile
.rw-r--r--@ 1.4k egibs  8 May 21:15  README.md

I verified that the extraction produces the same files as a manual extraction:

❯ tar -xvf data.tar.gz
x .travis.yml
x Gemfile
x LICENSE.txt
x README.md
x Rakefile
x bin/console
x bin/setup
x cocoapods-fixbugs-plugin.gemspec
x lib/cocoapods/fixbugs/plugin.rb
x lib/cocoapods/fixbugs/plugin/version.rb

I expected data.tar.gz to contain a single top-level directory, but it instead unpacks its files directly.

egibs commented 2 months ago

I’m going to spend more time on this tomorrow; I want to preserve the directory structure when extracting data.tar.gz and revisit the extraction addition to extractTar.

egibs commented 2 months ago

> We should get some tests in place to make sure we don't break this in the future, but first we'll want to work on making output predictable for extracted files.

Definitely -- I was just thinking about addressing testing gaps re: all of the recently-added features 😅
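
To make "predictable output" concrete: the paths in the report above contain a random temporary directory suffix, so they differ on every run. One possible approach, sketched here purely as an assumption and not as a decided design, is to report each file relative to the archive it came from. `displayPath` and the `archive:relative/path` label format are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// displayPath maps a file inside the temporary extraction directory to a
// label that does not depend on the random temp directory name.
// Hypothetical helper and label format; not part of bincapz.
func displayPath(archive, tmpRoot, extracted string) (string, error) {
	rel, err := filepath.Rel(tmpRoot, extracted)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%s is outside %s", extracted, tmpRoot)
	}
	return filepath.Base(archive) + ":" + filepath.ToSlash(rel), nil
}

func main() {
	// Example paths are made up for the demo.
	tmp := "/tmp/cocoapods-fixbugs-plugin-0.1.0.gem2435123227"
	label, err := displayPath(
		"/home/user/Downloads/cocoapods-fixbugs-plugin-0.1.0.gem",
		tmp,
		tmp+"/bin/setup",
	)
	if err != nil {
		panic(err)
	}
	// On a Unix-like system this prints:
	// cocoapods-fixbugs-plugin-0.1.0.gem:bin/setup
	fmt.Println(label)
}
```

With stable labels like this, a test could compare scan output for a fixture .gem against a stored expected report without first scrubbing temp paths.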