anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.29k stars 578 forks

Detect wordpress #2658

Open witchcraze opened 9 months ago

witchcraze commented 9 months ago

What would you like to be added:

Detect wordpress

Why is this needed:

Syft does not detect wordpress.

$ syft wordpress | grep wordpress
 ✔ Loaded image          wordpress:latest
 ✔ Parsed image          sha256:2fc2a7b0412945f6cc3d75420013cbb6d31d764128d06150b7e7deb61173ba2a
 ✔ Cataloged contents    3d3f197740a201268ecbe340ccd9a95833995b7d2daa9d53664411eca16967a5
   ├── ✔ Packages                        [275 packages]
   ├── ✔ File digests                    [10,715 files]
   ├── ✔ File metadata                   [10,715 locations]
   └── ✔ Executables                     [1,284 executables]
Akismet Anti-spam: Spam Protection  5.3.1                           wordpress-plugin

Additional context:

docker scout can detect wordpress.

$ docker scout sbom --format list wordpress | grep wordpress
{"level":"info","msg":"Provenance obtained from attestation","time":"2024-02-21T14:42:04+09:00"}
{"level":"info","msg":"SBOM obtained from attestation, 382 packages indexed\n","time":"2024-02-21T14:42:09+09:00"}
{"level":"info","msg":"Pulling","time":"2024-02-21T14:42:09+09:00"}
{"level":"info","msg":"Pulled","time":"2024-02-21T14:42:10+09:00"}
  wordpress                  6.4.3                           generic

According to https://github.com/docker/scout-cli/releases, docker scout seems to use syft. In the JSON output, docker scout seems to check the version.php file. Maybe they use a binary cataloger like https://github.com/anchore/syft/pull/2445

$ docker scout sbom wordpress | jq '.artifacts[] | select(.name == "wordpress")'
{"level":"info","msg":"Provenance obtained from attestation","time":"2024-02-21T14:53:47+09:00"}
{"level":"info","msg":"SBOM obtained from attestation, 382 packages indexed\n","time":"2024-02-21T14:53:51+09:00"}
{"level":"info","msg":"Pulling","time":"2024-02-21T14:53:51+09:00"}
{"level":"info","msg":"Pulled","time":"2024-02-21T14:53:52+09:00"}
{
  "type": "generic",
  "name": "wordpress",
  "version": "6.4.3",
  "purl": "pkg:generic/wordpress@6.4.3",
  "author": "NOASSERTION",
  "locations": [
    {
      "path": "/usr/src/wordpress/wp-includes/version.php",
      "digest": "sha256:3f4af3e34785188118e119dc0189bea540851fe9a46947fcd631643df8bcad80",
      "diff_id": "sha256:426987b031bec9147d4a532539e731aa9dc2928a52a36a7d022d8123a289943c"
    }
  ]
}

tgerla commented 9 months ago

Hi @witchcraze, thanks for the report. I did a little digging and we do have a binary cataloger for Wordpress: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/binary/default_classifiers.go#L407 -- it looks for wp-cli, which apparently isn't on this wordpress image. It might be the case that we need to expand our classifier for Wordpress bits.

Interestingly enough, I tried your docker command but I don't get the same results from Scout:

tgerla@Timothys-MacBook-Pro-2 wp % docker scout sbom --format list wordpress | grep wordpress
{"level":"info","msg":"SBOM of image already cached, 381 packages indexed\n","time":"2024-02-21T08:37:00-05:00"}

witchcraze commented 9 months ago

Thank you for your confirmation. Let me report my environment (WSL2).

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ docker version
Client: Docker Engine - Community
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:09 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb  6 21:13:09 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker scout version

      ⢀⢀⢀             ⣀⣀⡤⣔⢖⣖⢽⢝
   ⡠⡢⡣⡣⡣⡣⡣⡣⡢⡀    ⢀⣠⢴⡲⣫⡺⣜⢞⢮⡳⡵⡹⡅
  ⡜⡜⡜⡜⡜⡜⠜⠈⠈        ⠁⠙⠮⣺⡪⡯⣺⡪⡯⣺
 ⢘⢜⢜⢜⢜⠜               ⠈⠪⡳⡵⣹⡪⠇
 ⠨⡪⡪⡪⠂    ⢀⡤⣖⢽⡹⣝⡝⣖⢤⡀    ⠘⢝⢮⡚       _____                 _
  ⠱⡱⠁    ⡴⡫⣞⢮⡳⣝⢮⡺⣪⡳⣝⢦    ⠘⡵⠁      / ____| Docker        | |
   ⠁    ⣸⢝⣕⢗⡵⣝⢮⡳⣝⢮⡺⣪⡳⣣    ⠁      | (___   ___ ___  _   _| |_
        ⣗⣝⢮⡳⣝⢮⡳⣝⢮⡳⣝⢮⢮⡳            \___ \ / __/ _ \| | | | __|
   ⢀    ⢱⡳⡵⣹⡪⡳⣝⢮⡳⣝⢮⡳⡣⡏    ⡀       ____) | (_| (_) | |_| | |_
  ⢀⢾⠄    ⠫⣞⢮⡺⣝⢮⡳⣝⢮⡳⣝⠝    ⢠⢣⢂     |_____/ \___\___/ \__,_|\__|
  ⡼⣕⢗⡄    ⠈⠓⠝⢮⡳⣝⠮⠳⠙     ⢠⢢⢣⢣
 ⢰⡫⡮⡳⣝⢦⡀              ⢀⢔⢕⢕⢕⢕⠅
 ⡯⣎⢯⡺⣪⡳⣝⢖⣄⣀        ⡀⡠⡢⡣⡣⡣⡣⡣⡃
⢸⢝⢮⡳⣝⢮⡺⣪⡳⠕⠗⠉⠁    ⠘⠜⡜⡜⡜⡜⡜⡜⠜⠈
⡯⡳⠳⠝⠊⠓⠉             ⠈⠈⠈⠈

version: v1.5.0 (go1.21.6 - linux/amd64)
git commit: 5661a7fce57851c49627a79c9d181e87833df7e8

tgerla commented 9 months ago

Thanks @witchcraze! I upgraded Docker and now I'm on Scout 1.4.1 and I get the same results as you. I will see if I can figure out what Scout is doing differently from stock Syft.

kzantow commented 9 months ago

I had a look around the wordpress:6.4.3 image, and (excluding some irrelevant matches) the only things I see present are:

# grep --exclude-dir=proc --exclude-dir=sys --exclude=*.svg --exclude=*.js -r '6\.4\.3' /
grep: /usr/lib/python3.11/__pycache__/ssl.cpython-311.pyc: binary file matches
/usr/lib/python3.11/ssl.py:    """Matching according to RFC 6125, section 6.4.3
/usr/src/wordpress/wp-includes/version.php:$wp_version = '6.4.3';
/usr/src/wordpress/wp-admin/about.php:                      '6.4.3',
/usr/src/wordpress/wp-admin/about.php:                          sanitize_title( '6.4.3' )

It sure looks like using wp-includes/version.php would be just about the only reasonably correct way to identify this, as @witchcraze noted. I have an idea how I might like this to work generally for things like this in .php, .rb, and similar source files, but probably not by directly using the binary cataloger, which is really intended to identify individual binary files (hence the name).

neufeind commented 8 months ago

In addition to that, I wonder how we could also detect WordPress plugins (and their versions) once we have identified a WordPress installation. Once we have found a path containing WordPress itself, we could run a dependent check from there to look for plugins. Or, if you prefer, we could call a wp-cli tool found on the image with that path and get a "wp plugin list" listing from it. I lean toward detecting plugins directly (but only checking for them once a WordPress installation has been detected).
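To make the "dependent check" idea concrete, here is a rough Python sketch. It assumes the standard WordPress layout, where `version.php` sits in `<root>/wp-includes/` and directory-based plugins live in `<root>/wp-content/plugins/<slug>/` with a `Version:` header comment in one of their top-level PHP files. The function names are hypothetical and this is not how syft is implemented.

```python
import re
from pathlib import Path
from typing import Dict

# Matches a plugin header line such as " * Version: 5.3.1".
PLUGIN_VERSION_RE = re.compile(
    r"^[ \t/*#@]*Version:\s*(\S+)", re.MULTILINE | re.IGNORECASE
)


def wordpress_root(version_file: Path) -> Path:
    """Derive the install root from <root>/wp-includes/version.php."""
    return version_file.parent.parent


def list_plugin_versions(root: Path) -> Dict[str, str]:
    """Map plugin directory name to the version found in its header.

    Only directory-based plugins are considered; single-file plugins
    placed directly in wp-content/plugins are skipped in this sketch.
    """
    plugins_dir = root / "wp-content" / "plugins"
    found: Dict[str, str] = {}
    if not plugins_dir.is_dir():
        return found
    for plugin_dir in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        for php_file in sorted(plugin_dir.glob("*.php")):
            # The header comment sits at the top of the file, so only
            # the beginning of the file is inspected.
            text = php_file.read_text(encoding="utf-8", errors="replace")
            match = PLUGIN_VERSION_RE.search(text[:8192])
            if match:
                found[plugin_dir.name] = match.group(1)
                break
    return found
```

The plugin scan only runs from a root that was derived from a detected `version.php`, which matches the "only check once WordPress itself is found" preference above.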

willmurphyscode commented 3 days ago

@neufeind Syft detects WordPress plugins with their versions since https://github.com/anchore/syft/pull/2218. Please open a new issue if that functionality is giving you any trouble! This issue is about detecting the installation of WordPress itself.

willmurphyscode commented 3 days ago

Here's the concrete proposal I think we should build:

  1. Add a new cataloger that is meant to detect the wordpress server code itself.
  2. The new cataloger should look for a path like **/wordpress/wp-includes/version.php and parse out the value of the $wp_version variable from a line like $wp_version = '6.4.3';.

This should not be part of the binary cataloger, though it will be pretty similar in structure.
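A minimal sketch of the two steps above, written in Python for brevity rather than in syft's Go. The helper names `is_wordpress_version_file` and `parse_wp_version` are hypothetical and not part of syft; the path pattern and the `$wp_version` line format are taken from this thread.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Trailing path components of **/wordpress/wp-includes/version.php.
VERSION_FILE_SUFFIX = ("wordpress", "wp-includes", "version.php")

# Matches a line such as: $wp_version = '6.4.3';
WP_VERSION_RE = re.compile(
    r"""^\s*\$wp_version\s*=\s*['"]([^'"]+)['"]\s*;""",
    re.MULTILINE,
)


def is_wordpress_version_file(path: str) -> bool:
    """Return True if path looks like **/wordpress/wp-includes/version.php."""
    return PurePosixPath(path).parts[-3:] == VERSION_FILE_SUFFIX


def parse_wp_version(php_source: str) -> Optional[str]:
    """Return the value assigned to $wp_version, or None if not found."""
    match = WP_VERSION_RE.search(php_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = "/usr/src/wordpress/wp-includes/version.php"
    source = "<?php\n$wp_version = '6.4.3';\n"
    if is_wordpress_version_file(path):
        print(parse_wp_version(source))
```

Anchoring the regex on the assignment to `$wp_version` avoids picking up unrelated version strings, such as the ones in wp-admin/about.php shown earlier in the thread.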

One remaining open question is: what type of package should it emit? There aren't a lot of great candidates at https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst - composer is used for PHP composer packages, but we don't have PHP composer packages in this case, we just have a pile of PHP files that we believe represents an installation of the wordpress server process. We use wordpress-plugin for WordPress plugins, but WordPress itself isn't really a WordPress plugin.

I'm adding the needs-discussion label to discuss as a team what package type this new cataloger should emit. Once that's decided this is probably ready to work on.

westonsteimel commented 3 days ago

I think the package URL type would just be generic, and the syft package type would probably just be binary as well, with a language type set to php for now?
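As an illustration of that suggestion, the record below shows what such a package might carry. This is a Python sketch only: the `WordPressPackage` class and its field names are hypothetical and do not mirror syft's actual package model. The purl shape `pkg:generic/wordpress@6.4.3` is the one docker scout emitted earlier in this thread.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class WordPressPackage:
    """Hypothetical record for a detected WordPress installation."""
    name: str
    version: str
    type: str      # syft package type suggested above: binary
    language: str  # language suggested above: php
    purl: str


def make_wordpress_package(version: str) -> WordPressPackage:
    """Build a record using the generic purl type."""
    # Percent-encode the version so unusual characters stay purl-safe.
    purl = f"pkg:generic/wordpress@{quote(version, safe='')}"
    return WordPressPackage(
        name="wordpress",
        version=version,
        type="binary",
        language="php",
        purl=purl,
    )


if __name__ == "__main__":
    print(make_wordpress_package("6.4.3").purl)
```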