Static Analysis Tests for Tag Validation

ellahathaway commented 2 weeks ago

Problem to solve

We need to implement static tests (based on image-builder manifest/README) for the following scenarios to ensure the correctness of our tag schema and adherence to our tagging rules:

Tag Schema Validation

Ensure the tag schema matches the description in supported-tags.md. Specifically:
- Verify that the Alpine floating tag references the correct version of Alpine.
- Confirm that appliance images use the correct tags, as they follow a different schema.

"Latest" Tag Rules

https://github.com/dotnet/dotnet-docker/pull/5852
Ensure the following rules are adhered to for the latest tags:
- There is only one latest tag per repository.
- The latest tag points to the expected version:
  - For main, it should point to the latest non-preview/release candidate major release.
  - For nightly, it should point to the latest unreleased version.

Additional implementation details

My initial thoughts are that we should use the tag tables/READMEs for this validation. Upon initial examination of this issue, parsing the manifest is difficult as it requires deserialization and replacement of variables. This is something that is already handled by the image builder, and rewriting this logic is tedious and prevents us from making changes to the manifest without large pieces of the test code becoming invalid. Since we already validate the readmes against the manifest, the readmes are another truthful place to validate tags that doesn't require immense parsing and duplicate image-builder logic. That said, another option for these tests, other than the manifest and the readme, is to make the image builder output some sort of artifact that contains the values we wish to validate.

mthalman commented 2 weeks ago

I'd like to avoid parsing the README files. I think we'd get better ROI by implementing a solution, like you suggested, that adds a command to Image Builder which rewrites the manifest to resolve all the variable values.

lbussell commented 2 weeks ago

@mthalman @ellahathaway I do like the idea of an imagebuilder command that resolves all the variable values in the manifest. However that still means we'd need to parse the manifest. But parsing the manifest from this repo would be useful for other reasons such as https://github.com/dotnet/dotnet-docker/issues/5337.

Another option could be to re-use the MCR tags metadata yaml that we already generate. The files contain most/all of the pertinent information for the static tests outlined above. That would prevent needing to read the manifest from the dotnet-docker repo. However we'd still be stuck parsing another new file format from this repo, just yaml this time instead of json.

This would require a new command to output the tags metadata files to the disk, and would also require an option to include undocumented tags in the output.

Example:

  - tags: [ 9.0.0-preview.7-bookworm-slim-amd64, 9.0-preview-bookworm-slim-amd64, 9.0.0-preview.7-bookworm-slim, 9.0-preview-bookworm-slim, 9.0.0-preview.7, 9.0-preview ]
    architecture: amd64
    os: linux
    osVersion: Debian 12
    dockerfile: https://github.com/dotnet/dotnet-docker/blob/e48d296ed821d3bf00cf90f26ddc236077a8c610/src/runtime/9.0/bookworm-slim/amd64/Dockerfile
    customSubTableTitle: .NET 9 Preview Tags

ellahathaway commented 2 weeks ago

However that still means we'd need to parse the manifest

I think that this is the major hurdle to overcome for implementing these tests. Reimplementing some of the image builder logic for parsing the manifest is a bit of a red flag to me - doing this will likely result in duplicate code and will make the tests unreliable if we change the structure of the manifest.

One potential option I see is to pull the image builder image so that the image builder commands are available to the tests. This would allow us to use image builder to parse the manifest and we wouldn't have to reply upon a new file format.

Another solution is what Logan mentioned with using the MCR tags metadata yaml. The problem with this solution is that it would require parsing a new file format. This is a potentially feasible solution if we think that the yaml format is unlikely to change.

mthalman commented 2 weeks ago

Reimplementing some of the image builder logic for parsing the manifest is a bit of a red flag to me

How extensive is this duplication? What kind of duplication is this consisting of? Naively, I would assume this is just enumerating the repos/images/tags in the manifest which doesn't seem like a big deal. Is it more than that?

ellahathaway commented 2 weeks ago

What kind of duplication is this consisting of?

It's primarily creating a new class representation of the manifest for deserialization. I wouldn't say this is the biggest deal, but my current implementation does look like a simplified version of https://github.com/dotnet/docker-tools/tree/main/src/Microsoft.DotNet.ImageBuilder/src/Models/Manifest. IMO, it would be easier/safer to not create a second version of the manifest model just for the tests.

ellahathaway commented 2 weeks ago

I'll also add, and I'm not sure exactly how feasible this would be because I haven't tested it yet, I came across https://github.com/testcontainers/testcontainers-dotnet which would potentially allow us to run the image-builder image in the tests. The package has an ExecAsync command that allows for a command to be run on a running container, so this could potentially work to run an image builder command for parsing the manifest.

mthalman commented 2 weeks ago

What kind of duplication is this consisting of?

It's primarily creating a new class representation of the manifest for deserialization. I wouldn't say this is the biggest deal, but my current implementation does look like a simplified version of https://github.com/dotnet/docker-tools/tree/main/src/Microsoft.DotNet.ImageBuilder/src/Models/Manifest. IMO, it would be easier/safer to not create a second version of the manifest model just for the tests.

Oh, if it's just the schema that's the concern, I don't see a great need to deserialize it into a strongly-typed object model. Just using System.Text.Json.Nodes.JsonObject or Newtonsoft.Json.JObject should be fine.

mthalman commented 2 weeks ago

I'll also add, and I'm not sure exactly how feasible this would be because I haven't tested it yet, I came across https://github.com/testcontainers/testcontainers-dotnet which would potentially allow us to run the image-builder image in the tests. The package has an ExecAsync command that allows for a command to be run on a running container, so this could potentially work to run an image builder command for parsing the manifest.

The tests already run other images today: https://github.com/dotnet/dotnet-docker/blob/d90d458deada9057d7889f76d58fc0a7194a0c06/tests/Microsoft.DotNet.Docker.Tests/DockerHelper.cs#L282-L292

I think it's totally feasible to run an image builder container from the tests with what we have today.

lbussell commented 2 weeks ago

Oh, if it's just the schema that's the concern, I don't see a great need to deserialize it into a strongly-typed object model. Just using System.Text.Json.Nodes.JsonObject or Newtonsoft.Json.JObject should be fine.

For reference we already do this (for a file other than the manifest) in ImageData.cs. Although I have a minor allergic reaction to non-strongly-typed code. Either direction should be fine though since we are already doing this DOM approach for some tests.

https://github.com/dotnet/dotnet-docker/blob/d90d458deada9057d7889f76d58fc0a7194a0c06/tests/Microsoft.DotNet.Docker.Tests/ImageData.cs#L172-L176

ellahathaway commented 1 week ago

Progress update:

I currently have a PR out for the latest tag tests + the manifest parsing logic.

I'm still working on implementing the tests for the tag schema validation. These tests are a bit more tedious to implement since there's a lot of rules and the rules sometimes vary based on the type of image and OS.

dotnet / dotnet-docker

Static Analysis Tests for Tag Validation #5838

Problem to solve

Additional implementation details