Open ellahathaway opened 2 weeks ago
I'd like to avoid parsing the README files. I think we'd get better ROI by implementing a solution, like you suggested, that adds a command to Image Builder which rewrites the manifest to resolve all the variable values.
@mthalman @ellahathaway I do like the idea of an imagebuilder command that resolves all the variable values in the manifest. However that still means we'd need to parse the manifest. But parsing the manifest from this repo would be useful for other reasons such as https://github.com/dotnet/dotnet-docker/issues/5337.
Another option could be to re-use the MCR tags metadata yaml that we already generate. The files contain most/all of the pertinent information for the static tests outlined above. That would prevent needing to read the manifest from the dotnet-docker repo. However we'd still be stuck parsing another new file format from this repo, just yaml this time instead of json.
This would require a new command to output the tags metadata files to the disk, and would also require an option to include undocumented tags in the output.
Example:
- tags: [ 9.0.0-preview.7-bookworm-slim-amd64, 9.0-preview-bookworm-slim-amd64, 9.0.0-preview.7-bookworm-slim, 9.0-preview-bookworm-slim, 9.0.0-preview.7, 9.0-preview ]
architecture: amd64
os: linux
osVersion: Debian 12
dockerfile: https://github.com/dotnet/dotnet-docker/blob/e48d296ed821d3bf00cf90f26ddc236077a8c610/src/runtime/9.0/bookworm-slim/amd64/Dockerfile
customSubTableTitle: .NET 9 Preview Tags
However that still means we'd need to parse the manifest
I think that this is the major hurdle to overcome for implementing these tests. Reimplementing some of the image builder logic for parsing the manifest is a bit of a red flag to me - doing this will likely result in duplicate code and will make the tests unreliable if we change the structure of the manifest.
One potential option I see is to pull the image builder image so that the image builder commands are available to the tests. This would allow us to use image builder to parse the manifest and we wouldn't have to reply upon a new file format.
Another solution is what Logan mentioned with using the MCR tags metadata yaml. The problem with this solution is that it would require parsing a new file format. This is a potentially feasible solution if we think that the yaml format is unlikely to change.
Reimplementing some of the image builder logic for parsing the manifest is a bit of a red flag to me
How extensive is this duplication? What kind of duplication is this consisting of? Naively, I would assume this is just enumerating the repos/images/tags in the manifest which doesn't seem like a big deal. Is it more than that?
What kind of duplication is this consisting of?
It's primarily creating a new class representation of the manifest for deserialization. I wouldn't say this is the biggest deal, but my current implementation does look like a simplified version of https://github.com/dotnet/docker-tools/tree/main/src/Microsoft.DotNet.ImageBuilder/src/Models/Manifest. IMO, it would be easier/safer to not create a second version of the manifest model just for the tests.
I'll also add, and I'm not sure exactly how feasible this would be because I haven't tested it yet, I came across https://github.com/testcontainers/testcontainers-dotnet which would potentially allow us to run the image-builder image in the tests. The package has an ExecAsync command that allows for a command to be run on a running container, so this could potentially work to run an image builder command for parsing the manifest.
What kind of duplication is this consisting of?
It's primarily creating a new class representation of the manifest for deserialization. I wouldn't say this is the biggest deal, but my current implementation does look like a simplified version of https://github.com/dotnet/docker-tools/tree/main/src/Microsoft.DotNet.ImageBuilder/src/Models/Manifest. IMO, it would be easier/safer to not create a second version of the manifest model just for the tests.
Oh, if it's just the schema that's the concern, I don't see a great need to deserialize it into a strongly-typed object model. Just using System.Text.Json.Nodes.JsonObject
or Newtonsoft.Json.JObject
should be fine.
I'll also add, and I'm not sure exactly how feasible this would be because I haven't tested it yet, I came across https://github.com/testcontainers/testcontainers-dotnet which would potentially allow us to run the image-builder image in the tests. The package has an ExecAsync command that allows for a command to be run on a running container, so this could potentially work to run an image builder command for parsing the manifest.
The tests already run other images today: https://github.com/dotnet/dotnet-docker/blob/d90d458deada9057d7889f76d58fc0a7194a0c06/tests/Microsoft.DotNet.Docker.Tests/DockerHelper.cs#L282-L292
I think it's totally feasible to run an image builder container from the tests with what we have today.
Oh, if it's just the schema that's the concern, I don't see a great need to deserialize it into a strongly-typed object model. Just using
System.Text.Json.Nodes.JsonObject
orNewtonsoft.Json.JObject
should be fine.
For reference we already do this (for a file other than the manifest) in ImageData.cs. Although I have a minor allergic reaction to non-strongly-typed code. Either direction should be fine though since we are already doing this DOM approach for some tests.
Progress update:
I currently have a PR out for the latest tag tests + the manifest parsing logic.
I'm still working on implementing the tests for the tag schema validation. These tests are a bit more tedious to implement since there's a lot of rules and the rules sometimes vary based on the type of image and OS.
Problem to solve
We need to implement static tests (based on image-builder manifest/README) for the following scenarios to ensure the correctness of our tag schema and adherence to our tagging rules:
Tag Schema Validation
"Latest" Tag Rules
Additional implementation details
My initial thoughts are that we should use the tag tables/READMEs for this validation. Upon initial examination of this issue, parsing the manifest is difficult as it requires deserialization and replacement of variables. This is something that is already handled by the image builder, and rewriting this logic is tedious and prevents us from making changes to the manifest without large pieces of the test code becoming invalid. Since we already validate the readmes against the manifest, the readmes are another truthful place to validate tags that doesn't require immense parsing and duplicate image-builder logic. That said, another option for these tests, other than the manifest and the readme, is to make the image builder output some sort of artifact that contains the values we wish to validate.