delimitrou / DeathStarBench

Open-source benchmark suite for cloud microservices
Apache License 2.0

Wage war on "latest" tag? #326

Open matte21 opened 8 months ago

matte21 commented 8 months ago

DSB's primary purpose is research. In research, reproducibility of results is important, so DSB should make it easy to run reproducible experiments.

Yet the latest tag is widely used in the repo's references to container images. For example, it's used both as the tag for base images in Dockerfiles and in the YAML manifests for K8s (I linked only two occurrences here, but there are many more).

This makes it hard to write deterministic experiments. Consider the following scenario:

Time 0: A researcher runs an experiment with DSB on a brand new K8s cluster. The experiment uses container C with the latest tag (for example, there are currently init containers that use alpine:latest). The latest tag for C corresponds to v1, so the experiment runs with C at v1. The researcher submits a paper with the experiment's results.

Time 1: The developers of C release v2, so the latest tag is moved to point to the container image corresponding to v2.

Time 2: A reviewer tries to evaluate the artifacts. They create a new K8s cluster and try to re-run the same experiment. The experiment should run with C at v1, but it runs with C at v2 instead.

This is just one scenario where latest is harmful, but there are more (not reported here for brevity).

I think it'd be best if, for a given git tag of this repo, all container image tags were explicit versions rather than latest.
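To get a sense of how many references would need changing, something like the following could be run over the Dockerfiles and manifests. This is only a rough sketch: the function names are made up for illustration, it does plain line matching rather than real YAML or Dockerfile parsing, and it skips Helm-templated values (anything containing `{{`), which would need to be checked in the values files instead.

```python
def is_unpinned(image_ref: str) -> bool:
    """Return True if the reference resolves to the moving `latest` tag."""
    # A digest (name@sha256:...) pins the exact image content.
    if "@" in image_ref:
        return False
    # Look for a tag only in the last path component, so that a registry
    # port such as localhost:5000/app is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return True  # no tag given, so the runtime defaults to latest
    tag = last_component.split(":", 1)[1]
    return tag == "latest"


def find_unpinned(lines):
    """Yield (line_number, image_ref) for Dockerfile FROM and YAML image: lines.

    Limitations of this sketch: multi-stage references to an earlier build
    stage (FROM builder) and FROM scratch are reported as unpinned.
    """
    for number, line in enumerate(lines, start=1):
        words = line.strip().lstrip("- ").split()
        if len(words) < 2:
            continue
        if words[0].upper() != "FROM" and words[0] != "image:":
            continue
        # Skip Dockerfile flags such as --platform=... before the image name.
        candidates = [w for w in words[1:] if not w.startswith("--")]
        if not candidates:
            continue
        ref = candidates[0].strip("\"'")
        if "{{" in ref:
            continue  # Helm template, cannot be judged from this line alone
        if is_unpinned(ref):
            yield number, ref
```

Feeding it the lines of a file, e.g. `list(find_unpinned(open(path)))`, gives the line numbers and image references that still float.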

Note: this non-reproducibility issue goes beyond container tags. For example, see how this init container does a git clone: https://github.com/delimitrou/DeathStarBench/blob/aeb4860b7c29491ebc4d344b103d0c0b97684354/socialNetwork/helm-chart/socialnetwork/charts/media-frontend/values.yaml#L28. The clone command doesn't specify any tag, so what gets cloned is the head of the "master" branch, which isn't stable.
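For the clone, one option is to pass an explicit tag via `--branch`, which git accepts for tags as well as branches (with a tag, the checkout ends up in detached-HEAD state). A minimal sketch of building such a command follows; the helper name and its arguments are hypothetical, and the tag to use would have to be decided per release.

```python
def pinned_clone_command(repo_url: str, tag: str, dest: str) -> list[str]:
    """Build a git clone command that checks out a fixed tag.

    `tag` should name a tag, not a branch: a branch name would still move
    over time and bring back the original problem.
    """
    # --depth 1 keeps the clone small; only the tagged commit is fetched.
    return ["git", "clone", "--depth", "1", "--branch", tag, repo_url, dest]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or joined into the shell command that the init container executes.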

What do you think? Is there agreement that we should do this?

dev-lew commented 1 month ago

I think you bring up an important point, and I agree that all versioned releases should feature pinned image versions. I had thought that using the release 0.4.1 sources would be enough, but I hadn't noticed that almost all references to a Docker image in the OpenShift manifest use latest (implicitly or explicitly).

To fix this, I would use the 0.4.1 image for the benchmark, as well as pin the versions of memcached, redis, and rabbitmq to the latest major version releases.
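As a rough illustration of what that pinning could look like if done mechanically, here is a small sketch. The `pin_image` helper and the `pins` mapping are hypothetical, and the versions in the mapping are placeholders that would need to be filled in with the actual releases chosen.

```python
def pin_image(image_ref: str, pins: dict[str, str]) -> str:
    """Replace a missing or `latest` tag with the version listed in `pins`."""
    if "@" in image_ref:
        return image_ref  # already pinned by digest
    # Split off the last path component so registry ports are left alone.
    prefix, _, last = image_ref.rpartition("/")
    name, _, tag = last.partition(":")
    if tag not in ("", "latest"):
        return image_ref  # already carries an explicit version
    key = f"{prefix}/{name}" if prefix else name
    if key not in pins:
        return image_ref  # unknown image: leave untouched for manual review
    return f"{key}:{pins[key]}"


# Placeholder versions, to be replaced with the releases actually chosen.
pins = {"memcached": "X.Y", "redis": "X.Y", "rabbitmq": "X.Y"}
```

Images that are not listed in `pins` are returned unchanged, so anything the mapping misses stays visible instead of being pinned to a guess.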