containers / ai-lab-recipes

Examples for building and running LLM services and applications locally with Podman
Apache License 2.0

Bootc e2e testing improvements #469

Open Gregory-Pereira opened 4 months ago

Gregory-Pereira commented 4 months ago
  1. Potentially embed the tests into the derived image
  2. Bake in the container auth file to the derived image for working with private images
  3. Figure out how to make the tmate session action die immediately when the rest of the workflow finishes, so the tests do not keep running longer than needed, and perhaps add it as a separate failure action before terraform destroy
  4. Figure out how to not need to push and pull derived image from quay (see: https://github.com/containers/ai-lab-recipes/commit/7ae2f96e0009c3aab13e5bff801cd9a509bedda6)
  5. SSH key injection via user_data script

/cc @cgwalters @lmilbaum
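
For item 5, a minimal sketch of what generating such a user_data script could look like. This assumes the instance image runs a shell-script user_data payload at first boot (as cloud-init does) and that the target user's home follows the usual layout; `render_user_data` and its parameters are hypothetical names for illustration, not something that exists in this repo.

```python
import shlex


def render_user_data(public_key: str, user: str = "root") -> str:
    """Return a shell user_data script that appends public_key to the
    given user's authorized_keys file.

    Assumption: the instance executes user_data as a bash script on first boot.
    """
    # Assumed home layout; adjust if the image places homes elsewhere.
    home = "/root" if user == "root" else f"/home/{user}"
    ssh_dir = shlex.quote(f"{home}/.ssh")
    owner = shlex.quote(user)
    # Quote the key so spaces and comments in it survive the shell.
    key = shlex.quote(public_key.strip())
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        f"install -d -m 700 -o {owner} -g {owner} {ssh_dir}",
        f"echo {key} >> {ssh_dir}/authorized_keys",
        f"chmod 600 {ssh_dir}/authorized_keys",
        f"chown {owner}:{owner} {ssh_dir}/authorized_keys",
    ]
    return "\n".join(lines) + "\n"
```

The rendered string could then be passed as the instance's user_data by whatever provisioning tool is in use, which would avoid baking a key into the image itself.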

lmilbaum commented 4 months ago

Consider using ECR (Amazon Elastic Container Registry) instead of quay.io to shorten the feedback loop.

Gregory-Pereira commented 4 months ago

Consider upgrading to podman v5 to add retry options on our podman run (bootc install) step. This depends on input from @cgwalters or @rhatdan: what happens if bootc install fails?

rhatdan commented 4 months ago

Not really sure what the question is here?

Gregory-Pereira commented 3 months ago

We are discussing improvements that came up while implementing the E2E tests and the corresponding testing playbooks. This specific question was whether it is worth upgrading podman to get access to the --retry and --retry-delay flags during the podman run (bootc install) step of our provisioning playbook. I assumed that if bootc install fails, it would have a persistent or damaging effect on the host system, so adding the --retry option would be pointless.
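
To make the trade-off concrete, here is a minimal sketch of a generic retry wrapper around a provisioning command. It is not podman's own retry behavior, and `run_with_retry` is a hypothetical helper; it only illustrates that retrying helps when a failed attempt leaves the host in a state where the same command can safely run again, which is exactly the open question for bootc install.

```python
import subprocess
import time
from typing import Sequence


def run_with_retry(
    cmd: Sequence[str], attempts: int = 3, delay: float = 5.0
) -> subprocess.CompletedProcess:
    """Run cmd up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the first successful result, or the last failed one.
    Assumption: cmd is safe to re-run after a failure (idempotent).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Only wait if another attempt is coming.
        if attempt < attempts:
            time.sleep(delay)
    return result
```

If a failed install does leave the host damaged, a wrapper like this would just repeat the failure, and re-provisioning the instance would be the more useful recovery path.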