cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Publish helm charts #7093

Open wonko opened 11 months ago

wonko commented 11 months ago

Is your feature request related to a problem? Please describe.

The docs state that installing CVAT through Helm charts should start from a clone of the repository. Published Helm charts would make the charts easier to use and distribute, and easier to consume from automation tooling.

As an example, using the charts with Terraform is currently close to impossible: one would have to either include CVAT as a submodule (and submodules are a hassle to maintain...) or pull the complete CVAT repo each time an automation runs, just to check whether the Helm chart has an update.

Describe the solution you'd like

There are many ways to publish Helm charts. The easiest might be to use GitHub Pages, as the project already uses GitHub and has GitHub Actions in place. The marketplace has plenty of tooling that deploys Helm charts to GitHub Pages.
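
To illustrate what a published chart would enable on the Terraform side, here is a minimal sketch using the Helm provider's `helm_release` resource. The repository URL is hypothetical (no such chart repository exists today), and the chart name, the `cvat_chart_version` variable, and the namespace are assumptions made for illustration.

```hcl
# Sketch only: assumes the chart has been published to a Helm repository.
resource "helm_release" "cvat" {
  name             = "cvat"
  repository       = "https://charts.example.com/cvat" # hypothetical URL, not a real chart repo
  chart            = "cvat"                            # assumed chart name
  version          = var.cvat_chart_version            # hypothetical variable pinning the chart version
  namespace        = "cvat"                            # placeholder namespace
  create_namespace = true
}
```

With a setup along these lines, a run would only need to download the repository index and the packaged chart, rather than clone the whole CVAT repository.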

Describe alternatives you've considered

No response

Additional context

No response

grzleadams commented 11 months ago

We deploy CVAT with Terraform via the chart, and I just use a null_resource to do the clone:

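resource "null_resource" "git_clone" {
  # Re-run the provisioner when the pinned ref changes, or on every apply
  # when force_clone is set (timestamp() differs on each run).
  triggers = {
    cvat_clone_trigger  = var.cvat_ref
    force_clone_trigger = var.force_clone ? timestamp() : "0"
  }
  # Clone once if the directory is missing, check out the pinned ref,
  # then fetch the chart's dependencies.
  provisioner "local-exec" {
    command = "if [ ! -d ${local.git_clone_path} ]; then mkdir -p ${local.git_clone_path} && git clone https://github.com/opencv/cvat.git ${local.git_clone_path}; fi && cd ${local.git_clone_path} && git checkout ${var.cvat_ref} && helm dependency update helm-chart"
  }
}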

It's not perfect and can end up in a weird state if you're not careful, but it works fairly well. I agree that a published chart would be preferable, though.
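
For anyone wiring this up, the cloned chart could then be consumed along these lines. This is a sketch rather than the exact configuration described above: it reuses `local.git_clone_path` and `null_resource.git_clone` from the snippet, while the release name and namespace are placeholders.

```hcl
# Sketch only: installs the chart from the local clone created by null_resource.git_clone.
resource "helm_release" "cvat" {
  name             = "cvat"                                 # placeholder release name
  chart            = "${local.git_clone_path}/helm-chart"   # local chart path inside the clone
  namespace        = "cvat"                                 # placeholder namespace
  create_namespace = true

  # Make sure the clone and checkout have run before Helm reads the chart.
  depends_on = [null_resource.git_clone]
}
```

The explicit `depends_on` matters here because the chart path is a plain string, so Terraform cannot infer the ordering on its own.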

wonko commented 11 months ago

In my book, it's weird to pull 100+ MB just to fetch a couple of kB of YAML files each time a pipeline runs (multiple times per day, whenever other parts of the infra plan are touched).

The effort to have this published to GitHub Pages is minimal, but it requires access to the repo, so it can't be fixed in a PR from outside this repo. Helm chart-releaser was built exactly for this. A branch and a single GitHub Action are all that is needed.

(I've solved it through a git submodule for now. I'm not a big fan of null_resources to fix this kind of thing, but I agree that there are ways around this. That doesn't mean we shouldn't push to make it better...)

grzleadams commented 11 months ago

Agreed, a published chart is better. I was just trying to provide a workaround for anyone who comes along looking for how to use the chart with Terraform (I personally really don't like submodules).

FWIW the null_resource doesn't trigger every time, only when you either set force_clone to true (since timestamp() updates the resource on each run) or change cvat_ref (we pin to specific refs, so it was a natural fit for us). Every other time you run, nothing changes in state, so it doesn't do the clone. The only problem with that is if you use ephemeral workers for CI, but it feels like using a cache in that case makes more sense than re-cloning anyway.