canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0
97 stars 48 forks source link

Charms standardisation #794

Open DnPlas opened 6 months ago

DnPlas commented 6 months ago

Context

The MLOps team maintains a big number of charms that do not seem to follow any coding style and standard, and even manifest files are located in different directories in the src of the charm. We should propose a standard to follow in terms of coding style and repo organisation, same goes for CI, testing, and documentation.

The MLOps team should provide a standard for their charms, so it’s easier to maintain, automate, update, and even create charms. This story includes:

What needs to get done

  1. Identify the best practices for writing Charmed Kubeflow charms and make that a standard (could be with re-usable charm components)
  2. Find the best practices for event handling and make that a standard. This requires the team to define and agree on what is going to be done by each (in the past, there has been disagreement on some like update status, remove, etc.)
  3. Define the testing framework to use and make that a standard in all repositories.
  4. Define testing standards around integration testing.
  5. Make all README and CONTRIBUTING files have at least standard information that can be found in all charms.

Definition of Done

Charm repositories share the same structure, coding, and testing style.

syncronize-issues-to-jira[bot] commented 6 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5195.

This message was autogenerated

ca-scribner commented 6 months ago

Another thing we could standardize is how we name dev branches (the branches we use when we want to land more than one PR into a common feature or dev branch, test them together, and then do a final PR against main later). As is, dev branches don't have a standard and thus they don't have branch protection (anyone can push a commit directly to a dev branch, leading to potential mistakes). If we named them say dev/xxx, we could have branch protections (like with main and track/) on all repos

ca-scribner commented 6 months ago

our repo settings themselves are also not standardized. There's automation (see obs) that can make this programatically defined (this might be out of scope - its more charm repo standardization) edit: added as #811

ca-scribner commented 5 months ago

https://github.com/canonical/kubeflow-ci/issues/125 is also relevant here, where the goal is to standardize how CI generates logs

ca-scribner commented 5 months ago

Also relevant is https://github.com/canonical/kubeflow-ci/issues/127, which suggests we should have automation enforce the standardizations and conventions we decide on

DnPlas commented 5 months ago

In an attempt to also make our charms aware of the status and health of their services, we must agree on a correct approach to make sure our charms report back their actual status and act in the event of a failure. At least we have to discuss:

  1. Having pebble health checks in all our charms.
  2. Having other health checks for the workloads or k8s resources.
  3. What to do in the event of a failure.

https://github.com/canonical/admission-webhook-operator/issues/123 relates to and supports this story. That issue describes some errors in the pebble health checks, and asks questions about what should be the way to resolve and carry on with issues.

DnPlas commented 4 months ago

The logging story in our CI is also inconsistent. Though we have things like canonical/kubeflow-ci/actions/dump-charm-debug-artifacts@main in some of our workflows, in reality it's not everywhere and we had instances of that reusable workflow not providing correct logs or missing on important details.

Some of this has already been reported: