databricks / mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
https://docs.databricks.com/en/dev-tools/bundles/mlops-stacks.html
Apache License 2.0
416 stars, 141 forks

Set repo as template repository #94

Open followingell opened 1 year ago

followingell commented 1 year ago

Hello Databricks,

Given the below:

We provide the default stack in this repo as a production-friendly starting point for MLOps.

Would it be possible/more-appropriate to set this repository as a template repository?

This would mean that users could easily create private "custom stacks" (e.g. for organizations) as per your Stack Customization Guide, without needing a complex private-fork procedure.

I look forward to hearing back!

jackroseman commented 1 year ago

Just providing my 2c as well.

My company needs a solution like this as well. I have a runway of ~10 projects that will use DABs / MLOps stacks. I maintain a custom MLOps stack implementation in Azure DevOps where my projects live in a monorepo. When this repo gets an update, here are my steps to update my projects:

  1. This repo gets updated, and because I cannot fork it in Azure DevOps, I maintain a clone of this repository in ADO, so I will pull in changes from the git remote to update the repo there.
  2. I have a fork of the ADO clone that is our custom stack so I have to pull in changes there as well and decide what should go in.
  3. I have to update the projects made from our custom stack, if necessary. I decide based on how much time it would take me to make the changes. The only way I have found is to merge the projects together and decide which conflicting changes to keep. This can have unintended outcomes, so in the long run I think updating a project with new stack features will be too much of a burden.

I concur that it would be nice to have the ability to easily maintain a custom stack that syncs with this repo AND update existing projects.
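
For readers following along, steps 1 and 2 above amount to adding this repo as an extra git remote on the mirror and merging from it. Here is a minimal Python sketch of that, not the commenter's actual setup. The helper names `sync_commands` and `run_all`, the remote name `upstream`, and the branch name `main` are all assumptions for illustration.

```python
import subprocess


def sync_commands(upstream_url, branch="main", remote="upstream"):
    """Build the git commands that pull upstream changes into a mirror clone.

    Hypothetical helper: the remote and branch names are assumptions.
    """
    return [
        # Fails if the remote already exists; skip this entry on later runs.
        ["git", "remote", "add", remote, upstream_url],
        ["git", "fetch", remote],
        # Merge the fetched branch; conflicts still need manual resolution.
        ["git", "merge", f"{remote}/{branch}"],
    ]


def run_all(commands, cwd):
    """Run each command inside the local clone, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `run_all(sync_commands("https://github.com/databricks/mlops-stacks.git"), cwd="path/to/mirror")` would run the three commands in order. Deciding which changes go into the custom stack (step 2) stays a manual review of the merge.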

arpitjasa-db commented 10 months ago

Hi @followingell @jackroseman thanks for the feedback! We are looking into options that would let us keep the current customizability of MLOps Stacks while also propagating changes to downstream generated Stacks. With a GitHub Template Repository, it becomes difficult to initialize the project with custom variables to the extent that we currently support. For now, for forks we recommend pulling in the latest changes. For already initialized projects, you can generate a project using the latest version of MLOps Stacks with the same parameters, add the code as a branch to your existing repo, and create a PR or perform a diff to see which of the latest changes you'd like to keep and which don't suit your purposes.
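
As a rough illustration of the "perform a diff" step, the sketch below compares an existing project directory with a freshly regenerated one and lists which files changed or exist on only one side. It uses Python's standard `filecmp` module. The function name `compare_trees` and the result keys are hypothetical, and this is only one way to do the comparison, not an official MLOps Stacks tool.

```python
import filecmp
from pathlib import Path


def compare_trees(existing, regenerated):
    """Recursively compare two project trees and group relative paths by status.

    Hypothetical helper. Note that filecmp compares shallowly by default:
    files with identical size and modification time are treated as equal.
    """
    result = {"changed": [], "only_existing": [], "only_regenerated": []}

    def walk(cmp, prefix):
        # Files present on both sides whose contents differ.
        result["changed"] += [str(prefix / n) for n in cmp.diff_files]
        # Files or directories present on one side only.
        result["only_existing"] += [str(prefix / n) for n in cmp.left_only]
        result["only_regenerated"] += [str(prefix / n) for n in cmp.right_only]
        # Descend into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(existing, regenerated), Path())
    for paths in result.values():
        paths.sort()
    return result
```

Entries under `changed` and `only_regenerated` are the candidates to review in a PR. Entries under `only_existing` are typically your own project code, which the regenerated template would not contain.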

followingell commented 10 months ago

@arpitjasa-db Thanks for the response and pointers.

Just sharing this blogpost which may help guide you in finding a solution 🙂 👍