databricks / mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
https://docs.databricks.com/en/dev-tools/bundles/mlops-stacks.html
Apache License 2.0
429 stars 143 forks

Update MLOps stacks project layout #49

Closed. mingyu89 closed this 1 year ago

mingyu89 commented 1 year ago

Signed-off-by: Mingyu Li <mingyu.li@databricks.com>

Stacks layout design doc: https://docs.google.com/document/d/1BTFOzxiVzCJ2uKN0f9N3id85pL32Gof6c0cD1ao7fNo/edit#bookmark=id.gklrrat3qlf8

It's a big change. Let's focus on code correctness for now and fix/improve docs later.

[Screenshot: 2023-02-21, 5:06:37 PM]

Decisions

  1. Create one more variable for the root folder.
  2. We'll simply ask the user for the root folder name. The instructions should note that users who intend to use a polyrepo should keep the root folder name the same as the project name.
  3. Put the terraform folder inside the project.
  4. Don't create a separate folder for projects; add them directly to the root folder.
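
For illustration only, the decisions above can be sketched as a small Python helper. The function names, the example folder names `my-repo` and `proj_a`, and the exact nesting are assumptions made for this sketch, not code from this PR; only the `terraform` folder name comes from the list above.

```python
from pathlib import PurePosixPath


def planned_layout(root_dir_name: str, project_names: list[str]) -> list[str]:
    """Return the relative folder paths implied by the decisions (hypothetical helper)."""
    root = PurePosixPath(root_dir_name)
    paths = [str(root)]
    for name in project_names:
        # Decision 4: projects sit directly under the root folder,
        # with no intermediate "projects" folder.
        project = root / name
        paths.append(str(project))
        # Decision 3: the terraform folder lives inside the project.
        paths.append(str(project / "terraform"))
    return paths


def follows_polyrepo_convention(root_dir_name: str, project_name: str) -> bool:
    """Decision 2: for a polyrepo, the root folder name should match the project name."""
    return root_dir_name == project_name


if __name__ == "__main__":
    for path in planned_layout("my-repo", ["proj_a"]):
        print(path)
```

This only models folder names; it does not generate a project or reflect the actual template contents.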

Layout

non-project related folders

project related folders


Test

All tests are passing

Sanity check of generating project

Run the Train notebook

[Screenshot: 2023-02-22, 11:41:36 AM]

Run batch inference

[Screenshot: 2023-02-22, 11:41:08 AM]
[Screenshot: 2023-02-22, 11:43:35 AM]
[Screenshot: 2023-02-22, 11:43:41 AM]

mingyu89 commented 1 year ago

Not ready for review

mingyu89 commented 1 year ago

Ready for review

mingyu89 commented 1 year ago

Ready for review again:) @vladimirk-db @zhe-db @arpitjasa-db

mingyu89 commented 1 year ago

Tested running the training notebook and batch inference in a Databricks workspace.

mingyu89 commented 1 year ago

@arpitjasa-db I added you to my repo. It's the refactor2 branch: https://github.com/mingyu89/test-repo1/tree/refactor2. Please note that the root folder name is not visible because it is also the git root folder. cc @zhe-db @vladimirk-db

mingyu89 commented 1 year ago

@vladimirk-db I also think it's better to have one less layer for users. I made the change and tested it by generating a project and running training and inference on Databricks.