This repository serves as a template to kickstart your ML experiments on Code Ocean and MLflow.
(If you have already cloned the repo, you can this section and directly jump to the next section Work on Code Ocean
)
To use this template for your ML experiments on Code Ocean, follow the steps below:
Step 1: Use the Template on GitHub
Step 2: Clone Repository on Code Ocean
Note: You are NOT required to design test functions for the repos on ML experiments conducted on Code Ocean. However, ensure that your code is well-documented and follows best practices (see the Code/Dev/ML/Data-Ops guide).
For any questions/assistance, please contact Gurdeep.
1.1 postInstall Script: Located in the environment folder, the postInstall
script is where you specify any shareable packages you wish to use for your ML experiments. For the purposes of demo, in this capsule we install the package demoMLsourceCode repository from VPE's GitHub page. Please change this script based on your requirements.
1.2 Jupyter Notebook: Checkout the ml_experiment.ipnyb
notebook within the code folder, which demonstrates how to import the shareable package specified in the postInstall
script. Addditionally, it conducts basic ML experiments using the functionalities provided by the imported package. Please change this notebook based on your experiments. Also check out the notebooks external_ML_model.ipynb
that describes how to import an external model into our MLOps environment, and register_HF_mocel.ipynb
that describes how to register a Hugging Face model within our MLOps environment.
1.3 Dockerfile: Located in the environment folder, the Dockerfile
contains the recipe to load the base image (vscode + copilot + pylint) for your experiment. This is loaded by deafault for your capsule. Should your experiment require another base image, you can select from the available ones by clicking environment
.
You need to attach AWS Cloud Credentials (allows you to connect to AWS) and a Custom Key (path to S3 bucket that contains ml logs) as secrets in the environment of this capsule. This a one time process, and may not be repeated unless there is change in AWS policies or path to the S3 buckets. These credentials must be able to both Read and Write to a designated S3 bucket. Once you have the credentials and key, follow the steps below:
2.1 Go to the "Account" window and choose "User Secrets." Add "AWS Cloud Credentials" and "Custom Key" (Please check out the MLOps guide to know more about how to set up AWS Cloud Credentials and the DataOps guide on how to set up a Custom Key on Code Ocean or ask Gurdeep).
2.2 Navigate to the "Files" tab and select "environment".
2.3 Attach the appropriate AWS credentials and custom key to the capsule.
3.1 When you are ready, launch a Jupyter Notebook by choosing the icon on top-right.
3.2 Navigate to the notebook that contains your experiment. For the purposes of this demo, you can open code/ml_experiment.ipynb
.
3.3 Follow the steps described in the notebook to run the ML experiment.
3.4 Note that the beginning and end of the notebook contain brief code snippets for synchronizing your local copy of the MLflow database with the S3 bucket. The snippet at the start syncs the remote (S3 bucket) to your local copy (/tmp/mlflow/db
), while the one at the end performs the vice versa operation. Failing to execute these snippets at the correct points (i.e., the start and the end) may result in a loss of synchronization and files.