K0nkere / MLOps_Car-prices-project

Covers the most important stages of an ML system / Auction car prices prediction model for MLOps-zoomcamp

1 Experiment tracking #2

Open K0nkere opened 2 years ago

K0nkere commented 2 years ago

Creating an MLFlow server with a custom S3 bucket as a Docker container

Dockerfile

FROM python:3.9.7-slim

WORKDIR /mlflow/

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
  rm requirements.txt

EXPOSE 5001

ENV MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
ENV AWS_DEFAULT_REGION=ru-central1
ENV AWS_ACCESS_KEY_ID=<key_id>
ENV AWS_SECRET_ACCESS_KEY=<key>

ENV BACKEND_URI sqlite:////mlflow/mlops-project.db
ENV ARTIFACT_ROOT s3://kkr-mlops-zoomcamp/mlflow-artifacts/

# ENTRYPOINT ["bash"]
CMD mlflow server --backend-store-uri ${BACKEND_URI} --default-artifact-root ${ARTIFACT_ROOT} --host 0.0.0.0 --port 5001

Building from the /project folder:

docker build -t project-mlflow-server ./1-experiment-tracking/

and running:

docker run -it -v /mlflow-database:/mlflow/ -p 5001:5001 project-mlflow-server:latest

After that it will be accessible via 127.0.0.1:5001 or via the host's external IP on port 5001
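The same build-and-run pair can be captured in a compose file (a sketch; the service name and file placement are my own, the image, ports, and volume mirror the commands above):

```yaml
# docker-compose.yml, placed in the /project folder
services:
  mlflow-server:
    build: ./1-experiment-tracking/
    image: project-mlflow-server:latest
    ports:
      - "5001:5001"
    volumes:
      - /mlflow-database:/mlflow/
```

Then `docker compose up -d` replaces the manual build and run commands.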

K0nkere commented 2 years ago

For MLflow to save artifacts in a custom bucket

a few environment variables need to be added: open sudo nano /etc/environment and add

MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
AWS_DEFAULT_REGION=ru-central1

The key_id and key also need to be added to ~/.aws/credentials under the [default] profile:

aws_access_key_id = <key_id>
aws_secret_access_key = <key>

then restart the server.

Running MLFlow service

mlflow ui --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts/

or as a server

mlflow server --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts/ --host 0.0.0.0 --port 5001

Alternatively, we can create a .env file located in the Pipfile folder

export MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
export AWS_DEFAULT_REGION=ru-central1
export AWS_ACCESS_KEY_ID=<key_id>
export AWS_SECRET_ACCESS_KEY=<key>

and the variables will be set automatically when the pipenv environment starts
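pipenv loads that .env on `pipenv shell` / `pipenv run`; outside pipenv the same behaviour can be replicated with a tiny loader (a sketch — `load_dotenv` here is my own helper, not the python-dotenv package):

```python
import os


def load_dotenv(path=".env"):
    """Parse `export KEY=value` lines from a .env file into os.environ."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            # skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            # tolerate the shell-style `export ` prefix used in the file above
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()
```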

K0nkere commented 2 years ago

Downloading a model from registry

import pickle

import mlflow

stage = 'Production'
# model_name holds the name of the model in the registry
staging_model = mlflow.pyfunc.load_model(model_uri=f'models:/{model_name}/{stage}')

# if the model needs to be saved locally
with open('models/rf-best-model-production.bin', 'wb') as f_out:
    pickle.dump(staging_model, f_out)

# best model predictions
staging_model.predict(X_test)
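The two URI schemes used in this thread follow a fixed shape, which can be made explicit with small helpers (function names are my own, for illustration):

```python
def registry_uri(model_name: str, stage_or_version: str) -> str:
    # models:/<name>/<stage> resolves to the latest version in that stage;
    # models:/<name>/<version> pins an exact registered version
    return f"models:/{model_name}/{stage_or_version}"


def runs_uri(run_id: str, artifact_path: str) -> str:
    # runs:/<run_id>/<artifact_path> points at a model logged under a run
    return f"runs:/{run_id}/{artifact_path}"
```

For example, `registry_uri(model_name, 'Production')` builds the URI passed to `mlflow.pyfunc.load_model` above.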
K0nkere commented 2 years ago

Ways to get RUN_ID of a logged model

1) To take from the experiment list

from mlflow.entities import ViewType
from mlflow.tracking import MlflowClient

mlflow_client = MlflowClient()

experiment = mlflow.set_experiment('Auction-car-prices-best-models')
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    order_by=["metrics.rmse_test ASC"]
)
RUN_ID = best_model_run[0].info.run_id
model_uri = "runs:/{:}/full-pipeline".format(RUN_ID)

2) To take while adding the model to the registry with model_uri

registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=model_name
)
> registered_model.run_id and registered_model.current_stage

3) To take while promoting the model

promoted_model = mlflow_client.transition_model_version_stage(
    name=model_name,
    version=registered_model_version,
    stage=to_stage,
    archive_existing_versions=False
)
> promoted_model.run_id and promoted_model.current_stage

4) To take from the registered models list

versions = mlflow_client.get_latest_versions(
    model_name,
    stages=['Production']
)
> versions[num].version, versions[num].current_stage, versions[num].run_id, versions[num].name == model_name
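Option 4 boils down to scanning the returned versions for the wanted stage; a sketch with stand-in objects in place of real ModelVersion results from get_latest_versions:

```python
from types import SimpleNamespace


def run_id_for_stage(versions, stage="Production"):
    """Return the run_id of the first version in the given stage, else None.

    `versions` is a list of objects exposing .current_stage and .run_id,
    like the result of mlflow_client.get_latest_versions(model_name, stages=[stage]).
    """
    for v in versions:
        if v.current_stage == stage:
            return v.run_id
    return None


# stand-in for real ModelVersion objects, for illustration only
fake_versions = [SimpleNamespace(current_stage="Production", run_id="a1b2c3")]
```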
K0nkere commented 2 years ago

Filtering the list of experiments

Since I am registering the best models relying on metrics.rmse_test, and the test_dataset can differ from period to period, an additional restriction for filtering needs to be specified. As an example:

query = f'parameters.test_dataset = "{test_dataset_period}"'
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    filter_string=query,
    order_by=["metrics.rmse_test ASC"]
)
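If more restrictions accumulate, the filter string can be composed from a dict instead of hand-written f-strings (a sketch; `build_filter` is my own helper, the `key = "value"` clauses joined with `and` follow MLflow's filter_string syntax):

```python
def build_filter(restrictions: dict) -> str:
    """AND together clauses of the form: key = "value"."""
    return " and ".join(f'{key} = "{value}"' for key, value in restrictions.items())
```

So `build_filter({"parameters.test_dataset": test_dataset_period})` reproduces the query above, and extra restrictions become extra dict entries.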