You need to add a few environment variables:

```bash
sudo nano /etc/environment
```

and add

```
MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
AWS_DEFAULT_REGION=ru-central1
```

You also need to add the key id and secret key to `~/.aws/credentials` under `[default]`:

```
aws_access_key_id = <key_id>
aws_secret_access_key = <key>
```
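To verify that the credentials and the custom endpoint are picked up, a quick boto3 check can be run (a sketch; the bucket name is taken from the commands below):

```python
import os
import boto3

# boto3 reads the keys from ~/.aws/credentials; the custom endpoint must be passed explicitly
s3 = boto3.client(
    's3',
    endpoint_url=os.getenv('MLFLOW_S3_ENDPOINT_URL', 'https://storage.yandexcloud.net'),
)

# list a few objects in the artifact bucket to confirm access
response = s3.list_objects_v2(Bucket='kkr-mlops-zoomcamp', MaxKeys=5)
for obj in response.get('Contents', []):
    print(obj['Key'])
```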
Restart the server, then launch the MLflow UI:

```bash
mlflow ui --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts
```

or run it as a server:

```bash
mlflow server --backend-store-uri sqlite:///../mlops-project.db --default-artifact-root s3://kkr-mlops-zoomcamp/mlflow-artifacts/ --host 0.0.0.0 --port 5001
```
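Once the server is up, a client can point MLflow at it (a sketch; the host/port follow the command above, and the param/metric values are hypothetical):

```python
import mlflow

# point the client at the tracking server started above
mlflow.set_tracking_uri('http://127.0.0.1:5001')
mlflow.set_experiment('Auction-car-prices-best-models')

with mlflow.start_run():
    mlflow.log_param('test_dataset', '2015-1')  # hypothetical parameter value
    mlflow.log_metric('rmse_test', 123.4)       # hypothetical metric value
```

Artifacts logged through this client end up in the S3 bucket given by `--default-artifact-root`.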
Alternatively, we can create a `.env` file in the Pipfile folder:

```bash
export MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
export AWS_DEFAULT_REGION=ru-central1
export AWS_ACCESS_KEY_ID=<key_id>
export AWS_SECRET_ACCESS_KEY=<key>
```

and the variables will be set automatically when the pipenv environment starts.
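A quick way to confirm that pipenv exported the variables (a sketch; run it inside `pipenv shell` or via `pipenv run`):

```python
import os

# all four variables should be present inside the pipenv environment
for var in ('MLFLOW_S3_ENDPOINT_URL', 'AWS_DEFAULT_REGION',
            'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'):
    print(var, '->', 'set' if os.getenv(var) else 'MISSING')
```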
### Downloading a model from the registry

```python
import pickle
import mlflow

# model_name is the name the model was registered under
stage = 'Production'
staging_model = mlflow.pyfunc.load_model(model_uri=f'models:/{model_name}/{stage}')

# if you need to save it locally
with open('models/rf-best-model-production.bin', 'wb') as f_out:
    pickle.dump(staging_model, f_out)

# best model predictions
staging_model.predict(X_test)
```
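The pickled model can later be loaded back without touching the registry (a sketch; the path matches the snippet above):

```python
import pickle

with open('models/rf-best-model-production.bin', 'rb') as f_in:
    loaded_model = pickle.load(f_in)

loaded_model.predict(X_test)  # same interface as the pyfunc model above
```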
### Ways to get the RUN_ID of a logged model

1) Take it from the experiment's runs list:

```python
import mlflow
from mlflow.tracking import MlflowClient
from mlflow.entities import ViewType

mlflow_client = MlflowClient()

experiment = mlflow.set_experiment('Auction-car-prices-best-models')
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    order_by=["metrics.rmse_test ASC"]
)
RUN_ID = best_model_run[0].info.run_id
model_uri = "runs:/{:}/full-pipeline".format(RUN_ID)
```
2) Take it while adding the model to the registry with `model_uri`:

```python
registered_model = mlflow.register_model(
    model_uri=model_uri,
    name=model_name
)
```

> use `registered_model.run_id` and `registered_model.current_stage`
3) Take it while promoting the model:

```python
promoted_model = mlflow_client.transition_model_version_stage(
    name=model_name,
    version=registered_model_version,
    stage=to_stage,
    archive_existing_versions=False
)
```

> use `promoted_model.run_id` and `promoted_model.current_stage`
4) Take it from the registered models list:

```python
versions = mlflow_client.get_latest_versions(
    model_name,
    stages=['Production']
)
```

> use `versions[num].version`, `versions[num].current_stage`, `versions[num].run_id`; and `versions[num].name == model_name`
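For example, to print these attributes for every returned version (a sketch, reusing `mlflow_client` from above):

```python
for v in mlflow_client.get_latest_versions(model_name, stages=['Production']):
    print(v.name, v.version, v.current_stage, v.run_id)
```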
Since I register the best models based on `metrics.rmse_test`, and the test dataset can differ from period to period, an additional filtering restriction needs to be specified, for example:

```python
query = f'parameters.test_dataset = "{test_dataset_period}"'
best_model_run = mlflow_client.search_runs(
    experiment_ids=[experiment.experiment_id],
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=1,
    filter_string=query,
    order_by=["metrics.rmse_test ASC"]
)
```
### Creating an MLflow server with a custom S3 bucket as a Docker container
Dockerfile
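A minimal sketch consistent with the build and run commands below; the base image, installed packages, and file layout are assumptions:

```dockerfile
FROM python:3.9-slim

RUN pip install mlflow boto3

# S3 endpoint settings as above; access keys are expected at run time,
# e.g. via `docker run -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=...`
ENV MLFLOW_S3_ENDPOINT_URL=https://storage.yandexcloud.net
ENV AWS_DEFAULT_REGION=ru-central1

# /mlflow is the mount point for the sqlite backend (see the run command below)
WORKDIR /mlflow

EXPOSE 5001

CMD ["mlflow", "server", \
     "--backend-store-uri", "sqlite:///mlops-project.db", \
     "--default-artifact-root", "s3://kkr-mlops-zoomcamp/mlflow-artifacts/", \
     "--host", "0.0.0.0", "--port", "5001"]
```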
Build from the /project folder:

```bash
docker build -t project-mlflow-server ./1-experiment-tracking/
```

and run:

```bash
docker run -it -v /mlflow-database:/mlflow/ -p 5001:5001 project-mlflow-server:latest
```

After that it will be accessible via 127.0.0.1:5001 or `<host-ip>`:5001.