Bump mlflow from 2.14.3 to 2.15.0

Bumps mlflow from 2.14.3 to 2.15.0.
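
For reviewers deciding how to treat this update, the following is a minimal sketch of classifying the bump by its most significant changed component. It assumes plain `MAJOR.MINOR.PATCH` version strings; `parse_version` and `classify_bump` are hypothetical helper names used only for illustration and are not part of Dependabot or MLflow.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    for name, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return name
    return "none"


print(classify_bump("2.14.3", "2.15.0"))  # minor
```

Under that reading, this PR is a minor-version bump, which is the category the `@dependabot ignore this minor version` command refers to.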

Release notes

MLflow 2.15.0 includes many major features and improvements:

Major features:

🦙 LlamaIndex Flavor - MLflow now offers a native integration with LlamaIndex, one of the most popular libraries for building GenAI apps centered around custom data. This integration allows you to log LlamaIndex indices within MLflow, allowing for the loading and deployment of your indexed data for inference tasks with different engine types. MLflow also provides comprehensive tracing support for LlamaIndex operations, offering unprecedented transparency into complex queries. Check out the MLflow LlamaIndex documentation to get started! (#12633, @michael-berk, @B-Step62)

🔍 OpenAI Tracing - We've enhanced our OpenAI integration with a new tracing feature that works seamlessly with MLflow OpenAI autologging. You can now enable tracing of your OpenAI API usage with a single mlflow.openai.autolog() call; MLflow will then automatically log valuable metadata such as token usage and a history of your interactions, providing deeper insights into your OpenAI-powered applications. To start exploring this new capability, please check out the tracing documentation! (#12267, @gabrielfu)

✅ Enhanced Model Deployment Validation - To improve the reliability of model deployments, MLflow has added a new method to validate your model before deploying it to an inference endpoint. This feature helps to eliminate typical errors in input and output handling, streamlining the process of model deployment and increasing confidence in your deployed models. By catching potential issues early, you can ensure a smoother transition from development to production. (#12710, @serena-ruan)

📊 Custom Metrics Definition Recording for Eval - We've strengthened the flexibility of defining custom metrics for model evaluation by automatically logging and versioning metrics definitions, including models used as judges and prompt templates. With this new capability, you can ensure reproducibility of evaluations across different runs and easily reuse evaluation setups for consistency, facilitating more meaningful comparisons between different models or versions. (#12487, #12509, @xq-yin)

🔐 Databricks SDK Integration - MLflow's interaction with Databricks endpoints has been fully migrated to use the Databricks SDK. This change brings more robust and reliable connections between MLflow and Databricks, and access to the latest Databricks features and capabilities. We mark the legacy databricks-cli support as deprecated and will remove it in a future release. (#12313, @WeichenXu123)

💥 Spark VectorUDT Support - MLflow's Model Signature framework now supports Spark Vector UDT (User Defined Type), enabling logging and deployment of models using Spark VectorUDT with robust type validation. (#12758, @WeichenXu123)
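
The OpenAI Tracing item above says a single `mlflow.openai.autolog()` call is enough to turn tracing on. A minimal sketch of that call follows, guarded so it does nothing when `mlflow` or `openai` is not installed. The wrapper `enable_openai_tracing` is a hypothetical name for illustration; only `mlflow.openai.autolog()` comes from the release notes.

```python
import importlib.util


def enable_openai_tracing() -> bool:
    """Enable MLflow OpenAI autologging when both packages are importable.

    Hypothetical helper for illustration; only mlflow.openai.autolog()
    is taken from the release notes.
    """
    for package in ("mlflow", "openai"):
        if importlib.util.find_spec(package) is None:
            return False  # nothing to enable in this environment
    import mlflow.openai

    mlflow.openai.autolog()
    return True


enabled = enable_openai_tracing()
print("OpenAI tracing enabled:", enabled)
```

Once enabled, subsequent OpenAI client calls in the same process should be the ones that get traced, per the description above; what exactly is recorded depends on the installed MLflow and OpenAI versions.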

Other Notable Changes

Features:

[Tracking] Add parent_id as a parameter to the start_run fluent API for alternative control flows (#12721, @Flametaa)

[Tracking] Add U2M authentication support for connecting to Databricks from MLflow (#12713, @WeichenXu123)

[Tracking] Support deleting remote artifacts with mlflow gc (#12451, @M4nouel)

[Tracing] Traces can now be deleted conveniently via UI from the Traces tab in the experiments page (#12641, @daniellok-db)

[Models] Introduce additional parameters for the ChatModel interface for GenAI flavors (#12612, @WeichenXu123)

[Models] [Transformers] Support input images encoded with b64.encodebytes (#12087, @MadhuM02)

[Models Registry] Add support for AWS KMS encryption for the Unity Catalog model registry integration (#12495, @artjen)

[Models] Fix MLflow Dataset hashing logic for Pandas dataframe to use iloc for accessing rows (#12410, @julcsii)

[Models Registry] Support presigned urls without headers for artifact location (#12349, @artjen)

[UI] The experiments page in the MLflow UI has an updated look, and comes with some performance optimizations for line charts (#12641, @hubertzub-db)

[UI] Line charts can now be configured to ignore outliers in the data (#12641, @daniellok-db)

[UI] Creating compatibility with Kubeflow Dashboard UI (#12663, @cgilviadee)

[UI] Add a new section to the artifact page in the Tracking UI, which shows code snippet to validate model input format before deployment (#12729, @serena-ruan)
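
The Transformers entry in the list above mentions images encoded with `b64.encodebytes`. Assuming that refers to Python's standard `base64.encodebytes`, the sketch below shows how its output differs from `base64.b64encode` (it wraps the output with newlines) and that both forms decode to the same bytes. The sample bytes are a stand-in for a real image, and nothing here shows how MLflow itself decodes the input.

```python
import base64

# Stand-in for real image bytes; long enough to span several output lines.
image_bytes = bytes(range(256))

wrapped = base64.encodebytes(image_bytes)  # newline after every 76 output characters
compact = base64.b64encode(image_bytes)    # single line, no newlines

# The two encodings differ only in the inserted newlines.
assert b"\n" in wrapped and b"\n" not in compact
assert wrapped.replace(b"\n", b"") == compact

# With default settings, b64decode discards characters outside the base64
# alphabet (newlines included), so both forms round-trip to the same bytes.
assert base64.b64decode(wrapped) == image_bytes
assert base64.b64decode(compact) == image_bytes
```

The practical difference is that a strict decoder that rejects newlines would accept only the `b64encode` form, which is presumably why support for the wrapped form is listed as a separate change.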

Bug fixes:

[Tracking] Fix the model construction bug in MLflow SHAP evaluation for scikit-learn model (#12599, @serena-ruan)

[Tracking] File store get_experiment_by_name returns all stage experiments (#12788, @serena-ruan)

[Tracking] Fix Langchain callback injection logic for async/streaming request (#12773, @B-Step62)

[Tracing] [OpenAI] Fix stream tracing for OpenAI to record the correct chunk structure (#12629, @BenWilson2)

[Tracing] [LangChain] Fix LangChain tracing bug for .batch call due to thread unsafety (#12701, @B-Step62)

[Tracing] [LangChain] Fix nested trace issue in LangChain tracing. (#12705, @B-Step62)

[Tracing] Prevent interference between MLflow Tracing and other OpenTelemetry-based libraries (#12457, @B-Step62)

[Models] Fix log_model issue in MLflow >= 2.13 that causes databricks DLT py4j service crashing (#12514, @WeichenXu123)

[Models] [Transformers] Fix batch inference issue for Transformers Whisper model (#12575, @B-Step62)

[Models] [LangChain] Fix the empty generator issue in predict_stream for AgentExecutor and other non-Runnable chains (#12518, @B-Step62)

[Scoring] Fix Spark UDF permission denied issue in Databricks runtime (#12774, @WeichenXu123)

Documentation updates:

Add documentation on authentication for Databricks UC Model Registry (#12552, @WeichenXu123)

Adding model-from-code documentation for LangChain and Pyfunc (#12325, #12336, @sunishsheth2009)

Add FAQ entry for viewing trace exceptions (#12309, @BenWilson2)

Add note about fork vs spawn method when using multiprocessing for parallel runs (#12337, @B-Step62)

Add example usage of extract_fields for mlflow.search_traces (#12319, @xq-yin)

Replace GPT-3.5-turbo with GPT-4o-mini (#12740, #12746, @Acksout)

... (truncated)

Changelog

Sourced from mlflow's changelog.

2.15.0 (2024-07-29)

We are excited to announce the release candidate for MLflow 2.15.0. This release includes many major features and improvements!

Major features:

LlamaIndex Flavor🦙 - MLflow now offers a native integration with LlamaIndex, one of the most popular libraries for building GenAI apps centered around custom data. This integration allows you to log LlamaIndex indices within MLflow, allowing for the loading and deployment of your indexed data for inference tasks with different engine types. MLflow also provides comprehensive tracing support for LlamaIndex operations, offering unprecedented transparency into complex queries. Check out the MLflow LlamaIndex documentation to get started! (#12633, @michael-berk, @B-Step62)

OpenAI Tracing🔍 - We've enhanced our OpenAI integration with a new tracing feature that works seamlessly with MLflow OpenAI autologging. You can now enable tracing of your OpenAI API usage with a single mlflow.openai.autolog() call; MLflow will then automatically log valuable metadata such as token usage and a history of your interactions, providing deeper insights into your OpenAI-powered applications. To start exploring this new capability, please check out the tracing documentation! (#12267, @gabrielfu)

Enhanced Model Deployment with New Validation Feature✅ - To improve the reliability of model deployments, MLflow has added a new method to validate your model before deploying it to an inference endpoint. This feature helps to eliminate typical errors in input and output handling, streamlining the process of model deployment and increasing confidence in your deployed models. By catching potential issues early, you can ensure a smoother transition from development to production. (#12710, @serena-ruan)

Custom Metrics Definition Recording for Evaluations📊 - We've strengthened the flexibility of defining custom metrics for model evaluation by automatically logging and versioning metrics definitions, including models used as judges and prompt templates. With this new capability, you can ensure reproducibility of evaluations across different runs and easily reuse evaluation setups for consistency, facilitating more meaningful comparisons between different models or versions. (#12487, #12509, @xq-yin)

Databricks SDK Integration🔐 - MLflow's interaction with Databricks endpoints has been fully migrated to use the Databricks SDK. This change brings more robust and reliable connections between MLflow and Databricks, and access to the latest Databricks features and capabilities. We mark the legacy databricks-cli support as deprecated and will remove it in a future release. (#12313, @WeichenXu123)

Spark VectorUDT Support💥 - MLflow's Model Signature framework now supports Spark Vector UDT (User Defined Type), enabling logging and deployment of models using Spark VectorUDT with robust type validation. (#12758, @WeichenXu123)

Other Notable Changes

Features:

[Tracking] Add parent_id as a parameter to the start_run fluent API for alternative control flows (#12721, @Flametaa)

[Tracking] Add U2M authentication support for connecting to Databricks from MLflow (#12713, @WeichenXu123)

[Tracking] Support deleting remote artifacts with mlflow gc (#12451, @M4nouel)

[Tracing] Traces can now be deleted conveniently via UI from the Traces tab in the experiments page (#12641, @daniellok-db)

[Models] Introduce additional parameters for the ChatModel interface for GenAI flavors (#12612, @WeichenXu123)

[Models] [Transformers] Support input images encoded with b64.encodebytes (#12087, @MadhuM02)

[Models Registry] Add support for AWS KMS encryption for the Unity Catalog model registry integration (#12495, @artjen)

[Models] Fix MLflow Dataset hashing logic for Pandas dataframe to use iloc for accessing rows (#12410, @julcsii)

[Models Registry] Support presigned urls without headers for artifact location (#12349, @artjen)

[UI] The experiments page in the MLflow UI has an updated look, and comes with some performance optimizations for line charts (#12641, @hubertzub-db)

[UI] Line charts can now be configured to ignore outliers in the data (#12641, @daniellok-db)

[UI] Creating compatibility with Kubeflow Dashboard UI (#12663, @cgilviadee)

[UI] Add a new section to the artifact page in the Tracking UI, which shows code snippet to validate model input format before deployment (#12729, @serena-ruan)

Bug fixes:

[Tracking] Fix the model construction bug in MLflow SHAP evaluation for scikit-learn model (#12599, @serena-ruan)

[Tracking] File store get_experiment_by_name returns all stage experiments (#12788, @serena-ruan)

[Tracking] Fix Langchain callback injection logic for async/streaming request (#12773, @B-Step62)

[Tracing] [OpenAI] Fix stream tracing for OpenAI to record the correct chunk structure (#12629, @BenWilson2)

[Tracing] [LangChain] Fix LangChain tracing bug for .batch call due to thread unsafety (#12701, @B-Step62)

[Tracing] [LangChain] Fix nested trace issue in LangChain tracing. (#12705, @B-Step62)

[Tracing] Prevent interference between MLflow Tracing and other OpenTelemetry-based libraries (#12457, @B-Step62)

[Models] Fix log_model issue in MLflow >= 2.13 that causes databricks DLT py4j service crashing (#12514, @WeichenXu123)

[Models] [Transformers] Fix batch inference issue for Transformers Whisper model (#12575, @B-Step62)

[Models] [LangChain] Fix the empty generator issue in predict_stream for AgentExecutor and other non-Runnable chains (#12518, @B-Step62)

[Scoring] Fix Spark UDF permission denied issue in Databricks runtime (#12774, @WeichenXu123)

... (truncated)

Commits

be354dd Run python3 dev/update_mlflow_versions.py pre-release ... (#12811)
2caae01 Pin incremental to fix error in circleci (#12792)
b430290 Fix Spark UDF permission denied issue in Databricks runtime (#12774)
a923c0d Adjust pyfunc.load_model to check envs right before calling functions (issu...
5f90288 Fix duplicate tests for langchain and langchain_community (#12804)
23abba8 Document multipart upload for proxied artifact access (#12786)
48677e7 Add url logging when finishing MLflow run (#12708)
9a36b18 Fix OpenAI autolog trace response format issue (#12797)
ba57376 File store get_experiment_by_name returns all stage experiments (#12788)
6449271 Llama index examples (#12787)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

G-Research / fasttrackml

Bump mlflow from 2.14.3 to 2.15.0 #1376
