AutoGluon-Cloud aims to provide users with tools to train, fine-tune, and deploy AutoGluon-backed models on the cloud. With just a few lines of code, users can train a model and perform inference on the cloud without worrying about MLOps details such as resource management.
Apache License 2.0
Fix batch transform issue for tabular predictor with multiple partitions #138
This PR fixes an issue where batch transform jobs fail due to column misalignment when the input CSV file is split into multiple partitions. The header row is not handled correctly across partitions, which misaligns columns and causes prediction failures during inference.
Changes:
Added logic to align columns across partitions by ensuring headers are managed correctly.
Introduced _read_with_fallback and _align_columns helper functions to handle column alignment.
Updated transform_fn in tabular_serve.py to use these helper functions.
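The helper bodies are not reproduced in this description. As a rough illustration of the header-detection idea only, the sketch below uses a hypothetical `read_partition` function (not the PR's `_read_with_fallback`, whose signature is not shown here) that checks whether a partition's first row matches the expected feature names and otherwise treats the partition as headerless. It uses the standard `csv` module to stay dependency-free; the real helpers presumably operate on pandas DataFrames. The `expected_columns` parameter is an assumption standing in for the predictor's original feature names; passing `None` models the case where those names are unavailable and falls back to assuming a header is present.

```python
import csv
import io
from typing import List, Optional, Tuple


def read_partition(
    payload: str, expected_columns: Optional[List[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Parse one CSV partition, detecting whether it starts with a header row.

    Hypothetical helper: `expected_columns` stands in for the feature names
    the predictor was trained on. Returns (column_names, data_rows).
    """
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(payload)) if row]
    if not rows:
        return list(expected_columns or []), []
    if expected_columns is None:
        # No known schema: fall back to treating the first row as the header.
        return rows[0], rows[1:]
    if set(rows[0]) == set(expected_columns):
        # First row is a header (possibly in a different column order).
        return rows[0], rows[1:]
    # Headerless partition: assume columns arrive in the training order.
    if len(rows[0]) != len(expected_columns):
        raise ValueError(
            f"expected {len(expected_columns)} columns, got {len(rows[0])}"
        )
    return list(expected_columns), rows


features = ["age", "workclass", "capital-loss"]
print(read_partition("age,workclass,capital-loss\n39,State-gov,0\n", features))
# (['age', 'workclass', 'capital-loss'], [['39', 'State-gov', '0']])
print(read_partition("50,Private,0\n", features))
# (['age', 'workclass', 'capital-loss'], [['50', 'Private', '0']])
```

Both partitions end up with the same column names, whether or not they carried the header line.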
Limitations:
This fix currently only works for the tabular predictor. Support for multimodal and timeseries predictors depends on the implementation of original_features, which can be tracked in issue #4477.
Steps to Reproduce:
The following script can be used to reproduce the issue:
```python
from autogluon.cloud import TabularCloudPredictor
import pandas as pd

# Load datasets
train_data = pd.read_csv("https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv")
test_data = pd.read_csv("https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv")
test_data.drop(columns=['class'], inplace=True)  # drop the label so only features are sent

# Cloud Predictor Arguments
predictor_init_args = {"label": "class"}
predictor_fit_args = {"train_data": train_data, "time_limit": 60}

# Initialize Cloud Predictor and Fit
# cloud_output_path must point to an S3 bucket (or prefix) you can write to
cloud_predictor = TabularCloudPredictor(cloud_output_path='tonyhu-autogluon')
cloud_predictor.fit(predictor_init_args=predictor_init_args, predictor_fit_args=predictor_fit_args)

# Batch Inference with small max_payload (in MB) to force multiple partitions
result = cloud_predictor.predict(test_data, backend_kwargs={"transformer_kwargs": {"max_payload": 1}})
```
Expected Behavior:
The batch transform job should handle multiple partitions correctly, aligning columns across the partitions and ignoring or managing headers if present in individual partitions.
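One way to express the expected alignment step, again as a hypothetical sketch rather than the PR's `_align_columns`: once each partition has column names, reorder its rows into the order the model was trained on, so every partition reaches the predictor with the same layout. The names `align_columns` and `expected` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def align_columns(
    header: Sequence[str], rows: Sequence[Sequence[str]], expected: Sequence[str]
) -> List[List[str]]:
    """Reorder a partition's rows so columns follow the `expected` order.

    Hypothetical helper: extra columns are dropped; missing columns raise
    KeyError so a schema mismatch surfaces instead of being silently filled.
    """
    # Map each column name in this partition to its position.
    position: Dict[str, int] = {name: i for i, name in enumerate(header)}
    missing = [name for name in expected if name not in position]
    if missing:
        raise KeyError(f"partition is missing columns: {missing}")
    order = [position[name] for name in expected]
    return [[row[i] for i in order] for row in rows]


print(align_columns(["workclass", "age"], [["Private", "50"]], ["age", "workclass"]))
# [['50', 'Private']]
```

Raising on missing columns, rather than filling them with blanks, is a design choice of this sketch; it is not necessarily what the PR does.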
Observed Behavior:
The job fails with the following error logs:
```
Bad HTTP status received from algorithm: 500
invalid literal for int() with base 10: '0.1': Error while type casting for column 'capital-loss'
```
Logs show that the columns are misaligned for certain partitions:
Running batch transform in SageMaker with MultiRecord strategy.
MaxPayloadInMB=1 is set to ensure multiple partitions.
Additional Information:
The issue appears to be that AutoGluon Cloud does not handle the CSV header correctly when batch transform splits the input into partitioned records. In a multi-partition job, not every partition contains the header row, so the columns become misaligned.
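The header problem can be illustrated without SageMaker. The sketch below roughly models line-based `MultiRecord` batching with a hypothetical `split_into_partitions` function that groups whole lines into chunks under a byte limit (standing in for `MaxPayloadInMB`). It does not reproduce SageMaker's exact batching rules. Only the first chunk starts with the header. Every later chunk starts with a data row, which a reader expecting a header would misinterpret as column names.

```python
from typing import List


def split_into_partitions(payload: str, max_bytes: int) -> List[str]:
    """Split CSV text on line boundaries into chunks of at most `max_bytes`.

    Rough stand-in for line-based MultiRecord batching: records are grouped
    as whole lines. A single line longer than `max_bytes` gets its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for line in payload.splitlines(keepends=True):
        # Start a new chunk if adding this line would exceed the limit.
        if current and len(current.encode()) + len(line.encode()) > max_bytes:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


csv_text = "age,capital-loss\n39,0\n50,0\n38,0\n"
parts = split_into_partitions(csv_text, max_bytes=25)
for i, part in enumerate(parts):
    # Show the first line of each partition.
    print(i, part.splitlines()[0])
# 0 age,capital-loss
# 1 50,0
```

The second partition begins with `50,0`, so treating its first line as a header would both lose a record and assign meaningless column names.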
Environment:
autogluon==1.1.0