aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

multiclass with csv: Given label size mismatched with prediction size. #565

Closed Schmidtbit closed 5 years ago

Schmidtbit commented 5 years ago

My data (in CSV format) has 8 classes, labeled 0-7, located in the first column, no headers, no index. When I run the training job in multi:softmax I keep getting the error after 2 minutes:

ClientError: Given label size mismatched with prediction size. Please ensure the first column is label and the correct metric is applied.

Please help me debug. I am new to SageMaker and don't know what I am doing wrong. Thanks!

This is how I created the target column:

from sklearn import preprocessing

train_df['Response'] = train_df['Response'].astype(int)
target = train_df.pop('Response').values
le = preprocessing.LabelEncoder()
y = le.fit_transform(target)        # encode labels as 0..7
train_df.insert(0, 'Response', y)   # label must be the first column
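What `LabelEncoder.fit_transform` does in the step above can be sketched without sklearn (a minimal illustration; the sample labels are made up, not from the actual dataset):

```python
def label_encode(labels):
    """Map each label to its index in the sorted set of unique labels (0..K-1)."""
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [mapping[lab] for lab in labels]

# XGBoost's multi:softmax requires class labels 0..num_class-1, so labels
# like 1..8 must be shifted down to 0..7 before training.
print(label_encode([1, 3, 8, 3]))  # [0, 1, 2, 1]
```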

This is how I saved the dataframe and uploaded to s3:

multi_train_data.to_csv("multi_formatted_train.csv", sep=',', header=False, index=False) # save training data 
multi_val_data.to_csv("multi_formatted_val.csv", sep=',', header=False, index=False) # save validation data

import os
import boto3

train_file = 'multi_formatted_train.csv'
val_file = 'multi_formatted_val.csv'

boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/', train_file)).upload_file(train_file)
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'val/', val_file)).upload_file(val_file)
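As a quick sanity check on the upload step, the S3 object key that `join` builds resolves as below (a sketch; the prefix value is an assumption — note `posixpath` is used here because `os.path.join` emits backslashes on Windows, which are not valid S3 key separators):

```python
import posixpath

prefix = 'sagemaker/xgboost-multi'          # assumed prefix, for illustration only
train_file = 'multi_formatted_train.csv'

# The trailing slash in 'train/' is harmless: join normalizes it away.
key = posixpath.join(prefix, 'train/', train_file)
print(key)  # sagemaker/xgboost-multi/train/multi_formatted_train.csv
```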

This is how I set up training:

import boto3
from time import gmtime, strftime
from sagemaker.amazon.amazon_estimator import get_image_uri

container = get_image_uri(boto3.Session().region_name, 'xgboost')

job_name = 'prudential-xgboost-multi-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Training job", job_name)

create_training_params = \
{
    "AlgorithmSpecification": {
        "TrainingImage": container,
        "TrainingInputMode": "File"
    },
    "RoleArn": role,
    "OutputDataConfig": {
        "S3OutputPath": "s3://{}/{}/multi/".format(bucket, prefix),
    },
    "ResourceConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m4.4xlarge",
        "VolumeSizeInGB": 20
    },
    "TrainingJobName": job_name,
    "HyperParameters": {
        "num_class":"8",
        "eta":"0.003",
        "gamma":"1.2",
        "max_depth":"6",
        "min_child_weight":"2",
        "max_delta_step":"0",
        "subsample":"0.6",
        "colsample_bytree":"0.35",
        "scale_pos_weight":"1.5",
        "silent":"1",
        "seed":"1301",
        "lambda":"1",
        "alpha":"0.2",
        "objective": "multi:softmax",
        "eval_metric": "auc",
        "num_round": "4269"
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 60 * 60
    },
    "InputDataConfig": [
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri":  "s3://{}/{}/train/".format(bucket, prefix),
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "csv",
            "CompressionType": "None"
        },
        {
            "ChannelName": "validation",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://{}/{}/val/".format(bucket, prefix),
                    "S3DataDistributionType": "FullyReplicated"
                }
            },
            "ContentType": "csv",
            "CompressionType": "None"
        }
    ]
}
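Given the error message's hint about the first column, a small local pre-flight check on the CSV can rule out label problems before launching the job (a sketch; `check_labels` and the sample rows are hypothetical, not part of the SageMaker API):

```python
import csv
import io

def check_labels(csv_text, num_class):
    """True iff the first CSV column contains exactly the labels 0..num_class-1,
    which is what XGBoost's multi:softmax expects."""
    labels = sorted({int(row[0]) for row in csv.reader(io.StringIO(csv_text))})
    return labels == list(range(num_class))

sample = "0,1.2,3.4\n7,0.1,0.2\n3,9.9,1.1\n"   # made-up rows for illustration
print(check_labels(sample, 8))        # False: most classes missing from sample
print(check_labels("0,1\n1,2\n", 2))  # True
```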
tullydwyer commented 5 years ago

I am having the exact same issue, but only when training via SageMaker Hyperparameter tuning jobs. Please see the details below. @Schmidtbit the code below works for regular SageMaker training jobs, so it may be worth checking for any differences.

I tried posting this in the AWS forums, but I am not verified:


Hi there,

I have spent hours trying all kinds of combinations, but I am really stumped by this: when I train via a standard Training job, the model trains and completes every time. When I train via a Hyperparameter tuning job with the exact same data and the same settings,* it fails every single time with the error below.

*I have to specify 2 ranges to get the Hyperparameter tuning job to create

ClientError: Given label size mismatched with prediction size. Please ensure the first column is label and the correct metric is applied.

I don't know what else there is that I can modify to get the hyperparameter training to work.

Please see below for images and code:

Hyperparameters for a successful run via a normal training job: https://ibb.co/FBZxZC3
Hyperparameters for a failed run via a Hyperparameter tuning job: https://ibb.co/rkKdFrT
Result: https://ibb.co/wJ1DBVW
Detailed CloudWatch error logs: https://gist.github.com/tullydwyer/c61c70973f603d24828f9204c24b04bb
Code to start a normal training job: https://gist.github.com/tullydwyer/fdf14a124dfdbac30c6ce16f8e39e96a
Code to start a Hyperparameter tuning job: https://gist.github.com/tullydwyer/0f19059bbb4d0097fcd5769fa6c755c9

Thanks for any advice here. I am really stumped and want to use Hyperparameter tuning but I cannot get it to work with the same settings and data that work for a normal training job.

Thanks, Tully

tullydwyer commented 5 years ago

@Schmidtbit I think your "eval_metric": "auc" does not work with "objective": "multi:softmax". I left eval_metric out for normal training jobs because it is optional and defaults based on the objective value; I think it defaults to error for classification. See: https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst

edit: You need merror instead of auc for "multi:softmax"
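Following the comment above, a minimal sketch of the corrected hyperparameters (only the relevant keys shown; the other values from the original post stay unchanged):

```python
hyperparameters = {
    "objective": "multi:softmax",
    "num_class": "8",
    # merror (multiclass error rate) instead of auc, which applies to
    # binary/ranking objectives and triggers the label-size mismatch error
    "eval_metric": "merror",
}
print(hyperparameters["eval_metric"])  # merror
```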

tullydwyer commented 5 years ago

Okay I figured out the issue I was having.

eval_metric is optional and defaults based on the objective value. The default works for normal training jobs, but for some reason it does not work for Hyperparameter tuning jobs. You need to define it explicitly with eval_metric="merror" and the following HyperparameterTuner input:

tuner = HyperparameterTuner(
    # estimator and hyperparameter_ranges are also required (omitted here)
    objective_metric_name='validation:merror'
)
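Put together, a fuller tuner setup might look like the sketch below (not run here; `xgb`, the ranges, the job counts, and the channel inputs are assumptions — only objective_metric_name and objective_type come from this thread):

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    estimator=xgb,                      # an Estimator with eval_metric='merror' set
    objective_metric_name='validation:merror',
    objective_type='Minimize',          # merror should be driven down, not up
    hyperparameter_ranges={             # at least two ranges, as noted above
        'eta': ContinuousParameter(0.001, 0.3),
        'max_depth': IntegerParameter(3, 10),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({'train': s3_train, 'validation': s3_val})
```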
larsritland commented 5 years ago

@tullydwyer Thanks a lot! What value should the 'Type' parameter of the HyperparameterTuner have?

tullydwyer commented 5 years ago

@larsritland I don't see a 'Type' parameter exactly, but there are 'objective_type' and 'early_stopping_type'. 'objective_type' should be 'Minimize' as per https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost-tuning.html. For 'early_stopping_type' I would use 'Auto' (never tested this); Auto should lower the Hyperparameter tuning time. See https://sagemaker.readthedocs.io/en/latest/tuner.html

larsritland commented 5 years ago

@tullydwyer Thanks a lot again. My create_hyper_parameter_tuning_job() call failed, complaining that Type was not set in HyperParameterTuningJobConfig. I set it to Minimize. I may test early_stopping_type as well.

Schmidtbit commented 5 years ago

@tullydwyer that was exactly my issue! I resolved it by changing the eval_metric. Thanks!