Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

Use AutoML in an Azure Machine Learning pipeline in Python #1494

Open alzj96 opened 3 years ago

alzj96 commented 3 years ago

I am trying to replace the dataset in the sample at https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automlstep-in-pipelines, but the run fails with the following error:

All columns were automatically detected to be dropped by AutoML as no useful information could be inferred from the input data. The detected column purposes are the following,
Column Column1 identified as Hashes.
Column Time identified as Ignore.
Column V1 identified as Hashes.
Column V2 identified as Hashes.
Column V3 identified as Hashes.
Column V4 identified as Hashes.
Column V5 identified as Hashes.
Column V6 identified as Hashes.
Column V7 identified as Hashes.
Column V8 identified as Hashes.
Column V9 identified as Hashes.
Column V10 identified as Hashes.
Column V11 identified as Hashes.
Column V12 identified as Hashes.
Column V13 identified as Hashes.
Column V14 identified as Hashes.
Column V15 identified as Hashes.
Column V16 identified as Hashes.
Column V17 identified as Hashes.
Column V18 identified as Hashes.
Column V19 identified as Hashes.
Column V20 identified as Hashes.
Column V21 identified as Hashes.
Column V22 identified as Hashes.
Column V23 identified as Hashes.
Column V24 identified as Hashes.
Column V25 identified as Hashes.
Column V26 identified as Hashes.
Column V27 identified as Hashes.
Column V28 identified as Hashes.
Column Amount identified as Ignore.
Please either inspect your input data or use featurization config to give hints about the desired data transformation.

I tried adding a featurization config to the AutoML settings, but it still does not work. When I look at the 70_driver_log.txt file for the featurization run in the "Outputs + Logs" section for the AutoMLStep node in the UI, I find the following:

2021-05-29 17:32:03.515 - INFO - Start updating column purposes using customized feature type settings.
2021-05-29 17:32:03.516 - WARNING - Could not update column number 2 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.634 - WARNING - Could not update column number 3 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.634 - WARNING - Could not update column number 4 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 5 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 6 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.635 - WARNING - Could not update column number 7 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 8 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 9 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.636 - WARNING - Could not update column number 10 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 11 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 12 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.637 - WARNING - Could not update column number 13 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 14 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 15 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 16 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.638 - WARNING - Could not update column number 17 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 18 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 19 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.639 - WARNING - Could not update column number 20 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 21 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 22 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 23 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.640 - WARNING - Could not update column number 24 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 25 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 26 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.641 - WARNING - Could not update column number 27 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.642 - WARNING - Could not update column number 28 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.642 - WARNING - Could not update column number 29 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Hashes. Please check your column before overriding feature type.
2021-05-29 17:32:03.643 - WARNING - Could not update column number 30 to Numeric since pandas.api.types.infer_dtype returned string. Setting back to Ignore. Please check your column before overriding feature type.
2021-05-29 17:32:03.643 - INFO - End updating column purposes using customized feature type settings.
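
The warnings above say that `pandas.api.types.infer_dtype` returned `string` for these columns, which suggests the values reached AutoML as text rather than numbers. The following is a minimal sketch of how that check behaves and how a conversion changes its result. The two-row DataFrame is a hypothetical stand-in for the real data, and this is not necessarily how the issue was resolved in the thread.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical miniature of the data: numeric values that were loaded as text.
df = pd.DataFrame({"V1": ["-1.36", "1.19"], "Amount": ["149.62", "2.69"]})

# This is the same check named in the warning messages.
before = {col: infer_dtype(df[col]) for col in df.columns}

# Convert every column to a numeric dtype. errors="raise" surfaces
# unparseable cells instead of silently turning them into NaN.
converted = df.apply(pd.to_numeric, errors="raise")

after = {col: infer_dtype(converted[col]) for col in converted.columns}

print(before)
print(after)
```

If the inferred type is still `string` after loading the real dataset, the column purpose override to `Numeric` would be expected to fall back in the way the log shows.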

Here is how I defined my training step.

from azureml.train.automl import AutoMLConfig
from azureml.pipeline.steps import AutoMLStep
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose('V1', 'Numeric')
featurization_config.add_column_purpose('V2', 'Numeric')
featurization_config.add_column_purpose('V3', 'Numeric')
featurization_config.add_column_purpose('V4', 'Numeric')
featurization_config.add_column_purpose('V5', 'Numeric')
featurization_config.add_column_purpose('V6', 'Numeric')
featurization_config.add_column_purpose('V7', 'Numeric')
featurization_config.add_column_purpose('V8', 'Numeric')
featurization_config.add_column_purpose('V9', 'Numeric')
featurization_config.add_column_purpose('V10', 'Numeric')
featurization_config.add_column_purpose('V11', 'Numeric')
featurization_config.add_column_purpose('V12', 'Numeric')
featurization_config.add_column_purpose('V13', 'Numeric')
featurization_config.add_column_purpose('V14', 'Numeric')
featurization_config.add_column_purpose('V15', 'Numeric')
featurization_config.add_column_purpose('V16', 'Numeric')
featurization_config.add_column_purpose('V17', 'Numeric')
featurization_config.add_column_purpose('V18', 'Numeric')
featurization_config.add_column_purpose('V19', 'Numeric')
featurization_config.add_column_purpose('V20', 'Numeric')
featurization_config.add_column_purpose('V21', 'Numeric')
featurization_config.add_column_purpose('V22', 'Numeric')
featurization_config.add_column_purpose('V23', 'Numeric')
featurization_config.add_column_purpose('V24', 'Numeric')
featurization_config.add_column_purpose('V25', 'Numeric')
featurization_config.add_column_purpose('V26', 'Numeric')
featurization_config.add_column_purpose('V27', 'Numeric')
featurization_config.add_column_purpose('V28', 'Numeric')
featurization_config.add_column_purpose('Amount', 'Numeric')
featurization_config.add_column_purpose('Class', 'CategoricalHash')

# Change iterations to a reasonable number (50) to get better accuracy
automl_settings = {
    "iteration_timeout_minutes" : 10,
    "iterations" : 2,
    "experiment_timeout_hours" : 0.25,
#     "featurization": featurization_config,
    "primary_metric" : 'AUC_weighted'
}

automl_config = AutoMLConfig(task = 'classification',
                             debug_log = 'automated_ml_errors.log',
                             compute_target = compute_target,
                             run_configuration = aml_run_config,
                             featurization = featurization_config,
                             training_data = prepped_data,
#                              label_column_name = 'Survived',
                             label_column_name = 'Class',
                             **automl_settings)

train_step = AutoMLStep(name='AutoML_Classification',
    automl_config=automl_config,
    passthru_automl_config=False,
    outputs=[metrics_data,model_data],
    enable_default_model_output=False,
    enable_default_metrics_output=False,
    allow_reuse=True)
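
As a side note, the repeated `add_column_purpose` calls above could be written more compactly with a loop. This is only a sketch: `add_numeric_purposes` and `numeric_columns` are hypothetical names introduced here, and the helper accepts any object that exposes `add_column_purpose(name, purpose)` as used in the original code. It does not change the behavior reported in the log; the override still depends on the underlying column dtype.

```python
def add_numeric_purposes(config, columns):
    """Mark each named column as 'Numeric' on a FeaturizationConfig-like object."""
    for name in columns:
        config.add_column_purpose(name, "Numeric")
    return config


# V1..V28 plus Amount, matching the columns in the step definition above.
numeric_columns = [f"V{i}" for i in range(1, 29)] + ["Amount"]

# Usage with the real class (requires the azureml SDK to be installed):
# from azureml.automl.core.featurization import FeaturizationConfig
# featurization_config = add_numeric_purposes(FeaturizationConfig(), numeric_columns)
```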

The new dataset is https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv

swatig007 commented 3 years ago

Hi @alzj96 - can you please provide the AutoML run ID for this run? Thanks

cartacioS commented 3 years ago

Hi @alzj96 - I just wanted to follow up and see if you're able to provide the AutoML run ID as requested above so that we can look further into this issue for you?

alzj96 commented 3 years ago

Hi Sabina, thanks for your help. I've already solved that problem. But currently I have a problem when I try to deploy the model in Azure DevOps. Here are the logs I have for that problem. Can you help me with that? Thanks, Arron Zeng

rtanase commented 3 years ago

@alzj96, I'm not able to see the logs for the deployment problem. Can you paste them?