Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

TCN Forecaster: The provided path to the data in the Datastore does not exist. #35810

Open ErvinPuratos opened 1 month ago

ErvinPuratos commented 1 month ago

Describe the bug Whenever I try to train a DNN model, namely the TCNForecaster, the job fails with the following error message:

The provided path to the data in the Datastore does not exist. Error:
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
    VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(NotFound))
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=xxx
Marking the experiment as failed because initial child jobs have failed due to user error

The dataset is indeed valid and I can load it into a notebook.

To Reproduce Steps to reproduce the behavior:

  1. Create an AutoMLConfig object with these parameters:
    • enable_stack_ensemble=True
    • allowed_models=["TCNForecaster"]
  2. Submit the AutoML job.

Expected behavior The AutoML job should train a valid model.


Additional context The logs do not seem to provide any additional information. If I disable those 2 parameters and do not change anything else, including the dataset used for training, then the automl job will proceed as expected.

Please let me know if you would need additional info to troubleshoot.

Kind regards,

Ervin

github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

isaudagar commented 3 weeks ago

Hi @ErvinPuratos, thanks for opening this issue. We checked with the AutoML team: the SDK version you are using is quite old, and they recommend updating to a later version. Could you please let us know whether that works?

ErvinPuratos commented 3 weeks ago

Hi,

thank you for your answer.

The problem is that if I use any other version of SDK v1, I get this error:

Message: rslex failed
Payload: {"pid": 55039, "rslex_version": "2.22.2", "version": "5.1.6"}
2024-06-06 10:52:37.698241 | ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Failure, Duration=60.79 [ms], Info = {'activity_id': '1ff5decd-e91d-4982-88d5-665065770255', 'activity_name': 'to_pandas_dataframe', 'activity_type': 'PublicApi', 'app_name': 'TabularDataset', 'source': 'azureml.dataset', 'version': '1.56.0', 'dataprepVersion': '5.1.6', 'sparkVersion': '', 'subscription': '', 'run_id': '', 'resource_group': '', 'workspace_name': '', 'experiment_id': '', 'location': '', 'completionStatus': 'Success', 'durationMs': 136.29}, Exception=UserErrorException; UserErrorException:
    Message: Execution failed in operation 'to_pandas_dataframe' for Dataset(id='0c6669da-671b-49be-a6cd-bcb91893ad6d', name='fcst_pred_Hybr_1101_Pat-', version=8, error_code=ScriptExecution.StreamAccess.NotFound,error_message=The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b) ErrorCode: ScriptExecution.StreamAccess.NotFound
    InnerException 
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
    VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(NotFound))
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Execution failed in operation 'to_pandas_dataframe' for Dataset(id='xxx', name='yyy', version=8, error_code=ScriptExecution.StreamAccess.NotFound,error_message=The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b) ErrorCode: ScriptExecution.StreamAccess.NotFound"
    }
}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}

version used:

azureml-core 1.56.0 azureml-dataprep 5.1.6

I can load the Dataset: Dataset.get_by_name(ws, name="yyy")

but the error above happens when I try to use the to_pandas_dataframe() method on the Dataset
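
The two steps described above (the lookup succeeds, the read fails) can be separated with a small check. This is only a sketch: `check_dataset_readable` is a hypothetical helper, not part of the SDK, and it assumes nothing about the dataset beyond a `to_pandas_dataframe()` method.

```python
def check_dataset_readable(dataset):
    """Try to materialize a dataset and report whether the read succeeded.

    `dataset` is any object exposing to_pandas_dataframe(), for example the
    object returned by Dataset.get_by_name(ws, name="yyy").
    Returns a (succeeded, detail) tuple instead of raising.
    """
    try:
        frame = dataset.to_pandas_dataframe()
    except Exception as exc:  # broad on purpose: the error is reported, not hidden
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"loaded {len(frame)} rows"


# Usage against a real workspace (not executed here; requires azureml-core
# and a workspace config file):
#   from azureml.core import Dataset, Workspace
#   ws = Workspace.from_config()
#   ok, detail = check_dataset_readable(Dataset.get_by_name(ws, name="yyy"))
#   print(ok, detail)
```

If the lookup works but this check returns a failure containing ScriptExecution.StreamAccess.NotFound, the dataset definition resolves while the underlying stream cannot be read, which matches the behavior reported here.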

skasturi commented 2 weeks ago

@SamGos93 Could you please take a look?

SamGos93 commented 6 days ago

The AutoMLConfig object used these parameters: enable_stack_ensemble=True allowed_models=["TCNForecaster"]

However, TCN cannot be included in ensembles, so enable_stack_ensemble should be False. AutoML supports Voting (and Stack) ensembles only for non-DNN models.

We have added this to our documentation as well. Please look at the "Note" section.
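
As a rough illustration of the advice above, the settings can be checked before they are passed to AutoMLConfig. This is a sketch under assumptions: `find_ensemble_conflicts` and `DNN_MODELS` are hypothetical names, the model list contains only the name mentioned in this thread, and treating the voting ensemble flag the same way as the stack ensemble flag is one reading of the comment above, not something the SDK is confirmed to enforce.

```python
# Assumption: only the DNN model named in this thread is listed here.
DNN_MODELS = {"TCNForecaster"}


def find_ensemble_conflicts(settings):
    """Return the ensemble flags explicitly enabled alongside a DNN model.

    Only flags present in `settings` are checked; SDK defaults are not
    consulted, so leaving a flag out will not be reported.
    """
    allowed = set(settings.get("allowed_models") or [])
    if not (allowed & DNN_MODELS):
        return []
    flags = ("enable_stack_ensemble", "enable_voting_ensemble")
    return [flag for flag in flags if settings.get(flag)]


# Corrected settings: TCNForecaster with both ensembles turned off.
automl_settings = {
    "task": "forecasting",
    "allowed_models": ["TCNForecaster"],
    "enable_stack_ensemble": False,
    "enable_voting_ensemble": False,
}

conflicts = find_ensemble_conflicts(automl_settings)
print(conflicts)

# With azureml-train-automl installed, these would be passed on together with
# the training data and forecasting arguments, which are omitted here:
#   from azureml.train.automl import AutoMLConfig
#   config = AutoMLConfig(**automl_settings)
```

Running the same check on the configuration from the original report (enable_stack_ensemble=True with allowed_models=["TCNForecaster"]) returns the stack ensemble flag as a conflict.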
