TCN Forecaster: The provided path to the data in the Datastore does not exist.

ErvinPuratos commented 5 months ago

Package Name: azureml-automl, python SDK v1
Package Version: 1.49.0
Operating System: Linux 24.01.30
Python Version: 3.8.10

Describe the bug: Whenever I try to train a DNN model, namely the TCNForecaster, the job fails with the following error message:

The provided path to the data in the Datastore does not exist. Error:
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
    VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(NotFound))
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=xxx
Marking the experiment as failed because initial child jobs have failed due to user error

The dataset is valid, and I can load it into a notebook.

To Reproduce:

1. Create an AutoMLConfig object with these parameters:
   - enable_stack_ensemble=True
   - allowed_models=["TCNForecaster"]
2. Submit the AutoML job.

Expected behavior: The AutoML job should train a valid model.

Additional context: The logs do not seem to provide any additional information. If I remove those two parameters and change nothing else, including the dataset used for training, the AutoML job proceeds as expected.

Please let me know if you need additional information to troubleshoot.

Kind regards,

Ervin

github-actions[bot] commented 5 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @Azure/azure-ml-sdk @azureml-github.

isaudagar commented 5 months ago

Hi @ErvinPuratos, thanks for opening this issue. We checked with the AutoML team: this version is quite old, and they recommend updating to a later version. Could you please keep us posted on whether that works?

ErvinPuratos commented 5 months ago

Hi,

thank you for your answer.

The problem is that if I use any other version of SDK v1, I get this error:

Message: rslex failed
Payload: {"pid": 55039, "rslex_version": "2.22.2", "version": "5.1.6"}
2024-06-06 10:52:37.698241 | ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Failure, Duration=60.79 [ms], Info = {'activity_id': '1ff5decd-e91d-4982-88d5-665065770255', 'activity_name': 'to_pandas_dataframe', 'activity_type': 'PublicApi', 'app_name': 'TabularDataset', 'source': 'azureml.dataset', 'version': '1.56.0', 'dataprepVersion': '5.1.6', 'sparkVersion': '', 'subscription': '', 'run_id': '', 'resource_group': '', 'workspace_name': '', 'experiment_id': '', 'location': '', 'completionStatus': 'Success', 'durationMs': 136.29}, Exception=UserErrorException; UserErrorException:
    Message: Execution failed in operation 'to_pandas_dataframe' for Dataset(id='0c6669da-671b-49be-a6cd-bcb91893ad6d', name='fcst_pred_Hybr_1101_Pat-', version=8, error_code=ScriptExecution.StreamAccess.NotFound,error_message=The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b) ErrorCode: ScriptExecution.StreamAccess.NotFound
    InnerException 
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
    VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(NotFound))
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b
    ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Execution failed in operation 'to_pandas_dataframe' for Dataset(id='xxx', name='yyy', version=8, error_code=ScriptExecution.StreamAccess.NotFound,error_message=The requested stream was not found. Please make sure the request uri is correct.| session_id=l_f4e71928-4d04-4b5f-a990-88217ddaad3b) ErrorCode: ScriptExecution.StreamAccess.NotFound"
    }
}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe'}
{'infer_column_types': 'False', 'activity': 'to_pandas_dataframe', 'activityApp': 'TabularDataset'}

Versions used:

azureml-core 1.56.0
azureml-dataprep 5.1.6

I can load the Dataset with Dataset.get_by_name(ws, name="yyy"), but the error above occurs when I call the to_pandas_dataframe() method on it.
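For anyone trying to narrow this down, here is a minimal sketch of a check that separates "the dataset is registered" from "the underlying data can actually be read". The helper name check_dataset_readable is hypothetical and not part of the SDK. It only assumes the object exposes the to_pandas_dataframe() method mentioned above, and it rests on the assumption that Dataset.get_by_name fetches just the registered definition, so a missing path may only surface once the data is materialized.

```python
def check_dataset_readable(dataset):
    """Try to materialize a TabularDataset-like object.

    Returns (ok, detail). `dataset` is anything with a
    to_pandas_dataframe() method; this helper is a hypothetical
    illustration, not an azureml API.
    """
    try:
        frame = dataset.to_pandas_dataframe()
    except Exception as exc:  # broad on purpose: we only want to report the failure
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"read {len(frame)} rows"


# Possible usage in a notebook (assumes `ws` is an existing Workspace):
#   from azureml.core import Dataset
#   ds = Dataset.get_by_name(ws, name="yyy")
#   print(check_dataset_readable(ds))
```

If this reports a failure outside of AutoML as well, the problem is likely in the dataset's path or datastore rather than in the forecasting configuration.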

skasturi commented 4 months ago

@SamGos93 Could you please take a look?

SamGos93 commented 4 months ago

The AutoMLConfig object used these parameters: enable_stack_ensemble=True and allowed_models=["TCNForecaster"].

However, TCN cannot be included in ensembles, so enable_stack_ensemble should be False. AutoML supports Voting (and Stack) ensembles only for non-DNN models.

We have also added this to our documentation. Please see the "Note" section.
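As a rough illustration of the constraint described above, the sketch below builds the ensemble-related keyword arguments and rejects the combination that triggered this issue. The helper build_automl_settings and the DNN_MODELS set are hypothetical names introduced here, and only "TCNForecaster" is taken from this thread. Turning off enable_voting_ensemble as well is an assumption based on the statement that TCN cannot be included in ensembles.

```python
# Hypothetical set of DNN model names; only "TCNForecaster" comes from this thread.
DNN_MODELS = {"TCNForecaster"}


def build_automl_settings(allowed_models,
                          enable_stack_ensemble=False,
                          enable_voting_ensemble=False):
    """Return ensemble-related kwargs, refusing DNN models combined with ensembles."""
    dnn_requested = sorted(set(allowed_models) & DNN_MODELS)
    if dnn_requested and (enable_stack_ensemble or enable_voting_ensemble):
        raise ValueError(
            f"{dnn_requested} cannot be used with ensembles; "
            "set enable_stack_ensemble and enable_voting_ensemble to False"
        )
    return {
        "allowed_models": list(allowed_models),
        "enable_stack_ensemble": enable_stack_ensemble,
        "enable_voting_ensemble": enable_voting_ensemble,
    }


settings = build_automl_settings(["TCNForecaster"])

# Possible usage (assumes the remaining AutoMLConfig arguments are supplied elsewhere):
#   from azureml.train.automl import AutoMLConfig
#   config = AutoMLConfig(task="forecasting", **other_args, **settings)
```

Failing early on the client side like this surfaces the misconfiguration before the job is submitted, rather than as a data-access error in the child runs.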

github-actions[bot] commented 3 months ago

Hi @ErvinPuratos. Thank you for opening this issue and giving us the opportunity to assist. We believe that this has been addressed. If you feel that further discussion is needed, please add a comment with the text "/unresolve" to remove the "issue-addressed" label and continue the conversation.

github-actions[bot] commented 2 months ago

Hi @ErvinPuratos, since you haven’t asked that we /unresolve the issue, we’ll close this out. If you believe further discussion is needed, please add a comment /unresolve to reopen the issue.

Azure/azure-sdk-for-python, issue #35810