Azure / azureml-examples

Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
https://docs.microsoft.com/azure/machine-learning
MIT License

AzureMLException: Message: Failed to flush task queue within 300.0 seconds | dreambooth finetune text to image #3313

Open jyravi opened 3 months ago

jyravi commented 3 months ago

Operating System

Linux

Version Information

Running the sample on a Compute Instance within the Azure ML workspace. The sample is the azureml-examples pipeline component diffusers_text_to_image_dreambooth_pipeline.

Steps to reproduce

  1. Ran the notebook as-is from azureml-examples: https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/text-to-image/diffusers-dreambooth-dog-text-to-image.ipynb

Expected behavior

Finetune the model successfully.

Actual behavior

Encountered an error in the child job "text_to_image_dreambooth_finetune".

Encountered an internal ACFT error. Error Message/Code: AzureMLException:
    Message: Failed to flush task queue within 300.0 seconds. Please set AZUREML_ARTIFACTS_DEFAULT_TIMEOUT environment variable to increase the timeout(in seconds)
    InnerException None
    ErrorResponse
    {
        "error": {
            "code": "UserError",
            "message": "Failed to flush task queue within 300.0 seconds. Please set AZUREML_ARTIFACTS_DEFAULT_TIMEOUT environment variable to increase the timeout(in seconds) ",
            "inner_error": {
                "code": "ResourceExhausted",
                "inner_error": {
                    "code": "Timeout"
                }
            }
        }
    }.

Traceback:
    File "swallow_all_exceptions_decorator.py", line 68, in wrapper
        return func(*args, **kwargs)
    File "[Non-AutoML file]", line 1097, in [Non-AutoML function]
    File "finetune_runner.py", line 252, in finetune_runner
        mlflow.log_artifacts(component_args.output_dir, SettingParameters.DEFAULT_OUTPUT_DIR)
    File "[Non-AutoML file]", line 1096, in [Non-AutoML function]
    File "[Non-AutoML file]", line 1242, in [Non-AutoML function]
    File "[Non-AutoML file]", line 570, in [Non-AutoML function]
    File "[Non-AutoML file]", line 88, in [Non-AutoML function]
    File "[Non-AutoML file]", line 97, in [Non-AutoML function]
    File "[Non-AutoML file]", line 29, in [Non-AutoML function]
    File "[Non-AutoML file]", line 55, in [Non-AutoML function]
    File "[Non-AutoML file]", line 135, in [Non-AutoML function]

Additional information: [Hidden as it may contain PII].

Additional information

The error message suggests increasing the timeout. I'm not sure where to find the relevant file or setting on a compute instance to change the default timeout.
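
For reference, the error message names AZUREML_ARTIFACTS_DEFAULT_TIMEOUT as an environment variable rather than a file, so a minimal sketch of setting it from Python might look like the following. The helper name `set_artifact_timeout` and the 1800-second value are hypothetical, chosen only for illustration. The variable presumably has to be set in the process that performs the artifact upload (here, the finetune child job) before `mlflow.log_artifacts` runs, so setting it in the notebook kernel on the compute instance may not reach the job. How the pipeline component accepts environment variables is not confirmed in this issue.

```python
import os

# Variable name taken verbatim from the error message; the value is in seconds.
TIMEOUT_ENV_VAR = "AZUREML_ARTIFACTS_DEFAULT_TIMEOUT"


def set_artifact_timeout(seconds: int) -> str:
    """Set the artifact timeout for the current process and return the stored value.

    Hypothetical helper for illustration; it only affects the process it runs in
    and any child processes started afterwards.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    os.environ[TIMEOUT_ENV_VAR] = str(seconds)
    return os.environ[TIMEOUT_ENV_VAR]


# Assumed value: 1800 s (30 minutes) instead of the 300 s reported in the error.
set_artifact_timeout(1800)
```

Environment variables are read as strings, which is why the helper stores `str(seconds)`. A larger timeout only helps if the upload is slow rather than stuck, so the right value depends on the size of the output directory being logged.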