aws / amazon-sagemaker-feedback

Amazon SageMaker Public Feedback Dashboard
Creative Commons Attribution Share Alike 4.0 International

Long wait on "INFO: Invoking remote function inside conda environment: sagemaker-runtime-env" when executing training job #83

Open swietjak opened 1 month ago

Product Version

Issue Description

The problem occurs when I create a training job with RemoteExecutor from the SageMaker Python SDK:

from sagemaker.remote_function import RemoteExecutor

with RemoteExecutor(instance_type="ml.g4dn.2xlarge", dependencies="./timeseries_env.yml", max_parallel_jobs=1, keep_alive_period_in_seconds=30) as executor:
    future = executor.submit(training_job, arg1, arg2)

After the dependencies are installed from the YAML file (contents below), the job hangs for more than an hour on the following log line: INFO: Invoking remote function inside conda environment: sagemaker-runtime-env.

name: my_env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pandas
  - scikit-learn
  - matplotlib
  - pip:
      - sagemaker
      - s3fs
      - fsspec
      - neuralforecast
      - hyperopt
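
To put a number on the stall described above, here is a minimal sketch of timing how long a submitted job takes to return, with an upper bound on the wait. It is not part of the original report. The helper name `timed_result` is hypothetical, and the standard-library `ThreadPoolExecutor` is used only as a local stand-in so the snippet runs without AWS access. It assumes the future object exposes `result(timeout=...)` in the style of `concurrent.futures`; a remote future may signal a timeout with a different exception type.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def timed_result(future, timeout_s):
    """Wait up to timeout_s seconds for a future (hypothetical helper).

    Returns (value, elapsed_seconds, timed_out). On timeout, value is None.
    """
    start = time.monotonic()
    try:
        value = future.result(timeout=timeout_s)
        timed_out = False
    except FutureTimeout:
        value = None
        timed_out = True
    return value, time.monotonic() - start, timed_out


# Local stand-in for the remote executor: one worker, one submitted job.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(sum, range(10))
    value, elapsed, timed_out = timed_result(future, timeout_s=5)
    print(f"value={value} elapsed={elapsed:.3f}s timed_out={timed_out}")
```

Recording the elapsed time this way around the `executor.submit(...)` call would make it easier to report exactly how long the job sits on the log line, separately from dependency installation.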

Expected Behavior

The job should not hang for so long on that log line.

Observed Behavior

No response

Product Category

Jobs

Feedback Category

Customer Support, Reliability and Stability, Startup Time and Latency

Other Details

No response