aws / sagemaker-experiments

Experiment tracking and metric logging for Amazon SageMaker notebooks and model training.
Apache License 2.0

Tracker.load() does not work when using sagemaker pipeline #174

Open heizari opened 1 year ago

heizari commented 1 year ago

Describe the bug

I created an Experiment and a Trial with smexperiments, configured PipelineExperimentConfig, and ran a training job through a SageMaker pipeline. (Without a SageMaker pipeline, this bug does not occur.) In the training script, Tracker.load() raises the following exception:

Traceback (most recent call last):
  File "train.py", line 133, in <module>
    main()
  File "train.py", line 66, in main
    tracker = Tracker.load()
  File "/miniconda3/lib/python3.8/site-packages/smexperiments/tracker.py", line 161, in load
    _ArtifactUploader(tc.trial_component_name, artifact_bucket, artifact_prefix, boto3_session),
AttributeError: 'NoneType' object has no attribute 'trial_component_name'

Maybe this function returns None, even though TrialComponentEnvironment.source_arn is defined. So I suspect this line is wrong, because environment["TRAINING_JOB_ARN"] contains uppercase characters when using a SageMaker pipeline.
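The suspected failure mode can be sketched in isolation. This is a minimal illustration, not the library's actual code: the ARN value, the `COMPONENTS_BY_SOURCE_ARN` dictionary, and the `find_component` helper are all hypothetical stand-ins, and the sketch assumes that the lookup by source ARN is case-sensitive.

```python
from typing import Optional

# Hypothetical ARN containing uppercase characters (assumption for illustration only).
TRAINING_JOB_ARN = (
    "arn:aws:sagemaker:us-east-1:123456789012:training-job/pipelines-AbC123-Train-XyZ"
)

# Hypothetical stand-in for a case-sensitive lookup of trial components by source ARN.
COMPONENTS_BY_SOURCE_ARN = {
    TRAINING_JOB_ARN: "pipelines-AbC123-Train-XyZ-aws-training-job",
}


def find_component(source_arn: str) -> Optional[str]:
    """Return the component name for an exact ARN match, or None if nothing matches."""
    return COMPONENTS_BY_SOURCE_ARN.get(source_arn)


# Looking up the ARN with its original case finds the component.
exact = find_component(TRAINING_JOB_ARN)

# Lowercasing the ARN first no longer matches the stored key, so the result is None.
lowered = find_component(TRAINING_JOB_ARN.lower())

print(exact)
print(lowered)
```

If the lookup result is None and the caller then reads an attribute such as `trial_component_name` from it, the result would be an AttributeError like the one in the traceback.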

Sorry for my poor English.

To Reproduce

  1. configure Experiment, Trial and PipelineExperimentConfig
  2. run training job using pipeline
  3. Tracker.load() in training script

Expected behavior

Tracker.load() loads the trial component when the training job runs inside a pipeline.

Environment:

Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans):
Framework Version:
Python Version: 3.9.11
CPU or GPU: CPU
Python SDK Version:

Are you using a custom image: yes

joshcx commented 1 year ago

I'm facing the same issue with SageMaker Pipelines. When I remove the ".lower" in the code you mentioned here, it works. Is it necessary to convert the source ARN to lowercase?
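As a rough sketch of the trade-off behind that question: if lowercasing is only applied to one side of a comparison, a mixed-case ARN stops matching, whereas leaving the ARN untouched or normalizing both sides keeps the match. The helper names and the ARN below are hypothetical, and whether the service itself compares ARNs case-sensitively is an assumption here, not something this thread confirms.

```python
def matches_exact(stored_arn: str, queried_arn: str) -> bool:
    """Case-sensitive comparison: the query must keep the original case."""
    return stored_arn == queried_arn


def matches_normalized(stored_arn: str, queried_arn: str) -> bool:
    """Case-insensitive comparison: both sides are normalized before comparing."""
    return stored_arn.casefold() == queried_arn.casefold()


# Hypothetical mixed-case ARN (assumption for illustration only).
stored = "arn:aws:sagemaker:us-east-1:123456789012:training-job/pipelines-AbC123-Train-XyZ"

unchanged_query = matches_exact(stored, stored)               # ARN passed through as-is
one_sided_lower = matches_exact(stored, stored.lower())       # only the query is lowercased
both_normalized = matches_normalized(stored, stored.lower())  # both sides normalized

print(unchanged_query, one_sided_lower, both_normalized)
```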