Closed Lewington-pitsos closed 2 years ago
I am experiencing exactly the same issue with `Tracker.log_metrics` from inside a training job.
For reference, I got this to work by setting `enable_sagemaker_metrics=True` inside the `Estimator` init. The documentation around this is really quite poor; it would be helpful for users to be able to work this out without reading the source code and/or guessing.
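For later readers, here is a minimal sketch of where that flag goes. The entry point, role, and instance type below are placeholders I've made up, not values from this thread; only the `enable_sagemaker_metrics` line is the point:

```python
from sagemaker.sklearn.estimator import SKLearn

# Sketch only; everything except enable_sagemaker_metrics is a placeholder.
estimator = SKLearn(
    entry_point="train.py",                              # placeholder
    role="arn:aws:iam::123456789012:role/ExampleRole",   # placeholder
    framework_version="0.23-1",
    instance_count=1,
    instance_type="ml.m5.large",                         # placeholder
    enable_sagemaker_metrics=True,  # without this, log_metric data is dropped
)
```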
> unable to retrieve these metrics later
Looks like swattstgt identified the root cause: `enable_sagemaker_metrics` was not set on the `Estimator`.
> Presently I am unable to find any similar code outlining the intended `log_metrics` workflow in either this repo or in amazon-sagemaker-examples.
Ya, will add an example notebook.
> The documentation around this is really quite poor
Ya, the behavior of `enable_sagemaker_metrics` is complex, and there is no reference to the relationship between this parameter and `log_metric` in the `Tracker`. Will update docs.
For reference: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html#sagemaker-Type-AlgorithmSpecification-EnableSageMakerMetricsTimeSeries
In addition to training jobs, it would be very useful if metrics could also be logged from SageMaker processing jobs. I note that the SageMaker API has been opened up to log from anywhere except `log_metrics()`, according to https://github.com/aws/sagemaker-experiments/issues/142.
I too have this same issue. I have set `enable_sagemaker_metrics=True` but still no luck. My "pipeline" script has the following. I'm currently running this in local mode (i.e., `instance_type='local'`), which I worry is triggering this warning: `WARNING:root:Cannot write metrics in this environment.` That doesn't really make sense, though, since it's running in SageMaker's SKLearn container:
```python
sk_model = SKLearn(
    source_dir="src/",
    entry_point="training/model.py",
    role=sagemaker.get_execution_role(),
    framework_version="0.23-1",
    instance_count=1,
    instance_type=instance_type,
    output_path=model_s3_uri,
    code_location=code_s3_uri,
    base_job_name=model_id,
    enable_sagemaker_metrics=True,
    environment={"MODEL_ID": model_id},
    tags=tags,
)
```
and my `entry_point` code looks like the following:
```python
with Tracker.create(display_name="evaluation", sagemaker_boto_client=sm) as tracker:
    tracker.log_metric(metric_name="best_cv_score", value=cv_best_score, timestamp=t)
    tracker.log_metric(metric_name="score", value=scor, timestamp=t)
    tracker.log_confusion_matrix(y_test, predictions, title="conf-mtrx")
    tracker.log_metric(metric_name="roc", value=roc, timestamp=t)
    tracker.log_roc_curve(y_test, predictions, title="roc-curve")

Trial.load(trial_name=model_id).add_trial_component(tracker.trial_component)
```
I can see the `evaluation` trial component in the SageMaker UI, but there is nothing logged inside of it. Any form of guidance would be useful.
> In addition to training jobs, it would be very useful if metrics could also be logged from sagemaker processing jobs. I note that the Sagemaker API has been opened to log from anywhere except `log_metrics()` according to https://github.com/aws/sagemaker-experiments/issues/142.
@lorenzwalthert - Can you provide some additional detail on your use case for metrics in processing jobs? Please create a new issue in this repo. Thanks.
> but doesn't really make sense since it's running in Sagemaker's SKLearn container:
@jlloyd-widen `Tracker.log_metric` requires an agent running on the training host, which ingests metrics into SageMaker from the file that `log_metric` writes to. `log_metric` doesn't work in local mode because the metric agent isn't present in the local container. The inability to log metrics to SageMaker from local/non-SageMaker environments is a known limitation we are investigating.
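That mechanism can be pictured with a small, self-contained sketch. This is not the library's actual code; the directory path, file name, and record keys here are all assumptions for illustration. The point is only that the writer silently depends on a directory the metrics agent watches, which exists on a real training host but not in a local container:

```python
import json
import os
import tempfile
import time

# Stand-in for the agent-watched directory on a training host (hypothetical path).
METRICS_DIR = os.path.join(tempfile.mkdtemp(), "sagemaker-metrics")

def log_metric(metric_name, value, timestamp=None):
    """Append one metric record as a JSON line, or warn if no agent directory exists."""
    if not os.path.isdir(METRICS_DIR):
        print("WARNING:root:Cannot write metrics in this environment.")
        return False
    record = {
        "MetricName": metric_name,
        "Value": value,
        "Timestamp": timestamp or time.time(),
    }
    with open(os.path.join(METRICS_DIR, "metrics.jsonl"), "a") as f:
        f.write(json.dumps(record) + "\n")
    return True

# Without the directory (as in local mode), nothing is written:
assert log_metric("score", 0.9) is False

# Once the directory exists (as on a real training host), records are appended:
os.makedirs(METRICS_DIR, exist_ok=True)
assert log_metric("score", 0.9) is True
```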
I have nearly exactly the same issue as @athewsey had originally in issue #73. I have been trying for several hours to save experiments, trials, and trial components in various orders such that `log_metrics` actually logs any metrics. I am calling `log_metrics` from a tracker created using `load` rather than `create`, inside a training job, and no warnings are printed. But no matter what I do, SageMaker Studio and the SageMaker Experiments API seem unable to retrieve these metrics later (though parameters and artifacts are certainly logged).

@danabens can you provide a code snippet or the full code you ran before August 8, 2020 that indicated to you that metrics are working as intended? This would possibly allow me to determine the source of my issue.

Presently I am unable to find any similar code outlining the intended `log_metrics` workflow in either this repo or in amazon-sagemaker-examples.
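In case it helps anyone else searching, here is the rough shape of the workflow I believe is intended inside a training script, pieced together from this thread. This is a hedged sketch, not a confirmed working snippet; the metric name and value are placeholders:

```python
from smexperiments.tracker import Tracker

# Sketch: inside a real (non-local) training job with
# enable_sagemaker_metrics=True on the Estimator, load the tracker for the
# job's own trial component instead of creating a fresh one, so the metrics
# agent on the host can ingest what log_metric writes.
with Tracker.load() as tracker:
    tracker.log_metric(metric_name="score", value=0.93)  # placeholder values
```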