Closed jmahlik closed 1 year ago
Any thoughts on the best approach here? I was going to start on a PR soon for `log_parameter` and `log_metric`. IMO the best approach might be to not log such values to the tracker and have a standard library logger emit a warning like `f"{metric_name} was {metric_value}, metric not logged to experiment"`. Then, when looking at experiment analytics, the metric would just show as NA for the trial component. There might be other resolutions that I'm not aware of though.
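A minimal sketch of that proposal, as a hypothetical wrapper outside the SDK (`safe_log_metric` is not part of `smexperiments`; it assumes a tracker object with the usual `log_metric(name, value)` method): skip non-finite or non-numeric values and emit a stdlib-logger warning instead.

```python
import logging
import math

logger = logging.getLogger(__name__)


def safe_log_metric(tracker, metric_name, metric_value):
    """Log a metric only if it is a plain finite number; warn and skip otherwise.

    Hypothetical helper illustrating the warn-and-skip idea; bool is excluded
    explicitly because it is a subclass of int in Python.
    """
    is_finite_number = (
        isinstance(metric_value, (int, float))
        and not isinstance(metric_value, bool)
        and math.isfinite(metric_value)
    )
    if not is_finite_number:
        logger.warning(
            "%s was %s, metric not logged to experiment", metric_name, metric_value
        )
        return
    tracker.log_metric(metric_name, metric_value)
```

With this, `np.nan`, `None`, and `±inf` all fall through to the warning branch, and the trial component simply shows no value for that metric.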
I am unable to `log_parameters` in `Tracker` with boolean types; is this a known issue?
sagemaker version: 0.1.30

```python
with Tracker.create(display_name="example", sagemaker_boto_client=sm_boto3) as tracker:
    tracker.log_parameters(
        {"example_boolean_param": True}
    )
```
Error: ClientError: An error occurred (400) when calling the UpdateTrialComponent operation:
https://github.com/aws/sagemaker-experiments/blob/main/src/smexperiments/tracker.py#L229
The parameter values that `log_parameters` accepts are strings and numbers, so it should work if you pass in `"true"`.
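For example, the bool can be converted to a lowercase string before logging (a workaround sketch, assuming the string form `"true"` is acceptable downstream as the reply suggests):

```python
value = True

# Convert the Python bool to the lowercase string form suggested above.
# str(True) is "True", so lower() gives "true".
tracker_value = str(value).lower()

# Then log the string instead of the raw bool, e.g.:
# tracker.log_parameters({"example_boolean_param": tracker_value})
```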
Hi jmahlik, thanks for the suggestion; your idea sounds reasonable. I wonder if you have submitted the PR yet?
Okay, I will try that, but ideally a Python bool should also work, since it is a valid Number type; however, it does not.
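The point above can be verified directly: Python's `bool` is a subclass of `int`, so a bool passes a `numbers.Number` check even though the service rejects it.

```python
import numbers

# bool is a subclass of int, so it registers as a Number.
print(isinstance(True, numbers.Number))  # True
print(issubclass(bool, int))             # True
```

This is why a type check based on `numbers.Number` alone cannot distinguish booleans from the numeric values the service accepts.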
Kind of hit a wall because it is hard to test which values are valid and which will fail without waiting for the request to time out. I haven't had time to work on it further.
**Describe the bug**
When the tracker logs something that is not JSON compliant, it causes training jobs to hang and eventually die. This is likely due to a boto request that cannot be sent. The biggest offenders are values like `np.nan`, `None`, `np.inf`, and `-np.inf`. They can be quite common in model building if a hyperparameter causes an odd metric value. I think this might become especially problematic using `log_table`.

The current workaround is to write validation code around the metric logging. It has become cumbersome to copy it everywhere, and jobs fail when someone forgets to copy it.

I propose adding some additional parameter validation to the tracker itself. Happy to submit a PR for this, but wanted to get any thoughts before undertaking it.
Side note to anyone who might read this: `isinstance(np.nan, numbers.Number)` is `True`, so it will likely have to be some JSON-compatibility check rather than an `isinstance` check.

**To Reproduce**
Steps to reproduce the behavior: log
`np.nan` or `None` as the value in `log_metric` or `log_parameter`.

**Expected behavior**
Either raise an error or throw out the metric. Throwing away the metric with a warning, or logging some kind of default or blank value, would be ideal.
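The JSON-compatibility check suggested in the side note above could be sketched like this (hypothetical helper, not part of the tracker): `json.dumps` with `allow_nan=False` rejects `nan` and `±inf`, which covers most of the offenders listed. `None` is excluded explicitly, since it serializes to valid JSON `null` but is still reported as problematic above.

```python
import json


def is_json_safe(value):
    """Return True if value serializes to strict JSON (no NaN/Infinity, no None).

    Hypothetical validation helper: allow_nan=False makes json.dumps raise
    ValueError on nan/inf; unserializable objects raise TypeError.
    """
    if value is None:
        # null is valid JSON, but None is one of the reported offenders,
        # so reject it explicitly.
        return False
    try:
        json.dumps(value, allow_nan=False)
        return True
    except (TypeError, ValueError):
        return False
```

A check like this could gate `log_metric`/`log_parameter` and either raise or warn-and-skip, per the expected behavior above.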
**Environment:**
- Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): BYOC
- Framework Version: N/A
- Python Version: Python 3.8/3.9
- CPU or GPU: CPU
- Python SDK Version:
- Are you using a custom image: Yes