aws / sagemaker-experiments

Experiment tracking and metric logging for Amazon SageMaker notebooks and model training.
Apache License 2.0

log_metric() raises warning, doesn't reflect in trial component #75

Closed athewsey closed 4 years ago

athewsey commented 4 years ago

Describe the bug

(Possibly related to #72 but I'm specifically working in SageMaker Studio)

Explicit Tracker.log_metric() calls raise a warning (see details below) and don't seem able to send metric data to SageMaker Experiments.

I've tried various combinations of:

...but haven't found any working combination where I can explicitly call log_metric() and see the stored data through the SMExperiments API/UI.

To Reproduce

From a SageMaker Studio Python 3 (Data Science) kernel notebook:

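import time

from smexperiments.experiment import Experiment
from smexperiments.tracker import Tracker

experiment = Experiment.create(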
    experiment_name=f"metrictestexp-{int(time.time())}",
    description="Trying to get metrics working",
)

tracker_nowith = Tracker.create(display_name="nowith-comp")

tracker_nowith.log_input(
    name="Raw input",
    media_type="text/csv",
    value="https://..."
)
tracker_nowith.log_parameters({
    "categorical_columns": [],
    "normalization_std": {},
    "aparam": 1,
})
tracker_nowith.log_output("train-csv", f"s3://...", "text/csv")

tracker_nowith.log_metric("my-cool-metric-name", 2501.4057)

The last log_metric line seems to raise the following warning (not Exception):

ERROR:root:'NoneType' object has no attribute 'write'

Input and output artifacts and parameters seem to be recorded and visible without any problem, and a local {PID}.json file is generated, but the metrics are not visible through the Experiments APIs/UI.

tracker_nowith.trial_component.metrics returns None

Expected behavior

log_metric() should record the submitted named scalar metric, which should then be visible through TrialComponent.metrics and the SageMaker Studio Experiments UI.

Screenshots

None

Environment:
Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): None
Framework Version: N/A
Python Version: 3.7.6
CPU or GPU: ml.t3.medium
Python SDK Version: smexperiments 0.1.14, sagemaker 1.60.0
Are you using a custom image: No - SMStudio Data Science kernel

Additional context

Use case is to record summary metrics derived from an in-notebook data preprocessing step: E.g. number of features after processing, summary statistics, etc.

danabens commented 4 years ago

> Can't call log_metric() and see the stored data through the SMExperiments API/UI

log_metric only logs to file currently. Eventually metrics will be logged server side and visible in API/UI, but no ETA.

> ERROR:root:'NoneType' object has no attribute 'write'

This error was a side effect of the lazy instantiation of the file writer and was fixed in https://github.com/aws/sagemaker-experiments/pull/76
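
For readers unfamiliar with the pattern, here is a minimal sketch of a lazily instantiated file writer. It is not the library's actual implementation: the class name, record fields, and file name are hypothetical. It only illustrates that the first write against a file handle that is still None raises an AttributeError with the same text as the message quoted above, and that opening the file on demand avoids surfacing it.

```python
import json
import os
import tempfile
import time


class LazyMetricsWriter:
    """Hypothetical writer that opens its file only on the first log_metric call."""

    def __init__(self, path):
        self._path = path
        self._file = None  # nothing is opened until a metric is logged

    def log_metric(self, name, value):
        # Record field names are assumptions made for this sketch.
        record = json.dumps({"MetricName": name, "Value": value, "Timestamp": time.time()})
        try:
            self._file.write(record + "\n")
        except AttributeError:
            # First call: self._file is still None, so it has no .write attribute.
            # Open the file and retry instead of surfacing the error.
            self._file = open(self._path, "a", buffering=1)
            self._file.write(record + "\n")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


path = os.path.join(tempfile.mkdtemp(), "metrics.json")
writer = LazyMetricsWriter(path)
writer.log_metric("my-cool-metric-name", 2501.4057)
writer.close()

# Read the file back: one JSON record per line.
with open(path) as f:
    records = [json.loads(line) for line in f]
```

If the except branch only logged the exception instead of opening the file and retrying, the first metric would be dropped and a message like the one reported would appear in the log.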

shashankprasanna commented 4 years ago

@danabens What file does log_metric save the results to? The documentation only says "Record a scalar metric value for this TrialComponent to file, not SageMaker."

That said, I've seen that log_metric does send the results to SageMaker, and they can be queried with SageMaker Analytics. However, this seems to be inconsistent.

athewsey commented 4 years ago

Agreed - as far as I can tell it only logs to a JSON file created in the same directory? I'm not sure I understand the use case for that functionality..?

+1 for getting SageMaker-side metric logging ASAP: There are custom output metrics (not parameters) of my Trial Components that I'd like to track as part of the SageMaker Experiment... E.g. per-field summary statistics after some pre-processing in a notebook or Processing Job.

danabens commented 4 years ago

The file metrics writer writes to a file named {PID}.json. Optionally, the environment variable SAGEMAKER_METRICS_DIRECTORY determines the directory where the file is written. The code is here.
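
As a rough sketch of how such a file could be located and inspected from a notebook: the helper names below are hypothetical, and the example assumes the file is named after the process ID (as the {PID}.json file in the original report suggests), that the current directory is the fallback, and that the file holds one JSON record per line. The record fields are also assumptions.

```python
import json
import os
import tempfile


def metrics_file_path(default_dir="."):
    """Hypothetical helper: build <metrics dir>/<PID>.json."""
    directory = os.environ.get("SAGEMAKER_METRICS_DIRECTORY", default_dir)
    return os.path.join(directory, f"{os.getpid()}.json")


def read_metrics(path):
    """Read one JSON record per line, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Point the variable at a scratch directory for the demonstration.
scratch_dir = tempfile.mkdtemp()
os.environ["SAGEMAKER_METRICS_DIRECTORY"] = scratch_dir

# Stand in for the writer by appending a single record ourselves.
path = metrics_file_path()
with open(path, "a") as f:
    f.write(json.dumps({"MetricName": "my-cool-metric-name", "Value": 2501.4057}) + "\n")

metrics = read_metrics(path)
```

Setting the variable before the first metric is logged matters in a sketch like this, because the path is resolved when the file is first needed.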

> I've seen that log_metric does send the results to sagemaker

These are likely metrics generated by the job running in SageMaker rather than metrics logged with log_metric from a notebook.

danabens commented 4 years ago

We recently fixed a metrics file buffering issue that may be related: https://github.com/aws/sagemaker-experiments/commit/bd48bdcae167df198fc7dbf66ff2d2a3508f2226
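
The contents of that commit are not reproduced here. As a general illustration of how file buffering can hide freshly written metrics, the sketch below (plain Python, not library code; file names and record fields are made up) compares a default-buffered text file with a line-buffered one by reading each file back through a second handle before the writer is closed.

```python
import os
import tempfile

directory = tempfile.mkdtemp()
buffered_path = os.path.join(directory, "buffered.json")
line_buffered_path = os.path.join(directory, "line_buffered.json")

record = '{"MetricName": "my-cool-metric-name", "Value": 2501.4057}\n'

# Default buffering: a short record stays in memory until flush() or close().
buffered = open(buffered_path, "a")
buffered.write(record)
with open(buffered_path) as reader:
    seen_before_close = reader.read()
buffered.close()

# buffering=1 selects line buffering in text mode: each newline triggers a flush.
line_buffered = open(line_buffered_path, "a", buffering=1)
line_buffered.write(record)
with open(line_buffered_path) as reader:
    seen_line_buffered = reader.read()
line_buffered.close()

# After close(), the default-buffered record is on disk as well.
with open(buffered_path) as reader:
    seen_after_close = reader.read()
```

In a long-lived notebook kernel the writer may never be closed, so anything that reads the file while the kernel is running would only see records that have already been flushed.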

danabens commented 4 years ago

Both issues (warning and not seeing metrics) are fixed and released. Let us know if you encounter any further issues.