Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0

Better handling for HTTP 204 on already up-to-date operations #293

Closed nbateshaus closed 5 years ago

nbateshaus commented 5 years ago

🙋 feature request

When doing something that starts an operation (e.g. a dataset .refresh()), if everything that would be produced by that operation is already up-to-date, the server will return an HTTP 204 with no content. The various bits of client code that do these things try to read the empty content as JSON to create an Operation, but raise a json.JSONDecodeError instead (because an empty document is not valid for the JSON data model).
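The failure mode described above can be reproduced without a server. This is a minimal sketch using only the standard library; `parse_operation_body` and `describe` are hypothetical stand-ins for the client's parsing step, not actual tamr-client code.

```python
import json


def parse_operation_body(body: str) -> dict:
    """Hypothetical stand-in for the client step that reads a response body as JSON."""
    return json.loads(body)


def describe(body: str) -> str:
    """Report what happens when the given response body is parsed as JSON."""
    try:
        parse_operation_body(body)
    except json.JSONDecodeError:
        # An empty string is not a valid JSON document, so parsing fails here.
        return "JSONDecodeError"
    return "parsed"
```

Calling `describe("")` takes the `except` branch, which mirrors what happens when the body of an HTTP 204 response is read as JSON.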

🤔 Expected Behavior

published_clusters = project.published_clusters()
op = published_clusters.refresh()
assert op.successful

😯 Current Behavior

published_clusters = project.published_clusters()
try:
    published_clusters.refresh()
except json.JSONDecodeError:
    # In the event that the dataset we want to refresh is already up-to-date,
    # the API will return an HTTP 204 with an empty body, causing a JSONDecodeError.
    # We can safely ignore this error.
    pass

💁 Possible Solution

Whenever the server gives back an HTTP 204, create a dummy or No-Op operation to represent that.
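One way this could look, as a rough sketch rather than the actual tamr-client implementation: the `Operation` dataclass, the `operation_from_response` helper, the `"SUCCEEDED"` state name, and the `id`/`state` JSON keys are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operation:
    """Hypothetical, simplified operation record (not the tamr-client class)."""

    id: Optional[str]
    state: str

    @property
    def successful(self) -> bool:
        # Assumed state name; the real server vocabulary may differ.
        return self.state == "SUCCEEDED"


def operation_from_response(status_code: int, body: str) -> Operation:
    """Build an Operation from a response, treating HTTP 204 as a no-op success."""
    if status_code == 204:
        # Nothing needed to be done on the server, so there is no body to parse.
        return Operation(id=None, state="SUCCEEDED")
    data = json.loads(body)
    return Operation(id=data["id"], state=data["state"])
```

With something along these lines, a caller could write `op = operation_from_response(204, "")` followed by `assert op.successful`, without wrapping the call in a try/except.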

🔦 Context

Putting the exception handling in is counter-intuitive and makes scripts verbose and hard to read.

It's hard to know exactly which things might return HTTP 204, so often the first I learn that I missed a spot is when I'm trying to run something and it fails unexpectedly.

Only after running into this a few times does one learn that JSONDecodeError means the operation was successful and nothing needed to be done on the server.

💻 Examples

See above.

keziah-tamr commented 4 years ago

I am still seeing this issue when refresh() is run directly from a dataset. This issue does appear fixed when refresh is run through a project like: output_mastering_project.unified_dataset().refresh()

Example code:

output_golden_record_project_dataset = unify_client.datasets.by_external_id("my_dataset_name")
output_golden_record_project_dataset.refresh()

I've also noticed that when refresh() is run through a project, the job ALWAYS runs, even when the dataset is already up to date.

@nbateshaus

nbateshaus commented 4 years ago

@keziah-tamr this was fixed in 0.10.0; I believe you were using 0.9.0.