Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0
11 stars 25 forks source link

Remove dependency on simplejson and use a custom encoder when needed #468

Closed skalish closed 4 years ago

skalish commented 4 years ago

↪️ Pull Request

This PR breaks the dependency on simplejson in two ways:

tamr_client

The onus is placed on the user to ensure that their record updates do not contain NaN values, and that these are instead cast to the Python None. NaN values in a pandas DataFrame will not cause issues in the dataframe.upsert function since the generated dictionaries are sanitized by the Series method to_json.

tamr_unify_client

The usage of simplejson in the method Dataset._update_records has been replaced by using a custom JSON encoder when passing ignore_nan=True. This is now the only json_arg that is supported.

The custom encoder, defined in _custom_encoder.py is minimally changed from the standard library encoder to cast NaN (and Infinity/-Infinity) to 'null' in the style of simplejson. Compare to json and simplejson.

The method Dataset.upsert_from_dataframe has had the argument ignore_nan removed, since this is the only reasonable behavior. The implementation there has been changed to mirror the solution in tamr-client. (Maybe the parameter should be left to avoid a breaking change and be raised as a deprecation warning).

Closes #455

✔️ PR Todo

skalish commented 4 years ago

In terms of changelog, I think there is a breaking change in that the only json-encoding parameter that is accepted now is ignore_nan. Besides that, what is the best way to communicate these changes that are only user-facing in that simplejson is no longer a dependency, and deprecation warning are now added?

skalish commented 4 years ago

Gave it a shot, but the changelog could use a quick once-over to be sure the messaging is right and complete.