Datatamer / tamr-client

Programmatically interact with Tamr
https://tamr-client.readthedocs.io
Apache License 2.0
11 stars 25 forks source link

No simplejson #457

Closed skalish closed 4 years ago

skalish commented 4 years ago

↪️ Pull Request

Relates to #455

This PR explores two ways that simplejson could be removed from the dependency list, as described in #455. The two commits are not really consecutive, but are instead alternatives.

  1. The first commit removes simplejson via a breaking change, where the _update_records/upsert_records methods no longer have the ignore_nan parameter. This means when using these functions, it is on the user to make sure NaNs are not present in the records being uploaded, though the function does use json's allow_nan=False option to raise an error before making the destined-to-fail request to the server if NaN's are present. Handling of NaNs is present in the upsert_from_dataframe function, since this is the primary case where such behavior is desired. This is done in the same was as in tamr_client and both does not depend on simplejson and is robust.

  2. The second commit also removes simplejson, but leaves the ignore_nan parameter in _update_records/upsert_records. This parameter is removed from the upsert_from_dataframe and create_from_dataframe functions, since these are changed to use the tamr_client DataFrame solution. The custom implementation of ignore_nan for _update_records/upsert_records is simple, but limited. While serializing JSON of each update, allow_nan=False is used and raised errors are handled by looking for Python NaN values at the top-level within a record. This means that the breaking change is limited to the last of the following example updates:

    {
    "action": "CREATE", 
    "recordId": 1,
    "record": {"pk": 1, "attr": 42}  # Nothing to catch
    }
    {
    "action": "CREATE",
    "recordId": 2,
    "record": {"pk": 2, "attr": NaN}  # Caught and converted
    }
    {
    "action": "CREATE",
    "recordId": 2,
    "record": {"pk": 2, "attr": {"a": NaN, "b": 42}}  # Not converted, will raise ValueError
    }

✔️ PR Todo

skalish commented 4 years ago

This PR is superseded by #468