Closed skalish closed 4 years ago
Regarding the suggestion to allow flexibility in passing other json_args (https://github.com/Datatamer/tamr-client/pull/209#discussion_r307306802): this is a good point, but I am unaware of any cases where this functionality is being taken advantage of, whereas I have seen at least two cases where getting tamr-unify-client installed on a customer machine without access to the internet was quite painful.
If the main issue is that including NaN will silently fail or fail downstream, we could use the allow_nan parameter of the built-in json.dumps function (setting allow_nan=False) to fail fast. Then users can fix their records before passing them to TC functions with allow_nan=False.
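
As a minimal sketch of the fail-fast behavior described above, using only the standard library: json.dumps emits a bare NaN token by default and raises ValueError when allow_nan=False. The record and its field names are made up for illustration.

```python
import json

# Hypothetical record; the field names are illustrative only.
record = {"id": 1, "score": float("nan")}

# Default behavior: a bare NaN token is emitted, which is not valid JSON
# and is the kind of payload a server may reject later.
lenient = json.dumps(record)

# With allow_nan=False, serialization fails immediately on the client side.
try:
    json.dumps(record, allow_nan=False)
    failed_fast = False
except ValueError:
    failed_fast = True
```

Here lenient is the string '{"id": 1, "score": NaN}' and failed_fast ends up True, so the error surfaces before any request is sent.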
The original issue #194 was about including the option to ignore NaNs, that is, having tamr_client take care of the conversion from NaN to null if the user chooses. The allow_nan=False solution does avoid the original problematic behavior of downstream 400 errors, but this should be recognized as an intentional decision to no longer provide NaN->null functionality (which is relied upon by some field code, though I am unsure if any of it is still in use).
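
If the NaN->null conversion were to be kept without simplejson, one possible stdlib-only sketch is below. The helper names nan_to_none and dumps_record are hypothetical and are not part of tamr_client or tamr_unify_client; the ignore_nan parameter only mirrors the name of the existing toggle.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None in dicts, lists and tuples."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(v) for v in value]
    return value


def dumps_record(record, ignore_nan=False):
    """Serialize a record; coerce NaN to null if asked, otherwise fail fast."""
    if ignore_nan:
        record = nan_to_none(record)
    # allow_nan=False also rejects Infinity and -Infinity, which this
    # sketch does not convert.
    return json.dumps(record, allow_nan=False)
```

With this sketch, dumps_record({"score": float("nan")}, ignore_nan=True) returns '{"score": null}', while the same call with ignore_nan=False raises ValueError instead of sending a payload the server would reject.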
💬 RFC
simplejson is now the only dependency in this project whose build is machine-dependent (i.e. if you pip download on macOS, the result cannot be used on Ubuntu). Additionally, to my knowledge it is only used to a very limited extent: to provide the ignore_nan toggle in the record update operation of tamr_unify_client. This means a record (as dictionary) containing at least one NaN that is upserted to Tamr with the client will have this value coerced to null if ignore_nan == True, or will cause an HTTPError if ignore_nan == False.
This dependency is not present in tamr_client; however, there are no plans to remove tamr_unify_client even after the official promotion of the BETA to the primary implementation. For this reason, I think it is important that simplejson be removed to ease the installation process.
If it is decided that we remove this, the ignore_nan functionality needs to either be added in its place or be dropped as an unneeded feature. In the former case, it might also be prudent to add similar functionality to tamr_client.
Related to discussion and decisions in #194 and #209.