DataBiosphere / newt-transformer

Amphibious new data transformer to prepare various sources for CGP DSS Data Loader
Apache License 2.0

Sanitize metadata using values of the same type #32

Closed mikebaumann closed 5 years ago

mikebaumann commented 5 years ago

When data is sanitized with data_bleach.py, each sanitized value now has the same type as the original value.

Retaining the types of sanitized values now will make the later transition (of indexing, etc.) to the full, unsanitized data smoother.

The current change has been manually tested with large NIH data sets. @jessebrennan is developing a unit test for data_bleach.py that will verify that value types are preserved.

Resolves #31

mikebaumann commented 5 years ago

Good question, @jessebrennan. The current change covers all the defined JSON value types: https://tools.ietf.org/html/rfc8259#page-6
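
For readers following along, here is a minimal sketch of what type-preserving sanitization over every JSON value type could look like. It is not the code in data_bleach.py: the helper name `sanitize_value`, the placeholder values, and the choice to leave object keys untouched are all assumptions made for illustration.

```python
import json


def sanitize_value(value):
    """Hypothetical helper: return a placeholder with the same JSON type as `value`."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return 0
    if isinstance(value, float):
        return 0.0
    if isinstance(value, str):
        return "SANITIZED"  # assumed placeholder string
    if value is None:
        return None
    if isinstance(value, list):
        # Arrays stay arrays; each element is sanitized recursively.
        return [sanitize_value(item) for item in value]
    if isinstance(value, dict):
        # Objects stay objects; keys are kept as-is (an assumption), values are sanitized.
        return {key: sanitize_value(item) for key, item in value.items()}
    raise TypeError(f"Not a JSON value type: {type(value).__name__}")


# Sample record invented for this example; it exercises every JSON value type.
record = json.loads(
    '{"name": "Jane", "age": 42, "score": 3.5, '
    '"consented": true, "notes": null, "tags": ["a", "b"]}'
)
clean = sanitize_value(record)
```

Because each branch returns a value of the same Python type that `json.loads` produced, a downstream index built on the sanitized output should see the same field types it would see with the unsanitized data.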