awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0

Data type profiler with Scientific notation #302

Open calep opened 3 years ago

calep commented 3 years ago

Hi

At the moment it appears that if a number is stored in scientific notation, it's interpreted as a string.

e.g.
4.8054186314136e-02 comes through as string
0.048054186314136 comes through as fractional

Is it possible to update the library so it classifies scientific notation as fractional? Spark seems to deal with it fine when passing that figure into a dataframe.
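To make the request concrete, here is a minimal sketch of the kind of classification I have in mind. It is written in plain Python rather than against Deequ itself, and the pattern names and the `classify` function are hypothetical illustrations, not Deequ's internal regexes or API. It only assumes that type inference is done by matching string values against patterns, and shows how allowing an optional exponent lets scientific notation land in the fractional bucket.

```python
import re

# Hypothetical patterns for illustration only; NOT Deequ's internal regexes.
INTEGRAL = re.compile(r"[+-]?\d+")
# Plain decimals with an optional exponent, or an integer mantissa
# followed by a mandatory exponent such as e-02 or E+5.
FRACTIONAL = re.compile(
    r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?|[+-]?\d+[eE][+-]?\d+"
)
BOOLEAN = re.compile(r"true|false", re.IGNORECASE)


def classify(value: str) -> str:
    """Return a coarse type label for a raw string value."""
    text = value.strip()
    if INTEGRAL.fullmatch(text):
        return "Integral"
    if FRACTIONAL.fullmatch(text):
        return "Fractional"
    if BOOLEAN.fullmatch(text):
        return "Boolean"
    return "String"


if __name__ == "__main__":
    for sample in ["4.8054186314136e-02", "0.048054186314136", "42", "abc"]:
        print(sample, "->", classify(sample))
```

With patterns like these, both 4.8054186314136e-02 and 0.048054186314136 are labelled fractional, while a bare exponent such as e-02 still falls through to string.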

Cheers

sscdotopen commented 3 years ago

Thanks for bringing this up, this is a known bug. Would you like to submit a PR for that?