FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0
222 stars, 69 forks

isFormattedAsDate is too lenient #136

Closed oribaldi closed 7 years ago

oribaldi commented 7 years ago

isFormattedAsDate returns true even when the given string does not match the date format.

This can be reproduced with a simple example, using the SimpleDateFormat class:

new SimpleDateFormat("dd-MM-yyy").parse("2002-06-06")

Instead of failing, this returns a wrong Date: Mon Nov 23 00:00:00 CET 11.

The problem is with SimpleDateFormat.parse(), because the "parsing does not necessarily use all characters up to the end of the string". One solution is to call dateFormat.setLenient(false). However, when I run your tests with this change, one of them fails: the one that says it should "fail if at least one element cannot be converted to Date".
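
For illustration, here is a minimal Java sketch of the difference between the default lenient mode and setLenient(false), using the pattern and input from the example above. The class name LenientDateDemo and the helper parses are hypothetical names made up for this sketch; they are not part of drunken-data-quality and are not meant to mirror how isFormattedAsDate is implemented.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

class LenientDateDemo {

    // Hypothetical helper (not the library's API): reports whether `value`
    // can be parsed with `pattern` under the given leniency setting.
    static boolean parses(String pattern, String value, boolean lenient) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        // SimpleDateFormat is lenient by default; out-of-range field values
        // (such as day 2002) are then rolled over instead of being rejected.
        format.setLenient(lenient);
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pattern = "dd-MM-yyy";
        String value = "2002-06-06";
        // Lenient: "2002" is read as the day of month and rolled over, so parsing succeeds.
        System.out.println("lenient:     " + parses(pattern, value, true));
        // Non-lenient: day 2002 is out of range, so parsing fails.
        System.out.println("non-lenient: " + parses(pattern, value, false));
    }
}
```

Note that setLenient(false) only rejects out-of-range field values. Trailing characters after a valid date can still be ignored by parse(String), so a stricter check would probably also need to compare the ParsePosition index against the input length.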

FRosner commented 7 years ago

We should set it to non-lenient then. Thanks for figuring this out, @oribaldi