OHDSI / WhiteRabbit

WhiteRabbit is a small application that can be used to analyse the structure and contents of a database as preparation for designing an ETL. It comes with RabbitInAHat, an application for interactive design of an ETL to the OMOP Common Data Model with the help of the scan report generated by WhiteRabbit.
http://ohdsi.github.io/WhiteRabbit
Apache License 2.0

Improve date type detection in csv files #313

Open MaximMoinat opened 2 years ago

MaximMoinat commented 2 years ago

Date type detection is limited when scanning CSV files. Only two formats are detected: yyyy.mm.dd and mm.dd.yy (the separator can be any character, and the two formats can be used interchangeably within the same document). Other orderings of year, month, and day are not detected as dates.

As an alternative, we could test each value against a list of candidate patterns, for example DateTimeFormatter patterns. A rough sketch using SimpleDateFormat:

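    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;

    // Candidate patterns are illustrative only; the real list is still to be decided.
    private static final String[] DATE_FORMATS = {"yyyy-MM-dd", "dd-MM-yyyy", "MM/dd/yyyy"};

    public boolean isValid(String dateStr) {
        for (String dateFormat : DATE_FORMATS) {
            DateFormat sdf = new SimpleDateFormat(dateFormat);
            sdf.setLenient(false); // reject impossible dates such as month 13
            try {
                sdf.parse(dateStr);
                return true; // the first matching pattern is enough
            } catch (ParseException e) {
                // No match for this pattern; try the next one.
            }
        }
        return false; // no pattern matched
    }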
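
For comparison with the SimpleDateFormat snippet above, here is a minimal sketch of the same multi-pattern check using java.time.DateTimeFormatter. The class name DateDetector, the method name isDate and the three candidate patterns are hypothetical and not part of WhiteRabbit; the actual pattern list would need to be agreed on.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

final class DateDetector {
    // Hypothetical candidate patterns. 'uuuu' is used for the year because
    // STRICT resolution with 'yyyy' (year-of-era) would also require an era.
    private static final DateTimeFormatter[] FORMATTERS = {
        strict("uuuu-MM-dd"),
        strict("dd-MM-uuuu"),
        strict("MM/dd/uuuu")
    };

    private static DateTimeFormatter strict(String pattern) {
        // STRICT rejects impossible dates such as 30 February.
        return DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
    }

    /** Returns true if the value matches at least one candidate pattern. */
    static boolean isDate(String value) {
        for (DateTimeFormatter formatter : FORMATTERS) {
            try {
                // LocalDate.parse requires the whole string to match the pattern.
                LocalDate.parse(value, formatter);
                return true;
            } catch (DateTimeParseException e) {
                // No match for this pattern; try the next one.
            }
        }
        return false;
    }
}
```

One possible reason to prefer this variant: DateTimeFormatter instances are immutable and thread-safe, so they can be built once and reused, and LocalDate.parse fails on trailing characters, whereas SimpleDateFormat.parse(String) may accept a string after reading only its leading part.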

Link to current StringUtilities.isDate method for reference: https://github.com/OHDSI/WhiteRabbit/blob/0247306c4ed836c71d71fb1615ca6ddf90ae200d/rabbit-core/src/main/java/org/ohdsi/utilities/StringUtilities.java#L839-L873