deeplearning4j / deeplearning4j-examples

Deeplearning4j Examples (DL4J, DL4J Spark, DataVec)
http://deeplearning4j.konduit.ai

running org.datavec.transform.logdata.LogDataExample : Invalid format: "13/Jul/1995:08:51:11 -0400" is malformed #539

Open tztsweet opened 7 years ago

tztsweet commented 7 years ago

Issue Description

I am running the demo in org.datavec.transform.logdata and get the following error:

java.lang.IllegalArgumentException: Invalid format: "13/Jul/1995:08:51:11 -0400" is malformed at "Jul/1995:08:51:11 -0400"
    at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187)

Version Information

0.9.1

AlexDBlack commented 7 years ago

I just ran the example using latest dl4j-examples master (including a fresh data download). It ran without issue.

Are you sure this hasn't been modified in any way (either the example or the dependencies)? It seems unlikely that something as simple as date parsing would fail on one system but succeed on another...

I would also recommend deleting the downloaded data (/datavec_log_example/ in your temp directory) in case there was a data issue - and running the example again.

ZengII commented 7 years ago

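I get the same exception and tried to track it down. It happens because the example never gives joda-time an explicit "Locale": the formatter falls back to the JVM default locale, and on a non-English default the month abbreviation "Jul" (the MMM part of the pattern) cannot be parsed.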

        //=========================================
        //          Step 4: Perform Cleaning, Parsing and Aggregation
        //=========================================
        //Let's specify the transforms we want to do
        TransformProcess tp = new TransformProcess.Builder(schema)
            //First: clean up the "replyBytes" column by replacing any non-integer entries with the value 0
            .conditionalReplaceValueTransform("replyBytes",new IntWritable(0), new StringRegexColumnCondition("replyBytes","\\D+"))
            //Second: let's parse the date/time string:
            .stringToTimeTransform("timestamp","dd/MMM/YYYY:HH:mm:ss Z", DateTimeZone.forOffsetHours(-4))
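
To check whether the locale really is the cause, the parse can be reproduced outside the example. The sketch below is an illustration only: it uses the JDK's java.time instead of joda-time so that it runs without extra dependencies, which means the pattern uses lowercase "yyyy" rather than the "YYYY" of the joda-time pattern above. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class LocaleParseSketch {
    // Same layout as the log timestamps; java.time uses "yyyy" for the calendar year.
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";
    static final String SAMPLE = "13/Jul/1995:08:51:11 -0400";

    // Returns true if SAMPLE can be parsed when month names are taken from the given locale.
    static boolean parses(Locale locale) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(PATTERN, locale);
        try {
            OffsetDateTime.parse(SAMPLE, fmt);
            return true;
        } catch (DateTimeParseException e) {
            // The month text "Jul" is not a valid short month name in this locale.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Locale.US:     " + parses(Locale.US));
        System.out.println("Locale.FRANCE: " + parses(Locale.FRANCE));
        System.out.println("JVM default (" + Locale.getDefault() + "): " + parses(Locale.getDefault()));
    }
}
```

If the last line prints false on your machine, the default locale is the likely reason the example fails there while it succeeds on an English-locale system.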

One workaround, though a poor one because it affects the whole JVM, is to set the default locale:

        Locale.setDefault(Locale.US);
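
The reason I call this a bad solution is that the default locale is shared by every locale-sensitive call in the process, not just the date parser. A minimal sketch of that side effect, using only the JDK (the class and method names are hypothetical):

```java
import java.util.Locale;

class DefaultLocaleSideEffectSketch {
    // Formats a number with String.format while the JVM default locale is temporarily changed.
    static String formatWithDefault(Locale newDefault) {
        Locale saved = Locale.getDefault();
        Locale savedFormat = Locale.getDefault(Locale.Category.FORMAT);
        Locale savedDisplay = Locale.getDefault(Locale.Category.DISPLAY);
        try {
            // Sets the default for all categories, exactly like the workaround above.
            Locale.setDefault(newDefault);
            // No locale argument: String.format picks up the current default.
            return String.format("%.2f", 1234.5);
        } finally {
            // Restore the previous defaults so the sketch does not leak state.
            Locale.setDefault(saved);
            Locale.setDefault(Locale.Category.FORMAT, savedFormat);
            Locale.setDefault(Locale.Category.DISPLAY, savedDisplay);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatWithDefault(Locale.US));      // decimal point
        System.out.println(formatWithDefault(Locale.GERMANY)); // decimal comma
    }
}
```

Unrelated code that formats or parses numbers and dates would change behaviour in the same way, which is why setting the locale on the formatter itself is preferable.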

Another solution is to customize TransformProcess.java and StringToTimeTransform.java so that they accept a "Locale" and pass it to the formatter:

        TransformProcess tp = new TransformProcess.Builder(schema)
            // Second: let's parse the date/time string:
            .stringToTimeTransform("timestamp", "dd/MMM/YYYY:HH:mm:ss Z", Locale.US,
                DateTimeZone.forOffsetHours(-4))

The constructor in MyStringToTimeTransform.java:

    public MyStringToTimeTransform(String columnName, String timeFormat, Locale locale, DateTimeZone timeZone,
            Long minValidTime, Long maxValidTime) {
        super(columnName);
        this.timeFormat = timeFormat;
        this.timeZone = timeZone;
        this.minValidTime = minValidTime;
        this.maxValidTime = maxValidTime;
        this.locale = locale;
        // Bind the locale to the formatter so parsing no longer depends on the JVM default
        this.formatter = DateTimeFormat.forPattern(timeFormat).withLocale(locale).withZone(timeZone);
    }