GenericMappingTools / gmt

The Generic Mapping Tools
https://www.generic-mapping-tools.org
Other
858 stars 359 forks source link

Better way to handle NaT (not-a-time) #3414

Closed seisman closed 4 years ago

seisman commented 4 years ago

Description of the problem

Both Python numpy and Matlab can have a datetime called NaT, i.e., not-a-time. It's similar to NaN but for time. Not sure if Julia also has something similar.

GMT can't handle NaT. It's OK for command line, but not for external interfaces like PyGMT.

Currently, when a NaT string is passed to GMT, it reports an error and keeps running.

Unable to parse 1 datetime strings (ISO datetime format required)

We should at least change the error to warning.

What's worse is: https://github.com/GenericMappingTools/gmt/blob/3003c5020e6615d38ceaac8bf4a8bdfa44cdfcaf/src/gmt_api.c#L12769-L12771 GMT converts the datetime strings as double values relative to the epoch time internally. In the codes above, if dt[row]="NaT", the function gmt_scanf() returns GMT_IS_NAN and the variable t_vector[row] is untouched. The value of t_vector[row] is 0, which means 1970-01-01T12:00:00. The plot is incorrect if we're plotting a calendar plot around 1970-01-01.

Ping @PaulWessel, @joa-quim and @weiji14.

PaulWessel commented 4 years ago

So we need to change it to

 if (gmt_scanf (API->GMT, dt[row], GMT_IS_ABSTIME, &(t_vector[row])) == GMT_IS_NAN) {
     n_bad++; 
    t_vector[row] = API->GMT->session.d_NaN;
}

Pls try that. Alternatively, if you wish "NaT" string to be a valid entry that yields NaN, then a separate branch is needed:

if (strcmp (dt[row], "NaT") == 0)
       t_vector[row] = API->GMT->session.d_NaN;
else if (gmt_scanf ....
}
joa-quim commented 4 years ago

First option seem better. It will be visited only in case of failure while second introduces another if branch.

seisman commented 4 years ago

In the first option, NaT is converted to NaN, but we still see the warnings (n_bad++). I personally prefer the second option. Waiting for @weiji14's comments on this.

weiji14 commented 4 years ago

Struggling with the C code here, no idea what the fuss of another if branch is about :laughing:. I say go with the first option? It's okay to have warnings printed out (as long as they're at the info/warning level, not error). I highly doubt that anyone actually stores NaT values literally as NaT in a text file, so we shouldn't have to check for them too literally.

seisman commented 4 years ago

OK. Go with the first option in PR #3415.

joa-quim commented 4 years ago

no idea what the fuss of another if branch is about

It's about branch prediction. For example loops with if branches cannot be vectorized. See first answer of https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array

weiji14 commented 4 years ago

Thanks for pointing me to that link (never saw such a highly upvoted stackoverflow question in my life!), was very educational. Definitely should go for the vectorizable option here, it sounds a bit like the EAFP style in Python, or Easier to Ask for Forgiveness than Permission, in contrast with LBYL or Look Before You Leap.