gaborcsardi / parsedate

R package to parse dates given in arbitrary formats

Parsing the dreaded yymmdd date #3

Closed eddelbuettel closed 10 years ago

eddelbuettel commented 10 years ago

Supplying one of those dreaded North-American formats:

parse_date("070809")
[1] "2014-09-09 11:55:01 UTC"

returns (silently!!) the current date. Can you think of a way to do better?

Useful package idea. I wanted something like this for a while.

gaborcsardi commented 10 years ago

Hmmm, looks like a bug in the git parser. Will try to fix soon.

gaborcsardi commented 10 years ago

Although this will probably be parsed as mmddyy, to be consistent with how the git parser handles the dashed form:

parse_date("07-08-09")
#> [1] "2009-07-08 UTC"

Do you think yymmdd is used more often than mmddyy?

eddelbuettel commented 10 years ago

I was mostly testing to see ... what you had decided. I think there is no good rule for \d{6} and you just have to pin a decision down. And as much as I hate the format, mmddyy is probably expected. You could add a global option.

gaborcsardi commented 10 years ago

I didn't really make any decisions, apart from choosing the parser that seemed most versatile: the one in git (implemented in C). I am surprised it could not parse a simple 070809...

I would avoid options as much as possible; if you have specific format priorities, you can just use lubridate.

eddelbuettel commented 10 years ago

I have no need for lubridate. All (or maybe almost all?) it does I have long done with base R.

As for the 'how should one...' question, we could look at the Perl/Python/Ruby libraries, or just pin a convention down.

gaborcsardi commented 10 years ago

Thanks, should be OK now. Six digits are treated as xx-xx-xx, and eight digits as xxxx-xx-xx.
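For illustration, the digit-splitting rule can be sketched in base R as a preprocessing step that inserts dashes before the string reaches a dashed-date parser. `dashify_digits()` is a hypothetical helper written for this example, not part of parsedate's API, and the actual fix lives in the C parser.

```r
# Hypothetical helper (not parsedate's API): insert dashes into
# all-digit strings following the rule described above.
dashify_digits <- function(x) {
  # Six digits: xx-xx-xx
  x <- sub("^([0-9]{2})([0-9]{2})([0-9]{2})$", "\\1-\\2-\\3", x)
  # Eight digits: xxxx-xx-xx (a six-digit string, now dashed, no longer matches)
  sub("^([0-9]{4})([0-9]{2})([0-9]{2})$", "\\1-\\2-\\3", x)
}

dashify_digits(c("070809", "20140909", "12345"))
```

Anything that is not exactly six or eight digits passes through unchanged, so the dashed-form logic stays the single place where field order is decided.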

gaborcsardi commented 10 years ago

The eight-digit case could actually be smarter: it could detect cases where xx-xx-xxxx would make more sense. I'll think about this.
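One possible version of that "smarter" rule, sketched in base R under the assumption that year-first stays the default: read eight digits as yyyymmdd, and fall back to mmddyyyy only when the year-first reading is not a valid date. `parse_eight_digits()` is a hypothetical name for illustration, not what parsedate implements.

```r
# Hypothetical sketch (not parsedate's code): yyyymmdd first,
# mmddyyyy only when the year-first reading is invalid.
parse_eight_digits <- function(x) {
  # Only exact eight-digit strings are eligible; everything else becomes NA.
  x[!grepl("^[0-9]{8}$", x)] <- NA_character_
  ymd <- as.Date(x, format = "%Y%m%d")
  mdy <- as.Date(x, format = "%m%d%Y")
  out <- ymd
  use_mdy <- is.na(ymd)
  out[use_mdy] <- mdy[use_mdy]
  out
}

parse_eight_digits(c("20140909", "12252014"))
```

For years 1900 to 2099, the fifth and sixth digits of an mmddyyyy string are 19 or 20, which can never be a month, so the year-first reading always fails and the fallback is unambiguous there. This sketch does not try to tell mm-dd-yyyy from dd-mm-yyyy.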

eddelbuettel commented 10 years ago

The problem there, too, is that it could be new-world mm-dd-yyyy rather than old-world dd-mm-yyyy.

eddelbuettel commented 10 years ago

Hm, the "070809" from above now works (yielding 2009-07-08), but I first tried "000102", which gave me today's date :-/

gaborcsardi commented 10 years ago

Yeah, the git parser does not give errors if the date consists of numbers only, no matter what the numbers are. This is essentially a separate bug in the git parser. I will fix this: #4.
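The behaviour asked for here (an error or NA instead of a silent fallback to the current time) can be sketched in base R with `as.Date()`, which returns `NA` when a field is out of range. `parse_six_digits_strict()` is a hypothetical helper assuming the mmddyy reading, not the eventual fix in the C parser.

```r
# Hypothetical strict parser (not parsedate's code): six digits read as
# mmddyy, returning NA rather than silently substituting the current time.
parse_six_digits_strict <- function(x) {
  # Only exact six-digit strings are eligible; everything else becomes NA.
  x[!grepl("^[0-9]{6}$", x)] <- NA_character_
  # %y maps 00-68 to 20xx and 69-99 to 19xx; month 00 or day 34 yield NA.
  as.Date(x, format = "%m%d%y")
}

parse_six_digits_strict(c("070809", "000102"))
```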

As for mm-dd-yyyy vs dd-mm-yyyy, it works the same way as for xx-xx-xx: it will be mm-dd-yyyy, unless the numbers only make sense for dd-mm-yyyy. This is fine, I think.
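A rough base R sketch of that rule: try the month-first reading, and use day-first only where month-first is impossible (for example a "month" of 13 or more). `parse_xx_xx_yyyy()` is a hypothetical name, and this is an approximation of the described behaviour, not the git parser itself.

```r
# Hypothetical sketch (not parsedate's code): mm-dd-yyyy by default,
# dd-mm-yyyy only when the month-first reading is not a valid date.
parse_xx_xx_yyyy <- function(x) {
  mdy <- as.Date(x, format = "%m-%d-%Y")
  dmy <- as.Date(x, format = "%d-%m-%Y")
  out <- mdy
  use_dmy <- is.na(mdy)
  out[use_dmy] <- dmy[use_dmy]
  out
}

parse_xx_xx_yyyy(c("07-08-2009", "25-12-2014"))
```

Indexed assignment is used instead of `ifelse()` because `ifelse()` drops the `Date` class. Strings that are invalid under both readings, such as "31-31-2014", stay `NA`.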

eddelbuettel commented 10 years ago

Also, parse_date("010203") gives 2003-01-01 23:00:00 UTC. Either Feb 1 or Jan 2, but Jan 1 seems a little off.

gaborcsardi commented 10 years ago

Yeah, it seems that's a bug related to daylight saving time. It is off by one hour. #5.