HertieDataScience / SyllabusAndLectures

Hertie School of Governance Introduction to Collaborative Social Science Data Analysis
MIT License
37 stars 60 forks source link

Issue with converting date formats for entire column #73

Closed laurencehendry closed 8 years ago

laurencehendry commented 8 years ago

Hi all,

Firstly, thanks to @christophergandrud @ChristopherCosler for advice on merging the datasets (it worked really well, and the variables have now also been renamed). Unfortunately we are now having no luck reformatting the dates column for our observations (the dates were entered with the 'days since 1900 (or 1904)' microsoft excel convention). The offending code occurs at lines 79 and 80: https://github.com/laurencehendry/SRISK_Thesis/blob/master/Gather

We can successfully convert one value but haven't succeeded in applying this to the whole column.

Grateful for consideration, Laurence

LarsMehwald commented 8 years ago

Dear Laurence, Did you try out the command lapply? I have had a similar question and it worked for me. Bests, Lars

laurencehendry commented 8 years ago

Hi Lars,

http://datapub.cdlib.org/2014/04/10/abandon-all-hope-ye-who-enter-dates-in-excel/ site explains my general exasperation with it, that has led me on to a few excel-R integration packages but to no avail...

https://cran.r-project.org/web/packages/timeDate/timeDate.pdf timeDate() package looks powerful and comprehensive, but doesn't seem to have functionality for what I need...

Checked out your R scripts, found the lapply(Packages, require, character.only = TRUE) example but not sure how the syntax works. Hopefully can discuss sometime! :-)

Best, L

christophergandrud commented 8 years ago

Do you have the original Excel file? If so you might be able to recategorised the dates to, strings for example, before exporting to CSV, thereby side stepping the whole problem.

laurencehendry commented 8 years ago

Hi Christopher,

We thought of this, but the original excel files are according to firm (so 414 files). I have been looking into macros. The merged .csv is also 1.1m+ rows so we can't do it in one... On 17/11/2015 7:10 pm, "Christopher Gandrud" notifications@github.com wrote:

Do you have the original Excel file? If so you might be able to recategorised the dates to, strings for example, before exporting to CSV, thereby side stepping the whole problem.

— Reply to this email directly or view it on GitHub https://github.com/HertieDataScience/SyllabusAndLectures/issues/73#issuecomment-157456467 .

christophergandrud commented 8 years ago

Check out the as.Date function in base R. It has an origin argument.

The help file actually has a discussion of the Excel issue: http://www.inside-r.org/r-doc/base/as.Date

laurencehendry commented 8 years ago

Hi Chris,

That was actually one of the first commands I tried, but I only got this far:

as.Date(36690, origin = "1900-01-01") [1] "2000-06-15"

The output is correct, but I then had trouble creating an object, or better yet to format the existing corresponding column, for all the existing day values.

christophergandrud commented 8 years ago

Imagine that you you have a data frame called DATA and your date variable in it is called date, then run:

DATA$new_date <- as.Date(DATA$date, origin = '1900-01-01')

The new variable new_date should contain the corrected dates.

laurencehendry commented 8 years ago

thank you, this worked

christophergandrud commented 8 years ago

Great. Very useful question btw.