Closed billcgold closed 5 years ago
Because the Date column is overwritten so many times, maybe you got lost in the transformations? Retracing your steps in the first code chunk (just for the first row), I see:
library(data.table)
# create dummy dt
nasdaq.data <- data.table (Date = seq(as.Date("1/1/1995",format='%m/%d/%Y'), as.Date("12/31/2005",format='%m/%d/%Y'), "days")
, open = runif(n=4018,min=1000,max=2000)
, close = runif(n=4018,min=1000,max=2000))
# simplify
(DT <- head(nasdaq.data, 1))
# Date open close
# 1: 1995-01-01 1928.344 1934.387
# do nothing, copying from OP's first chunk
DT[ , Date2 := as.Date ( as.character (Date), format='%Y-%m-%d' ) ][]
# Date open close Date2
# 1: 1995-01-01 1928.344 1934.387 1995-01-01
# accidentally mal-format, copying from OP's first code chunk
DT[, Date3 := as.Date ( as.character (Date2), format='%m/%d/%y' )][]
# Date open close Date2 Date3
# 1: 1995-01-01 1928.344 1934.387 1995-01-01 <NA>
If there's something else going on, and it's data.table-specific, it might help to provide a clearer example.
Frank, thanks for the response. I can simplify the code. The issue occurs when multiple columns (Date, Year, Month, Quarter, n) are all updated in one DT command.
DT [ , `:=` ( Date = as.Date ( as.character (Date), format='%m/%d/%y' )
, Year = year(Date)
, Month = month(Date)
, Quarter = quarter(Date)
, n = 1 ) ]
As suggested by @franknarf1, here is a clearer and simpler reproducible code sample:
library(data.table)
DT <- data.table (date = seq(as.Date("1/1/1995",format='%m/%d/%Y'), as.Date("12/31/2005",format='%m/%d/%Y'), "days"))
# simulate factor as read in from CSV file
DT [ , date := as.Date ( as.character (date), format='%Y-%m-%d' ) ]
# convert factor to date + create other new variables
DT [ , `:=` ( date = as.Date ( as.character (date), format='%m/%d/%y' )
, year = year(date)
) ]
# Date contains <NA>
> head(DT)
date year
1: <NA> 1995
2: <NA> 1995
3: <NA> 1995
4: <NA> 1995
5: <NA> 1995
6: <NA> 1995
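One possible reading of the output above, offered as a sketch rather than a diagnosis: inside a single `:=` call, every right-hand side is evaluated against the columns as they were before the call. That would explain why `year` is filled in while `date` is `<NA>`. The two-row table below is made up for illustration; only `data.table`, `year()` and `as.Date()` come from the thread.

```r
library(data.table)

# Hypothetical two-row table standing in for the OP's data
DT <- data.table(date = as.Date(c("1995-01-01", "1995-01-02")))

# Both right-hand sides are computed from the original `date` column,
# and only then are the results assigned.
DT[, `:=`(
  date = as.Date(as.character(date), format = "%m/%d/%y"),  # format does not match "1995-01-01" -> NA
  year = year(date)                                          # still sees the original, valid dates
)]

print(DT)
```

So the `NA` values come from the format mismatch in the first expression, not from updating several columns at once.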
Reopening; this was closed unintentionally.
I can reproduce the same behaviour in base R:
date = seq(as.Date("1/1/1995",format='%m/%d/%Y'), as.Date("12/31/2005",format='%m/%d/%Y'), "days")
date = as.Date ( as.character (date), format='%Y-%m-%d' )
as.Date ( as.character (date), format='%m/%d/%y' )
If the NAs are the problem for you, then I think it is because of incorrect use of the format argument.
Also note there are no factors there, only characters.
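To make the format point concrete, here is a minimal base R sketch. The variable names are made up for illustration. It assumes the goal is either to re-parse the date or to display it as `mm/dd/yy`; the thread does not say which the OP wanted.

```r
# A Date converted with as.character() is always ISO formatted
d <- as.Date("1/1/1995", format = "%m/%d/%Y")
s <- as.character(d)   # "1995-01-01"

# `format` describes the INPUT string; a mismatch yields NA
bad <- as.Date(s, format = "%m/%d/%y")

# The matching input format round-trips cleanly
good <- as.Date(s, format = "%Y-%m-%d")

# To change how a Date is displayed, use format(), which returns a character
shown <- format(d, "%m/%d/%y")

print(list(string = s, bad = bad, good = good, shown = shown))
```

In short, `as.Date(x, format = ...)` tells R how to read `x`, while `format(d, ...)` controls how a Date is printed.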
I agree with Jan. Please post on StackOverflow if there's anything still unclear; I don't think this is a data.table issue.
Long-time data.table user. It is an awesome package, thank you. First-time issue post.
The first column (Date) contains NA values.
An update with the same data that only converts Date (factor -> Date) works as expected.
sessionInfo follows
3/21 minor word edits for clarity