Closed milktrader closed 11 years ago
I think the macro you're using is probably a little broken for sophisticated types:
df = DataFrame(quote
time = now()
end)
I think you might potentially have success if you did:
df = DataFrame(quote
time = DataVec([now()])
end)
But we still need to change DataVec's to support ad hoc types. I'll make an RFC tonight that does away with the current naRule
, replaceVal
architecture. That will make it trivial to create ad hoc DataVec
's.
This should work now. Calendar.jl seems temporarily broken, but the machinery for this is now in place. Please re-open if it fails on your end.
Awesome, thanks!
I'm putting in Array{UTF16String} into my "Date" column and it's working in REPL.
This works:
julia> a = csvread("data/AAPL.csv")
7117x7 Any Array:
"Date" "Open" "High" "Low" "Close" "Volume" "Adj Close"
"2012-11-23" 567.17 572.0 562.6 571.5 9.7438e6 571.5
"2012-11-21" 564.25 567.37 556.6 561.7 1.33215e7 561.7
julia> b = convert(Array{UTF16String}, a[2:end,1])
7116-element UTF16String Array:
"2012-11-23"
"2012-11-21"
"2012-11-20"
julia> c = map(x -> parse("yyyy-MM-dd", x), b)
7116-element CalendarTime Array:
Nov 23, 2012 12:00:00 AM EST
Nov 21, 2012 12:00:00 AM EST
julia> df = DataFrame(quote
Date = $(c)
end);
julia> head(df,2)
2x1 DataFrame:
Date
[1,] Nov 23, 2012 12:00:00 AM EST
[2,] Nov 21, 2012 12:00:00 AM EST
julia> typeof(df[1])
DataVec{CalendarTime}
That is just outstanding.
On Sat, Dec 15, 2012 at 1:29 AM, milktrader notifications@github.comwrote:
This works:
julia> a = csvread("data/AAPL.csv")7117x7 Any Array:"Date" "Open" "High" "Low" "Close" "Volume" "Adj Close" "2012-11-23" 567.17 572.0 562.6 571.5 9.7438e6 571.5 "2012-11-21" 564.25 567.37 556.6 561.7 1.33215e7 561.7 julia> b = convert(Array{UTF16String}, a[2:end,1])7116-element UTF16String Array: "2012-11-23" "2012-11-21" "2012-11-20" julia> c = map(x -> parse("yyyy-MM-dd", x), c)7116-element CalendarTime Array: Nov 23, 2012 12:00:00 AM EST Nov 21, 2012 12:00:00 AM EST julia> df = DataFrame(quote Date = $(c) end); julia> head(df,2)2x1 DataFrame: Date[1,] Nov 23, 2012 12:00:00 AM EST[2,] Nov 21, 2012 12:00:00 AM EST julia> typeof(df[1])DataVec{CalendarTime}
— Reply to this email directly or view it on GitHubhttps://github.com/HarlanH/DataFrames.jl/issues/98#issuecomment-11400592.
Does read_table
not work for your file? I'm always trying to find things that break the reader, so please let me know if it doesn't.
julia> foo = read_table("data/AAPL.csv");
julia> head(foo, 2)
2x7 DataFrame:
Date Open High Low Close Volume Adj Close
[1,] "2012-11-23" 567.17 572.0 562.6 571.5 9743800 571.5
[2,] "2012-11-21" 564.25 567.37 556.6 561.7 13321500 561.7
julia> typeof(foo[:,1])
DataVec{ASCIIString}
julia> typeof(foo[:,2])
DataVec{Float64}
julia> map(x -> parse("yyyy-MM-dd", x), foo[:,1])
no method map(Function,DataVec{ASCIIString})
julia> bar = map(x -> parse("yyyy-MM-dd", x), convert(Array{UTF16String}, foo[:,1]))
7116-element CalendarTime Array:
Nov 23, 2012 12:00:00 AM EST
Nov 21, 2012 12:00:00 AM EST
I should mention that the following requires DataFrames
(duh) and UTF16
.
This is really nice to be able to read it in this way. I now can throw away my functions that were doing this the hard way. I'm not sure how to convert ASCIIString
from the original foo
DataFrame into my CalendarTime
type in-place though.
Ah, we're missing map for DataVec's. I'll add that now. For now you could use:
DataVec(map(x -> parse("yyyy-MM-dd", x), vector(foo[:, 1])))
The use of vector
to convert to Vector{ASCIIString}
works because you (hopefully) don't have missing data. We need to do more on converting types in place within a DataVec. For time types we're going to need the API to settle down a bit before we'll have reliable support. My end goal is to have this just be:
foo[:, 1] = dvdatetime(foo[:, 1], "yyyy-MM-dd")
That's great.
The read_table
is a very elegant replacement for my hard-coded method:
stock = DataFrame(quote
Date = $(cal[:])
Op = $(sym[:,1])
Hi = $(sym[:,2])
Lo = $(sym[:,3])
Cl = $(sym[:,4])
Vol = $(sym[:,5])
Adj = $(sym[:,6])
end);
Which was obviously a hack to see if things work, but now I'm happy to say goodbye. Not all financial data comes in the Yahoo format (shocking I know).
Glad you like it. read_table
is still going to need a few iterations to make it faster and more powerful.
I've just pushed an update to DataFrames that does a basic map
. For your example you'll still need to do something extra, because the map
produces Any
's.
FYI: This update changes read_table
to use UTF8 instead of ASCII by default.
Nice. Yeah I see map
in my future. Calculating returns on a Close column for instance.
The CalendarTime type is defined as:
It's fairly new and hasn't made it to the package distribution system yet, but it's a clean and simple implementation.
The goal is to have a column in a
DataFrame
be a time type butDataFrames
needs anInt64
whereCalendarTime
is aFloat64
.Why does
DataFrames
want to convertCalendarTime
toInt64
?This is further puzzling because
DataFrames
does accept aFloat64
.