Open travis-leith opened 4 years ago
Post my SO answer here. Does this solve your problem?
To use zip
, both frames will match column by column and add two series together.
In your case, divFrame
has fewer observations than priceFrame
. When two series don't have the same amount of observations and add up together, unmatched result will be missing.
This is my solution by creating a dummy frame so that divFrame
are aligned with priceFrame
first.
let divFrame2 =
let dummy =
priceFrame.RowKeys
|> Seq.collect(fun row -> divFrame.ColumnKeys |> Seq.map(fun col -> row, col, 0) )
|> Frame.ofValues
(dummy + divFrame).FillMissing(0.)
priceFrame + divFrame2
@zyzhu this does solve my immediate problem, thank you. But with a view to the longer term, should Deedle provide a more convenient "native" solution to cases like this?
Seems like zip would be an appropriate place for something like that to happen. Perhaps something like my attempt 3 or 4 should result in the desired outcome. What do you think?
That's fair. A different solution. addFunc
is to deal with missing values in s2
let addFunc (s1:Series<DateTime,float>) s2 =
s1.Zip(s2, JoinKind.Left, Lookup.Exact) |> Series.mapValues(fun (x, y) -> x.ValueOrDefault + y.ValueOrDefault)
Frame.zipAlign JoinKind.Left JoinKind.Left Lookup.Exact addFunc priceFrame divFrame
@zyzhu I am testing the proposed solutions. Your second solution does not work, it simply returns the price frame unaltered. This is a shame because it is the simplest. Incidentally, it looks similar to my attempt 3 and returns the same (incorrect) result. Are you sure this isn't a bug?
I have done some performance testing.
Required module extension
module Frame =
let mapReplaceCol col f frame =
frame
|> Frame.replaceCol col (Frame.mapRowValues f frame)
let frocha1() =
let priceFrame' = priceFrame |> Frame.mapColKeys string
let dividends' =
divFrame
|> Frame.mapColKeys (string >> (+) "D")
let joinedFrame =
priceFrame'
|> Frame.join JoinKind.Right dividends'
|> Frame.fillMissingWith 0.
(joinedFrame,priceFrame'.ColumnKeys |> List.ofSeq)
||> List.fold (fun acc elem ->
acc|> Frame.mapReplaceCol elem (fun row ->
row.GetAs<float>("D" + elem) + row.GetAs<float>(elem))
|> Frame.dropCol ("D" + elem))
let frocha2() =
let dividends2 =
(priceFrame,priceFrame.ColumnKeys |> List.ofSeq)
||> List.fold (fun acc elem -> acc|> Frame.dropCol elem) //empty frame
|> Frame.join JoinKind.Outer divFrame
|> Frame.fillMissingWith 0.
(priceFrame,dividends2) ||> Frame.zip (fun (p : float) (d : float) -> p + d)
let zyzhu1() =
let divFrame2 =
let dummy =
priceFrame.RowKeys
|> Seq.collect(fun row -> divFrame.ColumnKeys |> Seq.map(fun col -> row, col, 0) )
|> Frame.ofValues
(dummy + divFrame).FillMissing(0.)
priceFrame + divFrame2
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
[for i in 1..1000 do
do frocha1() |> ignore
] |> ignore
stopWatch.Stop()
printfn "frocha1: %f" stopWatch.Elapsed.TotalMilliseconds
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
[for i in 1..1000 do
do frocha2() |> ignore
] |> ignore
stopWatch.Stop()
printfn "frocha2: %f" stopWatch.Elapsed.TotalMilliseconds
let stopWatch = System.Diagnostics.Stopwatch.StartNew()
[for i in 1..1000 do
do zyzhu1() |> ignore
] |> ignore
stopWatch.Stop()
printfn "zyzhu1: %f" stopWatch.Elapsed.TotalMilliseconds
Outputs
frocha1: 541.288300
frocha2: 521.625100
zyzhu1: 976.112500
frocha2
is consistently the fastest. However, I am hoping we can get zyzhu2
(or some variation of it) to work since it is the simplest and seems like a natural solution.
I agree, the zip function does not work intuitively well. And my simple example doesn't work either. A simple basic operation has to be done in several iterations. I solved this by creating additional frames and using fillMissing values. But this is all very difficult, especially for beginners and long evaluated.
It seems that the logic of the implementation of zip function needs to be adjusted.
Here is my basic example code (written in C #):
var f1 = Frame.FromValues(new[] {
Tuple.Create(1, "Name", "One"),
Tuple.Create(2, "Name", "Two") });
var f2 = Frame.FromValues(new[] {
Tuple.Create(2, "Name", "Two"),
Tuple.Create(3, "Name", "Three") });
f1.Zip<string, string, string>(f2, (x1, x2) => (x1 == null) ? x2 : x1).Print()
And result is:
f1 = Name
1 -> One
2 -> Two
f2 = Name
2 -> Two_broken
3 -> Three
And zip result is
Name
1 -> <missing>
2 -> Two
3 -> <missing>
But desired result is
Name
1 -> One
2 -> Two
3 -> Three
Sorry for proposing the second solution that did not solve the problem. I agree that zip
is confusing to use. Let me take some time to look into the design and see what I can do about it.
Ping @tpetricek to see whether he has any ideas.
Thanks a lot.
I think the simplest solution here is as follows. Make one more override of zip function. And extend the lambda-function to take OptionalValue arguments. And user in that lambda-function will decide on his oun how to handle missed values (missed values take place where indices do not overlap). And zip function will return Frame with OptionalValue or Object - depends on type that lambda-function will return.
And it will be useful to make a similar option for series objects too (for them, in the zip function, user cannot specify a lambda-function as parametr at all)
I posted a question related to this on SO.
I have 2 frames and I want to add them together. I want all the values on the left frame to be added to the values on the right frame when available, but in case of missing values on the right frame, I want to return the left value unchanged.
In my case, the left frame is called
priceFrame
containing a bunch of prices. The right frame is calleddivFrame
containing a bunch of dividends.Attempts 1 and 2
both of these have the same result; the correct values are given for price + dividend but in cases where there are no dividends,
<missing>
is returned.Attempt 3
The result is exactly the same as
priceFrame
unchanged.Attempt 4
The result is nothing but
<missing>
.I feel like the design is intended to make my desired outcome easy to achieve. If so, is this a bug or just a case of me not using it correctly?