Closed adamklein closed 10 years ago
What do you think of the primitive VectorConstruction type tag
/// Combine N aligned vectors. The `IVectorValueTransform` object
/// specifies how to merge values (in case there is a value at a given address
/// in more than one of the vectors).
| CombineN of VectorConstruction list * IVectorValueTransform
With the implementation:
member builder.Build<'T>(command:VectorConstruction, arguments:IVector<'T>[]) =
...
| CombineN(vectors, op) ->
let data =
vectors
|> List.map (fun v -> builder.buildArrayVector v arguments)
|> List.map (function AsVectorOptional o -> o)
let merge = op.GetFunction<'T>()
let filled = Array.init (data |> List.map (fun v -> v.Length) |> List.reduce max) (fun idx ->
data
|> List.map (fun v -> if idx > v.Length then OptionalValue.Missing else v.[idx])
|> List.reduce merge)
vectorBuilder.CreateMissing(filled)
I'm sketching out an appendN that looks like:
let appendN (frames: Frame<'R, 'C> seq) =
let vectorBuilder = Vectors.ArrayVector.ArrayVectorBuilder.Instance
let indexBuilder = Indices.Linear.LinearIndexBuilder.Instance
let rowUnion ix (f: Frame<'R, 'C>) =
let newIx, _, _ = indexBuilder.Union((ix, VectorConstruction.Empty), (f.RowIndex, VectorConstruction.Empty))
newIx
let allRowIx = frames |> Seq.fold rowUnion (indexBuilder.Create([], Some(true)))
let ixbuildr (ixs: Indices.IIndex<_> seq) =
ixs
|> Seq.mapi (fun i x -> indexBuilder.Reindex(x, allRowIx, Lookup.Exact, Vectors.Return(i), fun _ -> true))
|> fun vc -> Vectors.CombineN(Seq.toList vc, VectorValueTransform.LeftOrRight)
vectorBuilder.Build(frames |> Seq.map (fun f -> f.RowIndex) |> ixbuildr,
frames |> Seq.map (fun f -> f.GetSeriesAt(0).Vector) |> Seq.toArray)
This does a appendN on the first column of each frame (regardless of column label). What's left to do is align the columns.
Thoughts on this approach to reducing data copying?
(Aside, this is as much or more of an exercise in understanding the internals of deedle as it is fulfilling some requirement ;)
I think that adding an operation like CombineN
(or perhaps replacing the existing one with a more general one that takes a list rather than just two things) is a great idea.
I'll have a better look at the rest of the code tomorrow (just got back home after 2 weeks of traveling & talking...). Or if you can put it in some branch, that would be even easier?
I'll try to get it to a point to check in -
I don't see any reason we can't replace Combine with CombineN. Open to other suggestions for cleaner code.
I'm not sure I got the code 100% when it comes to IVector <-> IVector<'T> handling, which is still a bit beyond my depth. It seems everything is converted to obj. Is there any way to avoid, or not really?
Ok, I think I've got the IVector <-> IVector<'T> handling, although it made the code significantly more complex. @tpetricek Would appreciate your eyes on this :)
Wondering whether it's possible to implement something like
which doesn't take O(N^2) time; ie, currently, if you do a reduction on a sequence of frames pairwise via Frame.append, it will copy the data over and over again. To do it in one shot, it seems straightforward until you have to express the construction via a VectorConstruction. I imagine a CustomCommand, eg
could do it, building up from more basic VectorConstruction objects. I'm having trouble envisioning how it might be done. Or, perhaps it's the wrong approach altogether?