Open cfhammill opened 5 years ago
Can you provide an example with a record before and after unnest
ing? I'd like to make sure we're on the same page. It seems sort of melt
-like, so perhaps there are some techniques there we can learn from. That would be a very helpful exercise actually since melt
could use some attention as it's still using the old Proxy
-style approach to type applications and should be updated.
Any example transformation you can write will become a test case for when we figure this out, so it's worth picking something pithy but representative.
Absolutely, I'll write a handful this afternoon, thanks!
On Mon, Nov 19, 2018, 11:18 AM Anthony Cowley <notifications@github.com wrote:
Can you provide an example with a record before and after unnesting? I'd like to make sure we're on the same page. It seems sort of melt-like, so perhaps there are some techniques there we can learn from. That would be a very helpful exercise actually since melt could use some attention as it's still using the old Proxy-style approach to type applications and should be updated.
Any example transformation you can write will become a test case for when we figure this out, so it's worth picking something pithy but representative.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/120#issuecomment-439950774, or mute the thread https://github.com/notifications/unsubscribe-auth/AHHwHnVRWIpOpRli43SjjIdKiA_klVn3ks5uwtnYgaJpZM4YpX5d .
Ok so first attempt:
tr :: Rec Identity '[Int, [Int], Int]
tr = 5 :& Identity [6,7,8] :& 3 :& RNil
utr :: [Rec Identity '[Int,Int,Int]]
utr = unnest1 @[] @Int @'[Int] @'[Int] tr
This does the right thing
[ {Identity 5, Identity 6, Identity 5}
, {Identity 5, Identity 7, Identity 5}
, {Identity 5, Identity 8, Identity 5}
]
Irritating that I have to specify the functor, the ls, and the rs explicitly via TypeApplication, but the core idea is there. Ideally I'd like to switch the order of c
and a
in the forall. and just pass the a
. Probably can fix this with some RDelete annotations.
For the multi version I'd like
tr2 :: Rec Identity '[Int, [Int], [Int]]
tr2 = 5 :& Identity [6,7,8] :& [3,4,5] :& RNil
utr2 :: [Rec Identity '[Int,Int,Int]]
utr2 = unnest @'[Int,Int] tr2
to return
[ {Identity 5, Identity 6, Identity 3}
, {Identity 5, Identity 7, Identity 4}
, {Identity 5, Identity 8, Identity 5}
]
Want me to open a PR so we can coordinate that way?
That last example seems questionable to me. I thought dplyr
would do something like a Cartesian product, so you’d have 5,6,3
, 5,7,3
, 5,8,3
, 5,6,4
, 5,7,4
, etc.
That seems more compositional since you can unnest one column at a time until there are no more list-like columns.
Sorry for the close and reopen, keyboard got away from me.
You can do the cartesian version really simply I think. Off the top of my head you can split out each list column with the non-list columns, unnest1 and then inner join on the non-list columns.
I think the zipping case is generally useful too though. Say you have have a handful of functions you want to apply to a small data frame. Each row represents arguments. You can apply each function, appending a field with the list of functions and a field with the list of results. Then you want to turn each row,function,result triple into a new row.
I do this very frequently when working in R. I'm open to being convinced it is the wrong way, but I doubt it.
The counter-argument I guess is that if you know in advance you can just tuple the stuff you care about and unnest1
So there are two ways I can think to move forward:
Accept cartesian producting: in which case unnest
just becomes type-level recursive unnest1. If we did this, we should figure out how to make the type inference work without manually annotating the full record spec.
Keep working on the zip version, in which case we need to figure out why MapTyCon
and AllConstrained
are insufficient to indicate that each element is a list type.
Both! (my preference). In some senses the two strategies are product and sum unnesting.
Do you have any ideas on why the solver is unhappy in the type signature for unnest above?
I got it, and did both. The key thing to note is that the difference between the two unnestings comes down to a choice of a pure
and liftA2
. For cartesian product, they are pure
and liftA2
, for zip it's repeat
and zipWith
.
I had to create a little machinery that probably belongs in Vinyl to get this to work:
I created a gist: https://gist.github.com/cfhammill/a68e175b2e5ae9db76a89776cfcf158e
I think from here it will be trivial to extend to records with mixed fields, and then simplify the API for working with Frames.
@acowley do you want me to PR the vinyl stuff against your vinyl fork?
Yes, let's try to get these into Vinyl. I think there are a bunch of uses of slightly specialized versions of these scattered around Vinyl and Frames, and it would be nice to eventually rationalize all that.
Please do open a PR on the main repo.
I've been tearing my hair out on this one over the past couple days. I'm trying to write a Frames equivalent of dplyr unnest. This is for the case when you have a known set of record fields that are lists and you want to convert them back into rows, duplicating the non-list elements as necessary.
I first started with unnesting 1 field, which was straight forward enough:
But I started running into trouble with unnesting many columns, I think the code is essentially right, but I can't tell the constraint solver that all columns in the zip-fold are of the right type. I'll start with the code
and the ugly and not working type:
So the heart of the type-level programming here is translating
as
which is a list of types which are lists toas'
which are a list of the unwrapped types. I was hoping either declaring thatas
is equal to mapping the[]
constructor overas'
, or saying that that allas
are constrained to be lists would be enough to get it to type check, but no dice. I've also tried usingreifyConstraint
but I haven't been able to get that to work.Any help would be greatly appreciated.
Also I'm not attached to anything yet. I would rather length 1 lists get repeated rather than all lists getting zipped down to length 1, but first things getting it to type-check.