acowley / Frames

Data frames for tabular data.

mapMethod for categorical value #38

Open DbIHbKA opened 9 years ago

DbIHbKA commented 9 years ago

When I want to scale a quantitative value, I can use mapMono or mapMethod to do it. What should I use to encode a categorical value?

I cannot use mapMono or mapMethod because I have to change the field's type, but mapMono and mapMethod only accept (a -> a) functions.

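-- Apply a type-changing function to field r: read the field through the lens,
-- delete the old field, and cons the new value onto the front of the record.
mapPoly :: (RElem r rs (RIndex r rs), CanDelete r rs)
        => (forall f. Functor f => (a -> f a) -> Record rs -> f (Record rs))
        -> Record rs
        -> proxy r
        -> (a -> b)
        -> Record (s :-> b ': RDelete r rs)
mapPoly c r p f = frameCons (pure (f (rget c r))) (rdel p r)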

I wrote this function to solve the problem, but maybe someone knows a more elegant solution?

acowley commented 9 years ago

You've done a good job with this. It's a weak point in the Vinyl story for extensible records. One aspect of the trouble is that we often want to think of the fields of the record as forming a set, but we implement this as a list. If it were a set, then replacing a field amounts to removing the old and adding the new. But as a list, there is this business of ordering that causes some trouble because now the type of the resulting record depends on the order in which you perform these set-like operations!

Another facet of the problem is due to naming. Specifically, the convention of giving the entire record/row type a synonym to make it easier to write. In a case like yours, it might be wise to use a parameterized synonym, like type Row a = Record '["foo" :-> Int, "bar" :-> a], so that you can have Row a and Row b. But, were we to better support this, the ordering problem from earlier would bite us again.
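To make the ordering issue concrete, here is a minimal, self-contained sketch. It does not use Vinyl or Frames: Rec, :& and the :-> newtype below are simplified stand-ins, and replaceByDeleteCons and replaceInPlace are hypothetical names chosen for illustration. It only shows that "delete, then add" and "update in place" give records of different types.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Kind    (Type)
import GHC.TypeLits (Symbol)

-- Stand-in for a named column (an assumption, not the Frames definition).
newtype (:->) (s :: Symbol) a = Col { getCol :: a }

-- Minimal heterogeneous record indexed by a type-level list of fields.
data Rec (ts :: [Type]) where
  RNil :: Rec '[]
  (:&) :: t -> Rec ts -> Rec (t ': ts)
infixr 5 :&

-- Parameterized row synonym: only the type of "bar" varies.
type Row a = Rec '["foo" :-> Int, "bar" :-> a]

-- Set-style replacement: drop "bar", then cons the new field.
-- The new field lands at the head, so the result is not a Row b.
replaceByDeleteCons :: (a -> b) -> Row a -> Rec '["bar" :-> b, "foo" :-> Int]
replaceByDeleteCons f (foo :& Col bar :& RNil) = Col (f bar) :& foo :& RNil

-- Positional replacement: the field keeps its slot, so Row a becomes Row b.
replaceInPlace :: (a -> b) -> Row a -> Row b
replaceInPlace f (foo :& Col bar :& RNil) = foo :& Col (f bar) :& RNil
```

Both functions carry the same data, but only replaceInPlace can be given the type Row a -> Row b. The delete-and-cons version needs its own row type, which is where a named synonym stops helping.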

One approach would be to rely more heavily on Vinyl's REquivalent constraint rather than naming the type. I haven't looked into how this works out usability-wise.

With respect to your mapPoly, is there any concern about the use of the s type variable there? I don't think there's anything constraining s to be the same for the field being deleted and the one being added.

DbIHbKA commented 9 years ago

I don't think the ordering of names matters, but I could be wrong.

Using a parameterized synonym is a good idea, but as you know, when we use Frames the Row type is usually defined by tableType.

With respect to mapPoly, there is no concern about the use of the s type variable. I used s because I don't know how to extract the name from r and put it in the type signature.

I tried to use mapPoly and it is very hard to use, because we must spell out many constraints, so it is not user-friendly:

encodeCategorical :: ( RSubset (RDelete DeviceType rs) rs (RImage (RDelete DeviceType rs) rs)
                     , RElem DeviceType rs (RIndex DeviceType rs) )
                  => (Text -> U.Vector Double)
                  -> Record rs
                  -> Record ("deviceType" :-> U.Vector Double ': RDelete DeviceType rs)
-- Arguments follow mapPoly's signature: lens, record, proxy, function.
encodeCategorical f r = mapPoly deviceType r (Proxy :: Proxy DeviceType) f

I think we need something like mapMono or mapMethod that is easy to use.
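For context, the (Text -> U.Vector Double) argument could be something like a one-hot encoder. The following is only a sketch using plain lists and String so that it needs nothing beyond base. The names oneHot, deviceCategories and encodeDevice, and the category values themselves, are made up for illustration.

```haskell
-- One-hot encode a value against a fixed, ordered list of known categories.
-- A value that is not in the list maps to an all-zero vector.
oneHot :: Eq a => [a] -> a -> [Double]
oneHot categories x = [if c == x then 1 else 0 | c <- categories]

-- Hypothetical category names, for illustration only.
deviceCategories :: [String]
deviceCategories = ["phone", "tablet", "desktop"]

-- Encoder specialised to the hypothetical categories above.
encodeDevice :: String -> [Double]
encodeDevice = oneHot deviceCategories
```

Here encodeDevice "tablet" evaluates to [0.0,1.0,0.0]. With the types from the signature above, composing with U.fromList (taking U to be Data.Vector.Unboxed) and using Text instead of String should give a function of the required shape.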

acowley commented 9 years ago

You can lock down s by adding a constraint r ~ s :-> a, or just using s :-> a everywhere you use r.
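As a rough illustration of what the equality constraint buys, here is a self-contained sketch on a single column rather than a whole record. It does not use Frames: the :-> newtype is a stand-in, and colName, mapColLoose and mapColLocked are hypothetical helpers, not part of any library.

```haskell
{-# LANGUAGE DataKinds           #-}
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies        #-}
{-# LANGUAGE TypeOperators       #-}

import Data.Proxy   (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Stand-in for a named column (an assumption, not the Frames definition).
newtype (:->) (s :: Symbol) a = Col { getCol :: a }

-- Recover a column's label from its type.
colName :: forall s a. KnownSymbol s => (s :-> a) -> String
colName _ = symbolVal (Proxy :: Proxy s)

-- Loose: nothing ties the output label t to the input label s,
-- much like the free s in the mapPoly signature.
mapColLoose :: (a -> b) -> (s :-> a) -> (t :-> b)
mapColLoose f (Col x) = Col (f x)

-- Locked down: r ~ (s :-> a) forces the result to reuse the input label.
mapColLocked :: (r ~ (s :-> a)) => (a -> b) -> r -> (s :-> b)
mapColLocked f (Col x) = Col (f x)

age :: "age" :-> Int
age = Col 42

-- Typechecks even though the label silently changes from "age" to "name".
drifted :: "name" :-> String
drifted = mapColLoose show age

-- The label is carried through from the input column.
kept :: "age" :-> String
kept = mapColLocked show age
```

With mapColLocked, annotating the result as "name" :-> String would be rejected by the type checker, which is the behaviour the constraint is meant to provide.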

I'll try to have a go at mapPoly to see if we can come up with something easier to use.