acowley / Frames

Data frames for tabular data.
Other
298 stars 41 forks source link

How can you manually build a Record/'Row'? #73

Open codygman opened 7 years ago

codygman commented 7 years ago
type Row = Record ["col_a" :-> Text]

-- I figured out how to make something close
buildRow :: Record '[ColA]
buildRow = pure (Col "some text") :& Nil

How can I make a buildRow function of type Row? Perhaps with ColFun? Perhaps something totally different? Something is telling me I could do this by distributing Identity across the record with ColFun.

acowley commented 7 years ago

In the context of the MissingData demo,

let r = 2 &: "hey" &: Nil :: Row
codygman commented 7 years ago

What would the type be if I wanted to make a new row from those two that appends them together resulting in Record '[CombinedText], meaning the value would be 2hey.

My use case is that I have a Record and then needed to combine some fields into a field that exists in a different Record I've defined with tabletypes.

On Dec 14, 2016 2:38 PM, "Anthony Cowley" notifications@github.com wrote:

In the context of the MissingData demo,

let r = 2 &: "hey" &: Nil :: Row

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/73#issuecomment-267149593, or mute the thread https://github.com/notifications/unsubscribe-auth/AANyN_gm_HYVTLnzqkEy6uGzKWeViv9kks5rIFO8gaJpZM4LNU6N .

acowley commented 7 years ago

It all depends on what you want it to do. The most basic thing is that you would create a Record '[Text]: there's no way to guess what you might want to label the combined column. If you do want a labeled column (using the Col type as usual), you would need an annotation somewhere to drive the type inference. But since we're down to a single field here, I feel like maybe we're too far from your actual needs.

Focusing on the last thing you asked, I would expect that to be some lensy thing, otherRecord & colCombined .~ (T.pack (show (row^.colA)) <> row^.colB) (assuming colCombined is the lens for the destination column in the second record).

codygman commented 7 years ago

A more detailed example is as follows

3 data sources, 1 normalized and 2 denormalized.

normalized

compositeKey
"123"
"456"

valid denormalized

col_1a,col_2a,col_3a
1,2,3
4,5,6

invalid denormalized

col_1a,col_2a,col_10a
1,2,3
4,5,6

Generated types (respectively):

type Normalized =
  Record '["composite_key" :-> Text, ]

type ValidDenormalized =
  Record '["col_1" :-> Int, "col_2" :-> Int, "col_3" :-> Int]

type InvalidDenormalized =
  Record '["col_1" :-> Int, "col_2" :-> Int, "col_5555" :-> Int]

I'd like a function that will create the composite key but has the constraints that the input record must contain col_1, col_2, and col_3 and that the output must contain composite_key.

According to the above I should be able to:

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) =>
    Record rs -> Record rs2
createCompositeKey r = compositeTxt &: Nil
    where compositeTxt = (T.pack . show . view col1 $ r) <>
                                        (T.pack . show . view col2 $ r) <>
                                        (T.pack . show . view col3 $ r)

The point being that createCompositeKey would give a compile time error if I tried to use it with InvalidDenormalized.

Here is my full attempt and self-contained reproduction:

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE DataKinds #-}
module Test where

import Frames
import Data.Vinyl.TypeLevel (RIndex)
import Data.Typeable (Typeable, showsTypeRep, typeRep)
import Data.Proxy

type Normalized =
  Record '["composite_key" :-> Text ]

type ValidDenormalized =
  Record '["col_1" :-> Int, "col_2" :-> Int, "col_3" :-> Int]

type InvalidDenormalized =
  Record '["col_1" :-> Int, "col_2" :-> Int, "col_5555" :-> Int]

-- createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) =>
--     Record rs -> Record rs2
-- createCompositeKey r = (compositeTxt :: Text) &: Nil
--     where compositeTxt = (T.pack . show . view col1 $ r) <>
--                                         (T.pack . show . view col2 $ r) <>
--                                         (T.pack . show . view col3 $ r)

{- error:
src/Frames/Test.hs:37:24: Couldn't match type ‘rs2’ with ‘'[s0 :-> Text]’ …
      ‘rs2’ is a rigid type variable bound by
            the type signature for
              createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs,
                                     CompositeKey ∈ rs2) =>
                                    Record rs -> Record rs2
            at /home/cody/source/Frames/src/Frames/Test.hs:35:23
    Expected type: Record rs2
      Actual type: Record '[s0 :-> Text]
    Relevant bindings include
      createCompositeKey :: Record rs -> Record rs2
        (bound at /home/cody/source/Frames/src/Frames/Test.hs:37:1)
    In the expression: (compositeTxt :: Text) &: Nil
    In an equation for ‘createCompositeKey’:
        createCompositeKey r
          = (compositeTxt :: Text) &: Nil
          where
              compositeTxt
                = (T.pack . show . view col1 $ r)
                  <>
                    (T.pack . show . view col2 $ r) <> (T.pack . show . view col3 $ r)
Compilation failed.
-}
-- definitions for Type synomyns and lenses adapted from end of tutorial
type CompositeKey = "composite_key" :-> Int
type Col1 = "col_1" :-> Int
type Col2 = "col_2" :-> Int
type Col3 = "col_3" :-> Int
type Col5555 = "col_5555" :-> Int

compositeKey ::
  forall f_adkB rs_adkC. (Functor f_adkB,
                           RElem CompositeKey rs_adkC (RIndex CompositeKey rs_adkC)) =>
  (Int -> f_adkB Int) -> Record rs_adkC -> f_adkB (Record rs_adkC)
compositeKey = Frames.rlens (Proxy :: Proxy CompositeKey)

col1 ::
  forall f_adkB rs_adkC. (Functor f_adkB,
                           RElem Col1 rs_adkC (RIndex Col1 rs_adkC)) =>
  (Int -> f_adkB Int) -> Record rs_adkC -> f_adkB (Record rs_adkC)
col1 = Frames.rlens (Proxy :: Proxy Col1)

col2 ::
  forall f_adkB rs_adkC. (Functor f_adkB,
                           RElem Col2 rs_adkC (RIndex Col2 rs_adkC)) =>
  (Int -> f_adkB Int) -> Record rs_adkC -> f_adkB (Record rs_adkC)
col2 = Frames.rlens (Proxy :: Proxy Col2)

col3 ::
  forall f_adkB rs_adkC. (Functor f_adkB,
                           RElem Col3 rs_adkC (RIndex Col3 rs_adkC)) =>
  (Int -> f_adkB Int) -> Record rs_adkC -> f_adkB (Record rs_adkC)
col3 = Frames.rlens (Proxy :: Proxy Col3)

col5555 ::
  forall f_adkB rs_adkC. (Functor f_adkB,
                           RElem Col5555 rs_adkC (RIndex Col5555 rs_adkC)) =>
  (Int -> f_adkB Int) -> Record rs_adkC -> f_adkB (Record rs_adkC)
col5555 = Frames.rlens (Proxy :: Proxy Col5555)
codygman commented 7 years ago

I'll probably later put all of those composite keys in a map and then filter various producers based on it to find differences between normalized and denormalized data.

acowley commented 7 years ago

I think I see now, thanks. The issue is with this type,

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Record rs2

The problem is that _ &: Nil can only have a type Record '[t] for some t compatible with the value you give, and there is nothing to say that rs2 ~ '[t]. What you need is a fully realized value of type Record rs2. I think the best way is to take one as an argument,

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Record rs2 -> Record rs2
createCompositeKey = compositeKey .~ _

(I left a hole there for the text appending business of combining the elements of rs you want.)

But you could also gin up a new Rec Maybe rs2 or something, set that value's compositeKey to Just the value you want, and return that for the caller to merge with other data down the line. The point in taking it as an argument to createCompositeKey is that the caller can decide if Rec Maybe rs2 makes sense, or if you already have a Record rs2 you want to modify, or if you want a different functor, e.g. Rec First rs2.

codygman commented 7 years ago

What you need is a fully realized value of type Record rs2. I think the best way is to take one as an argument,

This is starting to sound like a chicken and egg problem ;)

Where will the full realized rs2 come from? How do I create a fully realized rs2? It's not something I'm getting from an external source, but something I'm creating.

On Dec 14, 2016 6:31 PM, "Anthony Cowley" notifications@github.com wrote:

I think I see now, thanks. The issue is with this type,

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Record rs2

The problem is that _ &: Nil can only have a type Record '[t] for some t compatible with the value you give, and there is nothing to say that rs2 ~ '[t]. What you need is a fully realized value of type Record rs2. I think the best way is to take one as an argument,

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Record rs2 -> Record rs2 createCompositeKey = compositeKey .~ _

(I left a hole there for the text appending business of combining the elements of rs you want.)

But you could also gin up a new Rec Maybe rs2 or something, set that value's compositeKey to Just the value you want, and return that for the caller to merge with other data down the line. The point in taking it as an argument to createCompositeKey is that the caller can decide if Rec Maybe rs2 makes sense, or if you already have a Record rs2 you want to modify, or if you want a different functor, e.g. Rec First rs2.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/73#issuecomment-267200577, or mute the thread https://github.com/notifications/unsubscribe-auth/AANyNzHnObTyIm8rIy4izqzxT9vosdUiks5rIIpUgaJpZM4LNU6N .

acowley commented 7 years ago

You can always make a Rec Maybe rs2 as I suggested. I think mempty would do so.

What's happening is that you seem to want to incrementally construct the record, which means you would need to be able to talk about partially constructed records, which is one way of interpreting a Rec Maybe. The alternative is to compute all your fields individually and build your record using &: to either create your desired record in one shot or to incrementally build up to your rs2 set of fields one field at a time, bearing in mind that those intermediate records will not have a set of fields equal to rs2.

codygman commented 7 years ago

Thanks, I'll try out the recMaybe approach as you suggested. I also thank you for deepening my understanding.

On Dec 14, 2016 8:23 PM, "Anthony Cowley" notifications@github.com wrote:

You can always make a Rec Maybe rs2 as I suggested. I think mempty would do so.

What's happening is that you seem to want to incrementally construct the record, which means you would need to be able to talk about partially constructed records, which is one way of interpreting a Rec Maybe. The alternative is to compute all your fields individually and build your record using &: to either create your desired record in one shot or to incrementally build up to your rs2 set of fields one field at a time, bearing in mind that those intermediate records will not have a set of fields equal to rs2.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/73#issuecomment-267218956, or mute the thread https://github.com/notifications/unsubscribe-auth/AANyN0OmATu-gOKn5VaR2-cMBhvh7siLks5rIKSTgaJpZM4LNU6N .

codygman commented 7 years ago

Just to clarify, createCompositeKey' :: forall rs rs2. (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Rec Maybe rs2 should be possible then?

If so, I'm confused at why this still has an issue:

mkMaybeCompositeKey :: (CompositeKey ∈ rs) => Rec Maybe rs
mkMaybeCompositeKey = Just "compositekey" :& Nil

Then I read your comment about mempty working and tried this:

mkMaybeCompositeKey :: (CompositeKey ∈ rs) => Rec Maybe rs
mkMaybeCompositeKey = mempty

I'm clearly not applying it right. Before I submit this I'll re-read what you wrote above to make sure I'm not wasting your time. After re-reading I tried a lens based approach:

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Rec Maybe rs2
createCompositeKey r = r & compositeKey ?~ (Just compositeTxt)
    where compositeTxt = (T.pack . show . view col1 $ r) <>
                         (T.pack . show . view col2 $ r) <>
                         (T.pack . show . view col3 $ r)
-- src/Frames/Test.hs:70:28: Couldn't match type ‘Data.Vinyl.Functor.Identity’ with ‘Maybe’ …
--     Expected type: ASetter
--                      (Record rs) (Rec Maybe rs2) Text (Maybe (Maybe Text))
--       Actual type: (Text -> Identity Text)
--                    -> Record rs2 -> Identity (Record rs2)
--     In the first argument of ‘(?~)’, namely ‘compositeKey’
--     In the second argument of ‘(&)’, namely
--       ‘compositeKey ?~ (Just compositeTxt)’
-- Compilation failed.

I'm starting to think the other alternative you mentioned (merging these columns in my producer) would be worlds easier since it's so difficult(impossible?) to construct something that satisfies CompositeKey ∈ rs2. I still don't quite understand why that doesn't work, but I was very much hoping CompositeKey ∈ rs2) meant "something named composite_key' that is of type Text". My feeling now is it does mean that, except you can't create a Record that happens to have a column named composite_key and happens to have a value of type Text and have it satisfy CompositeKey ∈ rs2).

Well, off to bed.

acowley commented 7 years ago

Okay, sorry for the confusion, I'm at a computer now. What you want is,

> rpure Nothing & compositeKey' ?~ Col "my voice is my passport" :: Rec Maybe (RecordColumns Row2)
{Just compositeKey :-> "my voice is my passport", Nothing}

(Row2 there is just a dummy table I set up with two columns, one of which is compositeKey.)

It's rpure Nothing to build the Rec Maybe rs2, and then the primed lens is needed because the unprimed one is fixed to the functor being Identity. A knock on from that is the need to explicitly use the Col newtype constructor because the unprimed lens doesn't bake that in, either.

The problem with your last attempt is that you have an argument, r, of type Record rs, but you then use it to create the return value which is supposed to have type, Rec Maybe rs2. So replace that first line of your definition with rpure Nothing & compositeKey ?~ Col compositeTxt and you should be cooking!

codygman commented 7 years ago

@acowley I think I'm closer, but that didn't quite work. This is rpure from import Data.Vinyl (RecApplicative(rpure)) right?

code/error

createCompositeKey :: (Col1 ∈ rs, Col2 ∈ rs, Col3 ∈ rs, CompositeKey ∈ rs2) => Record rs -> Rec Maybe rs2
createCompositeKey r = rpure Nothing & compositeKey ?~ Col compositeTxt
  where compositeTxt = (T.pack . show . view col1 $ r) <>
                       (T.pack . show . view col2 $ r) <>
                       (T.pack . show . view col3 $ r)
-- error
-- src/Frames/Test.hs:79:30: Couldn't match type ‘Data.Vinyl.Functor.Identity’ with ‘Maybe’ …
--     Expected type: Data.Vinyl.Functor.Identity x
--       Actual type: Maybe x
--     In the first argument of ‘rpure’, namely ‘Nothing’
--     In the first argument of ‘(&)’, namely ‘rpure Nothing’
-- src/Frames/Test.hs:79:40: Couldn't match type ‘Data.Vinyl.Functor.Identity’ with ‘Maybe’ …
--     Expected type: ASetter
--                      (Rec Data.Vinyl.Functor.Identity rs2)
--                      (Rec Maybe rs2)
--                      Text
--                      (Maybe (s0 :-> Text))
--       Actual type: (Text -> Identity Text)
--                    -> Record rs2 -> Identity (Record rs2)
--     In the first argument of ‘(?~)’, namely ‘compositeKey’
--     In the second argument of ‘(&)’, namely
--       ‘compositeKey ?~ Col compositeTxt’
-- Compilation failed.
acowley commented 7 years ago

That is the first issue I referred to when explaining why the primed lens, compositeKey', is needed. You can see in the error message that the "Actual type" talks about Record rather than Rec.

We generate two lenses for convenience (and, unfortunately, confusion): compositeKey has types nailed down to minimize syntactic clutter for the common case of working with the Identity functor and Col types. The template haskell also generates compositeKey' that is the raw vinyl lens into the record field. In the latter case, the field type could be anything (so we need to use the Col constructor) and the functor could be anything, which is what lets us use Just.

codygman commented 7 years ago

Ah, I was confused because compositeKey' want generated since I was working from the above and didn't define it.

I do see what you mean now.

On Dec 15, 2016 10:15 AM, "Anthony Cowley" notifications@github.com wrote:

That is the first issue I referred to when explaining why the primed lens, compositeKey', is needed. You can see in the error message that the "Actual type" talks about Record rather than Rec.

We generate two lenses for convenience (and, unfortunately, confusion): compositeKey has types nailed down to minimize syntactic clutter for the common case of working with the Identity functor and Col types. The template haskell also generates compositeKey' that is the raw vinyl lens into the record field. In the latter case, the field type could be anything (so we need to use the Col constructor) and the functor could be anything, which is what lets us use Just.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/73#issuecomment-267368933, or mute the thread https://github.com/notifications/unsubscribe-auth/AANyNz57ZkMhkc58HM4nKu3vWZtAKZRoks5rIWeSgaJpZM4LNU6N .

codygman commented 7 years ago

Now I have a CompositeKey ∈ rs2 => Rec Maybe rs2and tried:

f :: CompositeKey ∈ rs2 => Rec Maybe rs2 -> Record rs2
f = fromMaybe . recMaybe

However I ran into the same problem of not being able to satisfy the CompositeKey ∈ cs0 constraint. Do I need to sequence these and discard the nothing values somehow first? Perhaps I need to use the First functor for some reason?

codygman commented 7 years ago

I'll put my full code example here later, but to add detail for now I'm trying to use createCompositeKey in a pipe like denormalizedSource >-> P.take 5 >-> createCompositeKey >-> P.print.

acowley commented 7 years ago

Is cs0 the same as rs2? You need to have something concrete for that type variable at some point in your program. In typical Frames usage, the type synonym generated for the row type is a concrete type that shows up a lot and is used to pin down the types outside of library code that is polymorphic with respect to the full row type. In my example above, you see that I had a Row2 type. In that case, I created a CSV file with the columns I wanted and let Frames generate things because I didn't remember how to do it manually. If this destination row type of yours is entirely for output purposes, then you will need to define it in code.

If createCompositeKey is specifically intended to be polymorphic in the row type, then your debugging code will just need a little nudge to use your destination type. E.g. replacing P.print with P.print . id @Row2 (using TypeApplications).

Edit: I checked the docs, and it's declareColumn that would be used to define things manually.

codygman commented 7 years ago

cs0 and rs2 are not the same and I'm not sure why ghc is looking for cs0 rather than the rs2 constraint I provided.

I created a CSV file with the columns I wanted and let Frames generate things because I didn't remember how to do it manually

I don't have time to post full code or go into more detail now, but my intuition or pessimism tells me I need to know how to do the thing you forgot how to do :)

On Dec 15, 2016 12:45 PM, "Anthony Cowley" notifications@github.com wrote:

Is cs0 the same as rs2? You need to have something concrete for that type variable at some point in your program. In typical Frames usage, the type synonym generated for the row type is a concrete type that shows up a lot and is used to pin down the types outside of library code that is polymorphic with respect to the full row type. In my example above, you see that I had a Row2 type. In that case, I created a CSV file with the columns I wanted and let Frames generate things because I didn't remember how to do it manually. If this destination row type of yours is entirely for output purposes, then you will need to define it in code.

If createCompositeKey is specifically intended to be polymorphic in the row type, then your debugging code will just need a little nudge to use your destination type. E.g. replacing P.print with P.print . id @Row2 (using TypeApplications).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/acowley/Frames/issues/73#issuecomment-267407626, or mute the thread https://github.com/notifications/unsubscribe-auth/AANyNzGYpop1UHjAXZfPE3iTBfiyN5FKks5rIYrigaJpZM4LNU6N .

acowley commented 7 years ago

Doing it when you don't have a CSV to generate Haskell from goes like this,

declareColumn "compositeKey" ''Text
declareColumn "foobar" ''Bool
type Row2 = Record '[CompositeKey, Foobar]
test :: Rec Maybe (RecordColumns Row2)
test = rpure Nothing & compositeKey' ?~ Col "my voice is my passport"
codygman commented 7 years ago

So @acowley ... I finally accomplished what I was trying to do:

normalized.csv

composite_key,tag
"1234","one"
"5678","two"
"9101","three"
"1121","four"

denormalized.csv

key_a,key_b,key_c,key_d,tag
1,2,3,4,"one"
9,1,0,1,"three"
1,1,2,1,"four"

The goal being to first create a composite key column in denormalized.csv and then find out which row is missing in denormalized.csv.

Here is the full working code:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
module Test where

import Frames hiding ((:&))
import Frames.CSV (tableTypesOverride, RowGen(..), rowGen, readTableOpt)
import Frames.InCore (RecVec)
import Data.Vinyl (Rec(RNil), RecApplicative(rpure))
import Data.Vinyl.TypeLevel (RIndex)
import Control.Lens (view, (&), (?~))
import Pipes (Producer, (>->), runEffect)
import Pipes.Internal (Proxy)
import Data.Monoid ((<>))
import Data.Maybe (fromMaybe, isNothing)
import qualified Control.Foldl as L
import qualified Pipes.Prelude as P
import qualified Data.Text as T
import qualified Data.Text.Lazy as LT
import qualified Data.Map as M
import qualified Data.Text.Format as T

tableTypesOverride rowGen {rowTypeName ="Normalized"} "normalized.csv" [("composite_key", ''Text)]
tableTypesOverride rowGen {rowTypeName ="Denormalized"} "denormalized.csv" []

mkCompositeKey :: (RecApplicative rs2, KeyA ∈ rs, KeyB ∈ rs, KeyC ∈ rs, KeyD ∈ rs, CompositeKey ∈ rs2) =>
                  Record rs -> Record rs2
mkCompositeKey denormRow = fromMaybe (error "failed to make composite key") . recMaybe $ newRow
  where newRow = rpure Nothing & compositeKey' ?~ Col compositeKeyTxt
        compositeKeyTxt = view' keyA <> view' keyB <> view' keyC <> view' keyD
        view' l = LT.toStrict $ T.format "{}" (T.Only (view l denormRow))
        intToTxt = T.pack . show :: Int -> Text -- slow!

normalized :: Producer Normalized IO ()
normalized = readTableOpt normalizedParser "normalized.csv"
-- λ> runEffect $ normalized >-> P.take 5 >-> P.print
-- {composite_key :-> "1234", tag :-> "one"}
-- {composite_key :-> "5678", tag :-> "two"}
-- {composite_key :-> "9101", tag :-> "three"}

denormalized :: Producer Denormalized IO ()
denormalized = readTableOpt denormalizedParser "denormalized.csv"
-- λ> runEffect $ denormalized >-> P.take 5 >-> P.print
-- {key_a :-> 1, key_b :-> 2, key_c :-> 3, key_d :-> 4, tag :-> "one"}
-- {key_a :-> 5, key_b :-> 6, key_c :-> 7, key_d :-> 8, tag :-> "two"}
-- {key_a :-> 9, key_b :-> 1, key_c :-> 0, key_d :-> 1, tag :-> "three"}

-- denormalized' is actually the "normalized" denormalized in that it contains the same compositeKey. I was lazy and didn't care about keeping the tag.
denormalized' :: Producer (Record '[CompositeKey]) IO ()
denormalized' = denormalized >-> P.map mkCompositeKey
-- λ> runEffect $ denormalized' >-> P.take 5 >-> P.print
-- {composite_key :-> "1234"}
-- {composite_key :-> "5678"}
-- {composite_key :-> "9101"}

addCompositeKeyToMapFold :: (Num a, CompositeKey ∈ rs) =>
                            L.Fold (Record rs) (M.Map Text a)
addCompositeKeyToMapFold = L.Fold (\m r -> addCompositeKeyToMap m r) (M.empty) id

addCompositeKeyToMap
  :: (Num a, CompositeKey ∈ rs) =>
     M.Map Text a -> Record rs -> M.Map Text a
addCompositeKeyToMap m r = M.insert (view compositeKey r) 0 m

buildCompositeKeyMap :: (Foldable f, CompositeKey ∈ rs) =>
                        f (Record rs) -> M.Map Text Integer
buildCompositeKeyMap = L.fold addCompositeKeyToMapFold

findMissingRows :: forall
                   (m :: * -> *)
                   (rs :: [*])
                   a'
                   a
                   (m1 :: * -> *)
                   r
                   (rs1 :: [*]).
                   ( Monad m1,
                     CompositeKey ∈ rs,
                     CompositeKey ∈ rs1,
                     L.PrimMonad m,
                     Frames.InCore.RecVec rs
                   ) =>
                   Pipes.Internal.Proxy a' a () (Record rs1) m1 r
                -> Producer (Record rs) m ()
                -> m (Pipes.Internal.Proxy a' a () (Record rs1) m1 r)
findMissingRows referenceProducer checkProducer = do
  -- build the index of rows in the producer to check
  compositeKeyMap <- buildCompositeKeyMap <$> inCoreAoS checkProducer
  -- we could have built compositeKeyMap in a single line if we were golfing
  -- L.fold (L.Fold (\m r -> M.insert (view compositeKey r) 0 m) (M.empty) id) <$> inCoreAoS checkProducer

  -- keep only rows we didn't have in the checkProducer index produced
  return $ referenceProducer >-> P.filter (\r -> isNothing $ M.lookup (view compositeKey r) compositeKeyMap)

printMissingRows :: IO ()
printMissingRows = do
  putStrLn "rows normalized contains deonrmalized does not"
  findMissingRows normalized denormalized' >>= \p -> runEffect $ p >-> P.print

I have a few questions:

acowley commented 7 years ago

First and foremost: congratulations! 🎉 😀

That type signature is certainly an eyesore, it looks like something GHCi would say. It's late here, so this might not be quite right, but I'll give it a quick shot.

Oh, I also don't think you benefit from inCoreAoS here as that is primarily for situations where you need to perform multiple passes over the data. This may require a change to the fold things that build the Map, but Pipes.Prelude.fold might do the job.

In that case, I'd expect it to go something like this,

findMissingRows :: (Monad m, CompositeKey ∈ rs, CompositeKey ∈ rs1) 
                => Producer (Record rs) m ()
                -> m (Pipe (Record rs1) (Record rs1) m ())
findMissingRows checkProducer = do
  -- build the index of rows in the producer to check
  compositeKeyMap <- buildCompositeKeyMap checkProducer
  -- keep only rows we didn't have in the checkProducer index produced
  return $ P.filter (\r -> M.notMember (view compositeKey r) compositeKeyMap)

Then you do the plumbing elsewhere,

printMissingRows :: IO ()
printMissingRows = do
  putStrLn "rows normalized contains deonrmalized does not"
  findMissing <- findMissingRows denormalized
  runEffect $ normalized >-> findMissing >-> P.print

In general you usually want to make an effort to separate the gluing together of pipes/conduits/machines from the definitions of the pieces. This has the nice effect of making it relatively straightforward to read off the data flow of the program by looking at the top-level definition.

codygman commented 7 years ago

Thanks! This one was (and is) a little tough ;)

Looks like I wasn't quite done...

If I change the producers to use the defaulting ones:

normalizedDefaulted :: Producer Normalized IO ()
normalizedDefaulted = readTableMaybe "normalized.csv" >-> P.map (fromJust . holeFiller)
  where holeFiller :: Rec Maybe (RecordColumns Normalized) -> Maybe Normalized
        holeFiller = recMaybe
                   . rmap getFirst
                   . rapply (rmap (Lift . flip (<>)) def)
                   . rmap First
        fromJust = fromMaybe (error "normalizedDefaulted failure")

denormalizedDefaulted :: Producer Denormalized IO ()
denormalizedDefaulted = readTableMaybe "denormalized.csv" >-> P.map (fromJust . holeFiller)
  where holeFiller :: Rec Maybe (RecordColumns Denormalized) -> Maybe Denormalized
        holeFiller = recMaybe
                   . rmap getFirst
                   . rapply (rmap (Lift . flip (<>)) def)
                   . rmap First
        fromJust = fromMaybe (error "denormalizedDefaulted failure")

-- denormalized' is actually the "normalized" denormalized in that it contains the same compositeKey. I was lazy and didn't care about keeping the tag.
denormalizedDefaulted' :: Producer (Record '[CompositeKey]) IO ()
denormalizedDefaulted' = denormalizedDefaulted >-> P.map mkCompositeKey

I get the following type constraint error:

src/Frames/Test.hs:145:3: error: …
    • No instance for (RElem
                         ("composite_key" :-> Text)
                         '[]
                         (RIndex ("composite_key" :-> Text) '[]))
        arising from a use of ‘findMissingRows’
    • In the first argument of ‘(>>=)’, namely
        ‘findMissingRows normalizedDefaulted denormalizedDefaulted’
      In a stmt of a 'do' block:
        findMissingRows normalizedDefaulted denormalizedDefaulted
        >>= \ p -> runEffect $ p >-> P.print
      In the expression:
        do { putStrLn "rows normalized contains deonrmalized does not";
             findMissingRows normalizedDefaulted denormalizedDefaulted
             >>= \ p -> runEffect $ p >-> P.print }
Compilation failed.

I'll try taking advantage of pipelines more and see if that fixes the inference problem as well.

codygman commented 7 years ago

It seems that changing the first line of denormalizedDefaulted as well as the type signature fixed my issues:

denormalizedDefaulted :: Producer (Record '[CompositeKey]) IO ()
denormalizedDefaulted = readTableMaybe "denormalized.csv" >-> P.map (fromJust . holeFiller) >-> P.map mkCompositeKey

I loosely understand why that worked I think.

acowley commented 7 years ago

Yes! That's the issue we talked about earlier. Feeding something with polymorphic output into P.print is no good because P.print won't fix the types involved. Luckily, all that is needed is to provide a type that satisfies CompositeKey ∈ rs which '[CompositeKey] certainly does.

codygman commented 7 years ago

Funnily enough, my actual problem was that when I translated this example over to my use case I was mixing up the normalize and denormalized sources.

My error came from calling buildCompositeKey on Normalized which doesn't contain Key1, 2, or, 3.

I was so sure that this was a type inference error I overlooked the possibility that frames was doing what it was meant to ;)

acowley commented 7 years ago

A big part of that is the error messages being off-putting. Once you start breaking it down, they're not actually that bad. The problem is that, for me at least, there is so much text that I don't want to read it. I think Idris-mode does a good job in this area with collapsible error messages.

codygman commented 7 years ago

Now I'm trying to update the above to append the CompositeKey column rather than create a record of only CompositeKey. I've been using this issue for reference, but I'm stuck with this:

f :: ( RecApplicative rs2
     , KeyA ∈ rs
     , KeyB ∈ rs
     , KeyC ∈ rs
     , KeyD ∈ rs
     , CompositeKey ∈ rs2
                  ) => Record rs -> Record (CompositeKey ':  RecordColumns Denormalized)
f inRecord = frameConsA compositeKey inRecord
  where compositeKeyTxt :: Text
        compositeKeyTxt = view compositeKey (mkCompositeKey inRecord)
-- type error:
-- src/Main.hs:129:14: error: …
--     • Couldn't match type ‘(Text -> f0 Text)
--                            -> Record rs1 -> f0 (Record rs1)’
--                      with ‘Text’
--       Expected type: Record (CompositeKey : RecordColumns Denormalized)
--         Actual type: Rec
--                        Identity
--                        ("composite_key"
--                         :-> ((Text -> f0 Text) -> Record rs1 -> f0 (Record rs1))
--                           : rs)
--     • In the expression: frameConsA compositeKey inRecord
--       In an equation for ‘f’:
--           f inRecord
--             = frameConsA compositeKey inRecord
--             where
--                 compositeKeyTxt :: Text
--                 compositeKeyTxt = view compositeKey (mkCompositeKey inRecord)
-- Compilation failed.
acowley commented 7 years ago

Shouldn't that be compositeKeyTxt in the frameConsA application?

codygman commented 7 years ago

@acowley Oh dear, I think you are right :D

codygman commented 7 years ago

I still have an error despite fixing that particular issue:

f :: ( RecApplicative rs2
     , KeyA ∈ rs
     , KeyB ∈ rs
     , KeyC ∈ rs
     , KeyD ∈ rs
     , CompositeKey ∈ rs2
     ) => Record rs -> Record (CompositeKey ':  RecordColumns Denormalized)
f inRecord = frameConsA ("compositeKeysText" :: Text) inRecord
  -- type error:
-- src/Main.hs:129:14: error: …
--     • Couldn't match type ‘rs’
--                      with ‘'["key_a" :-> Int, "key_b" :-> Int, "key_c" :-> Int,
--                              "key_d" :-> Int, "tag" :-> Text, "bdouble" :-> Double,
--                              "bbool" :-> Bool]’
--       ‘rs’ is a rigid type variable bound by
--         the type signature for:
--           f :: forall (rs2 :: [*]) (rs :: [*]).
--                (RecApplicative rs2, KeyA ∈ rs, KeyB ∈ rs, KeyC ∈ rs, KeyD ∈ rs,
--                 CompositeKey ∈ rs2) =>
--                Record rs -> Record (CompositeKey : RecordColumns Denormalized)
--         at /home/cody/sources/frames-differences-current/src/Main.hs:122:6
--       Expected type: Record (CompositeKey : RecordColumns Denormalized)
--         Actual type: Rec Identity ("composite_key" :-> Text : rs)
--     • In the expression:
--         frameConsA ("compositeKeysText" :: Text) inRecord
--       In an equation for ‘f’:
--           f inRecord = frameConsA ("compositeKeysText" :: Text) inRecord
--     • Relevant bindings include
--         inRecord :: Record rs
--           (bound at /home/cody/sources/frames-differences-current/src/Main.hs:129:3)
--         f :: Record rs
--              -> Record (CompositeKey : RecordColumns Denormalized)
--           (bound at /home/cody/sources/frames-differences-current/src/Main.hs:129:1)
-- Compilation failed.

Isn't the below equal?

(KeyA ∈ rs , KeyB ∈ rs , KeyC ∈ rs , KeyD ∈ rs) => rs == Record (CompositeKey : RecordColumns Denormalized)

I relaxed the constraint a little to just require containing CompositeKey and got the following:

f :: ( RecApplicative rs2
     , KeyA ∈ rs
     , KeyB ∈ rs
     , KeyC ∈ rs
     , KeyD ∈ rs
     , CompositeKey ∈ rs2
     ) => Record rs -> Record rs2
f inRecord = frameConsA ("compositeKeysText" :: Text) inRecord
-- src/Main.hs:129:14: error: …
--     • Couldn't match type ‘rs2’ with ‘s0 :-> Text : rs’
--       ‘rs2’ is a rigid type variable bound by
--         the type signature for:
--           f :: forall (rs2 :: [*]) (rs :: [*]).
--                (RecApplicative rs2, KeyA ∈ rs, KeyB ∈ rs, KeyC ∈ rs, KeyD ∈ rs,
--                 CompositeKey ∈ rs2) =>
--                Record rs -> Record rs2
--         at /home/cody/sources/frames-differences-current/src/Main.hs:122:6
--       Expected type: Record rs2
--         Actual type: Rec Identity (s0 :-> Text : rs)
--     • In the expression:
--         frameConsA ("compositeKeysText" :: Text) inRecord
--       In an equation for ‘f’:
--           f inRecord = frameConsA ("compositeKeysText" :: Text) inRecord
--     • Relevant bindings include
--         inRecord :: Record rs
--           (bound at /home/cody/sources/frames-differences-current/src/Main.hs:129:3)
--         f :: Record rs -> Record rs2
--           (bound at /home/cody/sources/frames-differences-current/src/Main.hs:129:1)
-- Compilation failed.

In this example I'm also thinking that rs2 and (s0 :-> Text : rs) but according to the error, it looks like this is untrue.

acowley commented 7 years ago

You can decode the problem from the error messages.

f :: ( RecApplicative rs2
     , KeyA ∈ rs
     , KeyB ∈ rs
     , KeyC ∈ rs
     , KeyD ∈ rs
     , CompositeKey ∈ rs2
     ) => Record rs -> Record (CompositeKey ':  RecordColumns Denormalized)

This type says that you will take any list of type rs that satisfies that constraints and return a specific list CompositeKey ': RecordColumns Denormalized, but you are building the result by cons'ing the CompositeKey column onto rs, which means that you are assuming rs ~ RecordColumns Denormalized which does not follow from the type.

You can either return Record (CompositeKey ': rs) or change the argument to Denormalized.


In your second attempt you are assuming rs2 ~ (CompositeKey ': rs), which again does not follow from the type signature.

Note that you are /adding/ a column to something, so it is unlikely that the CompositeKey ∈ _ constraint is ever needed.

In earlier conversations we talked about updating a record that had a CompositeKey column. In that case, you would need that constraint because you are replacing an existing column. In this more recent effort you are using frameConsA to add a column to a record, so there is no CompositeKey constraint that the caller must satisfy for the function to do its work.

codygman commented 7 years ago

Wow finally, Full working version. I got some ideas on how to make it more generic from Applying Type-Level and Generic Programming in Haskell by Andres Löh as well as a better understanding as to how vinyl and frames are implemented.

Thanks for your patience and guidance. Hopefully in return I can clean up this example and add it to the demos directory and perhaps add some other tutorials and/or documentation :)

codygman commented 7 years ago

So now I'm trying to make a generic findMissingRowsOn. I almost finished but then ran the issue below. I think I need to add a MonadReader _ Identity constraint on rec2, but I'm not sure what _ should be in that case.

Is there a way to find out the default instance of MonadReader that view uses from ghci when I'm using it in a more decidable case?

findMissingRowsOn :: ( Monad monad
                     , Ord key
                     ) =>
                     Getting key (Rec Identity rec1) key -- lens
                  -> Producer (Record rec1) monad ()     -- checkProducer
                  -> monad (Pipe (Record rec2) (Record rec2) monad ())
findMissingRowsOn lens checkProducer = do
  keyMap <- P.fold (\m r -> M.insert (view lens r) 0 m) M.empty id checkProducer
  pure $ P.filter (\(r :: Record rec2) -> M.notMember (view lens r) keyMap)
-- type error:
--  src/Main.hs:148:56: error: …
--     • Could not deduce (MonadReader
--                           (Rec Identity rec1) ((->) (Record rec2)))
--         arising from a use of ‘view’
--       from the context: (Monad monad, Ord key)
--         bound by the type signature for:
--                    findMissingRowsOn :: (Monad monad, Ord key) =>
--                                         Getting key (Rec Identity rec1) key
--                                         -> Producer (Record rec1) monad ()
--                                         -> monad (Pipe (Record rec2) (Record rec2) monad ())
--         at /home/cody/source/frames-differences/src/Main.hs:(140,1)-(145,70)
--     • In the first argument of ‘M.notMember’, namely ‘(view lens r)’
--       In the expression: M.notMember (view lens r) keyMap
--       In the first argument of ‘P.filter’, namely
--         ‘(\ (r :: Record rec2) -> M.notMember (view lens r) keyMap)’
-- Compilation failed.
codygman commented 7 years ago

So the above has to do with the lens being used for rec1 and then attempting to use it on a rec2 when the lens has already specialized on a rec1. I'm not sure how to let the type system know that the lens can work on either rec2 or rec1.

I verified this by making a function that takes two lenses and passing in the exact same lens and seeing it typecheck.

acowley commented 7 years ago

You can look at the lenses the Frames TH generates to see what the types look like without any lens type synonyms. Passing something with that sort of type, but perhaps using a synonym to make it cleaner, should work.