acowley / Frames

Data frames for tabular data.

colQ disappeared ? #152

Open teto opened 4 years ago

teto commented 4 years ago

The tutorial at http://acowley.github.io/Frames/#org8c2a1c7 suggests the usage of colQ via import Frames.CSV (colQ). Doing so results in an error and the colQ command seems to have been commented out on master.

Any workaround? I want to add some columns to the universe:

, columnUniverse = $(colQ ''MyColumns)

Cheers

teto commented 3 years ago

Yes, it was removed in commit 2df3add7dd8d279cb892ae5e7ce0786bf93fdaf2.

idontgetoutmuch commented 1 year ago

But what should I use instead? I currently have

-- | A 'UTCTime' tagged with a symbol denoting the 'TZ' time zone from
-- whence it came.
newtype TimeIn (zone :: Symbol) = TimeIn UTCTime deriving Show

-- | Try to parse a 'LocalTime' value using common formats.
parseLocalTime :: (MonadFail m, MonadPlus m) => T.Text -> m LocalTime
parseLocalTime t = msum (map (($ T.unpack t) . mkParser) formats)
  where formats = ["%F %T", "%F"]
        mkParser = parseTimeM True defaultTimeLocale

tzChicago :: TZ
tzChicago = $(includeTZFromDB "America/Chicago")

instance Parseable (TimeIn "America/Chicago") where
  parse = fmap (Definitely . TimeIn . (localTimeToUTCTZ tzChicago)) . parseLocalTime

-- | We need this newtype because Template Haskell can not handle the
-- type @TimeIn "America/Chicago"@ as of @GHC-8.0.1@ and
-- @template-haskell-2.11.0.0@
newtype Chicago = Chicago (TimeIn "America/Chicago") deriving Show

instance Parseable Chicago where
  parse = fmap (fmap Chicago) . parse

-- | The column types we expect our data to conform to
type MyColumns = Chicago ': CommonColumns

but this does not work

tableTypes' ((rowGen "/Users/dom/Frames/demo/TimeZones/users.csv") { rowTypeName = "Whatever"
                                                                   , columnUniverse = Proxy :: Proxy MyColumns
                                                                   })

which ignores the custom parser

GHCJ > :i Whatever
:i Whatever
type Whatever :: *
type Whatever = Record '[SignupDate, Id]
    -- Defined at demo/TimeZones/src/Main.hs:29:1
GHCJ > :i SignupDate
:i SignupDate
type SignupDate :: (GHC.Types.Symbol, *)
type SignupDate = "signup_date" :-> Text :: (GHC.Types.Symbol, *)
    -- Defined at demo/TimeZones/src/Main.hs:29:1

But

type Urk = "signup_date" :-> Chicago

type Wat = Record '[Urk, Id]

does work

loadUsers :: MonadSafe m => Producer Wat m ()
loadUsers = readTable "/Users/dom/Frames/demo/TimeZones/users.csv"

main :: IO ()
main = runSafeEffect $ loadUsers >-> P.print

gives

GHCJ > main
main
{signup_date :-> Chicago (TimeIn 2016-08-04 05:00:00 UTC), id :-> 0}
{signup_date :-> Chicago (TimeIn 2012-03-02 06:00:00 UTC), id :-> 1}
{signup_date :-> Chicago (TimeIn 2006-10-18 06:00:00 UTC), id :-> 2}

acowley commented 1 year ago

Thanks for letting me know! The way the demos have fallen out of date is a real shame. As you say, SignupDate has a Text payload, even though,

ghci> parse "2006-10-18 01:00:00" :: Maybe (Parsed Chicago)
Just (Definitely (Chicago (TimeIn 2006-10-18 06:00:00 UTC)))

I'm not yet sure why the Chicago type's parser isn't tried.

acowley commented 1 year ago

Looking at things, I feel like the right thing to do is this,

orderParsePriorities :: Parsed (Maybe Type) -> Maybe Int
orderParsePriorities x =
  case discardConfidence x of
    Nothing -> Just (1 + 6) -- categorical variable
    Just t
      | t == tyText -> Just (0 + uncertainty)
      | t == tyDbl -> Just (2 + uncertainty)
      | t == tyInt -> Just (3 + uncertainty)
      | t == tyBool -> Just (4 + uncertainty)
      | otherwise -> Just (5 + uncertainty) -- Unknown type
  where tyText = ConT (mkName "Text")
        tyDbl = ConT (mkName "Double")
        tyInt = ConT (mkName "Int")
        tyBool = ConT (mkName "Bool")
        uncertainty = case x of Definitely _ -> 6; Possibly _ -> 0

And I think that does make the time zone parser work properly. But it also changes the way we handle missing data, causing some of the unit tests to fail. I don't remember all the thinking behind this. The problem is when you have something like,

name,id
Joe,23
Sue,41
Adam,
Shirley,19

How should the row with the name "Adam" affect inference of the id column? The way it was, that missing data wouldn't affect things, while with my initial fix here the missing data causes us to infer the id column as needing Text. I'll try to preserve the old behavior.
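
To make the two behaviors concrete, here is a minimal, self-contained sketch that uses only base. It is not Frames' inference code: ColType, parsesAs, inferWith, inferSkippingMissing and inferCountingMissing are hypothetical names invented for illustration, and the candidate types are limited to Int, Double and Text.

```haskell
import Data.Maybe (isJust)
import Text.Read (readMaybe)

-- | Hypothetical candidate column types for this sketch
-- (not the representation Frames uses internally).
data ColType = TInt | TDouble | TText
  deriving (Show, Eq)

-- | Does a single cell parse as the given type? Anything parses as text.
parsesAs :: ColType -> String -> Bool
parsesAs TInt    s = isJust (readMaybe s :: Maybe Int)
parsesAs TDouble s = isJust (readMaybe s :: Maybe Double)
parsesAs TText   _ = True

-- | Pick the most specific type that every retained cell parses as.
-- The predicate decides which cells take part in inference.
inferWith :: (String -> Bool) -> [String] -> ColType
inferWith keep cells
  | all (parsesAs TInt) xs    = TInt
  | all (parsesAs TDouble) xs = TDouble
  | otherwise                 = TText
  where xs = filter keep cells

-- | Empty cells are treated as missing data and ignored,
-- which mirrors the old behavior described above.
inferSkippingMissing :: [String] -> ColType
inferSkippingMissing = inferWith (not . null)

-- | Empty cells take part in inference, so one missing value
-- forces the column to text, as with the initial fix.
inferCountingMissing :: [String] -> ColType
inferCountingMissing = inferWith (const True)
```

Applied to the id column from the CSV above:

ghci> inferSkippingMissing ["23", "41", "", "19"]
TInt
ghci> inferCountingMissing ["23", "41", "", "19"]
TText

One caveat of this sketch: a column consisting only of empty cells is inferred as TInt by inferSkippingMissing, because the check holds vacuously on an empty list. A real implementation would need to choose a default for that case.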