acowley / Frames

Data frames for tabular data.

Custom separator does not seem to be working #178

Open apraga opened 11 months ago

apraga commented 11 months ago

Hi,

When following the tutorial, using a custom separator does not seem to work:

With ml-100k/u.user containing

1;24;M;technician;85711
2;53;F;other;94043

The following code outputs nothing:

module Main where

import qualified Control.Foldl                 as L
import qualified Data.Foldable as F
import           Frames
import Frames.TH (rowGen, RowGen(..))

tableTypes' (rowGen "ml-100k/u.user")
            { rowTypeName = "U2"
            , separator = ";"}

loadRows :: IO (Frame U2)
loadRows = inCoreAoS (readTable "ml-100k/u.user")

main = do
  ms <- loadRows
  mapM_ print (F.toList  ms)

Replacing ";" in both the data file and the code does get the columns recognized. Am I doing something wrong? I am using GHC 9.2.8 and Frames 0.7.3. Thanks,

acowley commented 11 months ago

This is rather opaque. Since this table and its associated row type need a special parser, you must make use of a definition that tableTypes' quietly provides, u2Parser :: ParserOptions, and pass it to readTableOpt. So, you write loadRows = inCoreAoS (readTableOpt u2Parser "ml-100k/u.user"). What's happening is that the custom separator is used to parse the file when generating the types at compile time, but readTable then uses the default parser at runtime.
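Put together, the change to the original program might look like the sketch below. It is the code from the report with readTable swapped for readTableOpt u2Parser. The LANGUAGE pragmas and the type signature on main are assumptions (the original post shows neither), and the set of extensions actually needed may differ depending on the project's settings.

```haskell
{-# LANGUAGE DataKinds         #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE TypeOperators     #-}
-- The pragmas above are assumed; the original report does not list any.
module Main where

import qualified Data.Foldable as F
import           Frames
import           Frames.TH (rowGen, RowGen(..))

-- Generates the row type U2 at compile time, parsing the file with ";".
-- As a side effect it also defines u2Parser :: ParserOptions, which
-- carries the same separator for use at runtime.
tableTypes' (rowGen "ml-100k/u.user")
            { rowTypeName = "U2"
            , separator = ";"}

-- readTableOpt takes the generated parser options explicitly, whereas
-- readTable falls back to the default parser and its default separator.
loadRows :: IO (Frame U2)
loadRows = inCoreAoS (readTableOpt u2Parser "ml-100k/u.user")

main :: IO ()
main = do
  ms <- loadRows
  mapM_ print (F.toList ms)
```

The name u2Parser follows from rowTypeName = "U2"; a different row type name would produce a correspondingly named parser definition.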

acowley commented 11 months ago

For this data, it may also be helpful to provide column names. I don't think these are accurate, but it's what I threw together when testing this:

tableTypes' (rowGen "test/data/ml-100k/u.user")
            { rowTypeName = "U2"
            , columnNames = ["index", "age", "sex", "title", "zip"]
            , separator = ";"}

apraga commented 11 months ago

Thanks for the answer, it works perfectly. Do you think the documentation could be updated to reflect that?

acowley commented 11 months ago

Yes, definitely. I was looking at how HLS can expand splices in place, and it does show the various definitions, but it’s no substitute for documentation.