acowley / Frames

Data frames for tabular data.
Other
297 stars 41 forks source link

Parsing numerical values which use comma rather than a point #175

Open idontgetoutmuch opened 1 year ago

idontgetoutmuch commented 1 year ago

Some countries use 3,1459 rather than 3.1459. I have for example

"LATITUDE","LONGITUDE"
"-21,5245377777","-39,6610013888"
"-9,5933511111","-35,8891097222"

But if I run

{-# LANGUAGE DataKinds, FlexibleContexts, QuasiQuotes, TemplateHaskell, TypeApplications #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import           Frames
import qualified Pipes.Prelude as P

tableTypes "Row" "mwe.csv"

tbl2a :: IO [ColFun Maybe Row]
tbl2a = runSafeT . P.toListM $ readTableMaybe "mwe.csv"

main :: IO ()
main = tbl2a >>= mapM_ (putStrLn . show)

Then Frames seems to parse these values as doubles:

main
{Just LATITUDE :-> -2.15245377777e11, Just LONGITUDE :-> -3.96610013888e11}
{Just LATITUDE :-> -9.5933511111e10, Just LONGITUDE :-> -3.58891097222e11}
idontgetoutmuch commented 1 year ago
row2 :: Rec (Either Text :. ElField) '[LATITUDE, LONGITUDE]
row2 = readRec ["-21,5245377777","-39,6610013888"]

row2
{Right LATITUDE :-> -2.15245377777e11, Right LONGITUDE :-> -3.96610013888e11}

but

Prelude> read "3,1459" :: Double
*** Exception: Prelude.read: no parse
idontgetoutmuch commented 1 year ago
instance Parseable Double where
  -- Some CSV's export Doubles in a format like '1,000.00', filtering
  -- out commas lets us parse those sucessfully
  parse = fmap Definitely . fromText . T.filter (/= ',')
idontgetoutmuch commented 1 year ago

With

parse = fmap Definitely . fromText

I get

main
{LATITUDE :-> "-21,5245377777", LONGITUDE :-> "-39,6610013888"}
{LATITUDE :-> "-9,5933511111", LONGITUDE :-> "-35,8891097222"}

@acowley I am not sure what the answer is here. Presumably somebody wanted to parse "1,000.00" as a 1000.00 so we can't just delete the offending filter.

idontgetoutmuch commented 1 year ago

Actually now that I think about it, Frames should not interpret "1,000.00". If the number is not in the basic format "1000.00" then the user will get text and can parse it themself just like I will parse the text "3,1459" myself.

idontgetoutmuch commented 1 year ago

There's this https://en.m.wikipedia.org/wiki/Decimal_separator but then someone would have to do something other than just filter out commas.

idontgetoutmuch commented 1 year ago

See https://github.com/acowley/Frames/issues/53