agentm / project-m36

Project: M36 Relational Algebra Engine
The Unlicense

parse error in importcsv #303

Closed: YuMingLiao closed this issue 2 years ago

YuMingLiao commented 2 years ago

test.csv

attr
(

This one imports fine.

test.csv

attr
)
TutorialD (master/main): n :: {attr Text}
TutorialD (master/main): :importcsv "test.csv" n
ERR: ParseError "AttributeMappingError (ParseError \"endOfInput\")"

agentm commented 2 years ago

The CSV parser in Project:M36 only parses quoted Text fields, so set your CSV exporter to quote all text fields. Project:M36 exports text fields with quotes unconditionally so that it can round-trip the data. Sorry if that's not clear.

CSV is not a technical standard, so without this requirement, certain strings become ambiguous:

In addition, Project:M36 generates CSV files with Haskell ADTs like HairColor "Blond", so the text "Blond" is a TextAtom within an algebraic data type in the CSV file.

This could be improved, but no one solution would be able to parse all CSV files consistently, so I punted on it entirely and required TextAtoms to be quoted unconditionally to be unambiguous. I have added a note to the documentation to clarify this.
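To illustrate the round-trip requirement described above, here is a minimal sketch of unconditional quoting on export and quoted-only parsing on import. It is not Project:M36's actual code: the names `quoteField` and `unquoteField` are hypothetical, and escaping an embedded double quote by doubling it is an assumption borrowed from RFC 4180-style CSV.

```haskell
-- Hypothetical helpers, NOT Project:M36's real export/import code.
-- Assumption: an embedded double quote is escaped by doubling it.

-- | Wrap a text field in double quotes unconditionally.
quoteField :: String -> String
quoteField s = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"' = "\"\""
    escape c   = [c]

-- | Parse a field that must be quoted; unquoted input is rejected.
unquoteField :: String -> Either String String
unquoteField ('"' : rest) = go rest
  where
    go ('"' : '"' : cs) = ('"' :) <$> go cs  -- doubled quote is a literal quote
    go ['"']            = Right []            -- closing quote at end of field
    go ('"' : _)        = Left "unexpected characters after closing quote"
    go (c : cs)         = (c :) <$> go cs
    go []               = Left "missing closing quote"
unquoteField _ = Left "text field must be quoted"
```

Loaded in GHCi, `unquoteField (quoteField ")")` should give back `Right ")"`, while the bare input `")"` is rejected with an explicit message instead of being interpreted some other way.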

YuMingLiao commented 2 years ago

I can actually pass a TextAtom without quotes. It seems only the right paren causes a problem.

"city"
"("

This one is OK.

"city"
")"

This one is not.

TutorialD (master/main): x :: {city Text}
TutorialD (master/main): :importcsv "nutrition/one_column.csv" x
ERR: ParseError "AttributeMappingError (ParseError \"endOfInput\")"

YuMingLiao commented 2 years ago

parseAtom attrName aType textIn = case APT.parseOnly (parseCSVAtomP attrName tConsMap aType <* APT.endOfInput) textIn of
  Left err -> Left (ParseError (T.pack err))

I guess it's because a right paren is treated like endOfInput when parsing a TextAtom.

YuMingLiao commented 2 years ago

--read data for Text.Read parser but be wary of end of interval blocks
takeToEndOfData :: APT.Parser T.Text
takeToEndOfData = APT.takeWhile (APT.notInClass ",)]")

I see. The right paren is treated as the end of an interval block, so I can't have an unquoted TextAtom containing a ) character.
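The failure can be reproduced with a small model of that parser. This is a sketch under assumptions: it uses plain `String` and `span` from base in place of attoparsec's `APT.takeWhile` and `APT.endOfInput`, and it assumes an unquoted Text value really does go through `takeToEndOfData` followed by the `endOfInput` check, as the snippets above suggest. `parseUnquotedText` is a hypothetical name.

```haskell
-- Simplified model of the import path quoted above, using base only.
-- The real code uses attoparsec (APT.takeWhile, APT.endOfInput).

-- | Model of takeToEndOfData: consume characters up to the first
--   ',', ')' or ']'. Returns (consumed, remaining input).
takeToEndOfData :: String -> (String, String)
takeToEndOfData = span (`notElem` ",)]")

-- | Model of `takeToEndOfData <* endOfInput`: succeed only when the
--   whole input was consumed. parseUnquotedText is a hypothetical name.
parseUnquotedText :: String -> Either String String
parseUnquotedText input =
  case takeToEndOfData input of
    (consumed, "") -> Right consumed
    (_, _)         -> Left "endOfInput"
```

In this model `parseUnquotedText "("` succeeds because `(` is not a stop character, while `parseUnquotedText ")"` consumes nothing, leaves `)` unread, and fails the end-of-input check. That matches the `endOfInput` error reported above.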

agentm commented 2 years ago

Yea, the current behavior is arbitrary and unintentional. I'll fix the parser to error out on unquoted strings.