Closed (xflr6 closed this issue 6 years ago)
https://www.w3.org/TR/tabular-data-model/#parsing
Otherwise, if the string starts with the escape character and the escape character is not the same as the quote character, append the character following the escape character to the current cell value and move on to process the string following that character.
But CSVW does allow a quote character of `null`. Wouldn't this translate to `quoting=csv.QUOTE_NONE`?
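For reference, a small sketch of what `quoting=csv.QUOTE_NONE` does on the reading side in Python's `csv` module. The sample line is made up for illustration, and whether this is the right counterpart of a CSVW `quoteChar` of `null` is exactly the open question here.

```python
import csv

line = ['"a,b",c']  # hypothetical sample input

# Default quoting: the quote characters delimit one field containing a comma.
default_row = next(csv.reader(line))

# QUOTE_NONE: quote characters get no special treatment, so the comma
# inside the quotes splits the field and the quotes stay in the data.
unquoted_row = next(csv.reader(line, quoting=csv.QUOTE_NONE))

print(default_row)   # ['a,b', 'c']
print(unquoted_row)  # ['"a', 'b"', 'c']
```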
Apart from this, I tend to agree with your proposed change; but I vaguely remember changing the behaviour based on a particular real-life case I encountered.
Even so, I'd say we change the code to be more in line with the spec (and Python's documented behaviour), and wait for any problems to resurface.
Currently, we set Python `escapechar` to `\\` whenever CSVW `quoteChar` is set: https://github.com/cldf/csvw/blob/f0869818eb8c34949809f1b24a7e1d30cd0adb51/csvw/dsv_dialects.py#L112
However, the spec says (parsing section, linked and quoted above):
Python `csv` is in line with this (apart from which variable controls what).
IIUC, `QUOTE_NONE` is not allowed by CSVW at all, and Python `escapechar` should not be set with `doublequote=True` (i.e. Python `doublequote=True, escapechar=None` corresponds to CSVW `doubleQuote=True`, which sets the escape character to `"`).
In other words, the line above would need to be changed to:
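A minimal sketch of the mapping this implies, not the actual csvw code: the helper name `python_csv_params` and its argument names are hypothetical, and the sample rows are made up. It sets `escapechar` only when `doubleQuote` is false.

```python
import csv


def python_csv_params(quote_char='"', double_quote=True):
    """Hypothetical helper: map CSVW quoteChar/doubleQuote to Python csv kwargs."""
    return {
        "quotechar": quote_char,
        "doublequote": double_quote,
        # With doubleQuote=True the quote character escapes itself, so no
        # separate escapechar is set; with doubleQuote=False it is a backslash.
        "escapechar": None if double_quote else "\\",
    }


# doubleQuote=True: doubled quotes are unescaped, a backslash stays literal.
doubled = next(csv.reader(['"a ""b"" c\\d",e'], **python_csv_params('"', True)))

# doubleQuote=False: a backslash escapes the quote character.
escaped = next(csv.reader(['"a \\"b\\" c",e'], **python_csv_params('"', False)))

print(doubled)  # ['a "b" c\\d', 'e']
print(escaped)  # ['a "b" c', 'e']
```

With the current behaviour (`escapechar` set even when `doublequote=True`), the backslash in the first row would be consumed as an escape character instead of being kept as data.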
In this case the following test cases would be ill-formed:
https://github.com/cldf/csvw/blob/f0869818eb8c34949809f1b24a7e1d30cd0adb51/tests/test_metadata.py#L57
https://github.com/cldf/csvw/blob/f0869818eb8c34949809f1b24a7e1d30cd0adb51/tests/test_dsv.py#L147
Side note: AFAIU, neither Python nor CSVW escapes the delimiter, so even with `doubleQuote=False`, the writer would not produce `y\\,x` (escaped delimiter) but rather `"y,x"` (quoted field) for cells containing the delimiter, although the former is parsed correctly by Python `csv`:
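A short sketch of both directions using only the standard library. The cell values are made up, and the default `QUOTE_MINIMAL` quoting is assumed.

```python
import csv
import io

params = {"doublequote": False, "escapechar": "\\"}

# Writing: a cell containing the delimiter is quoted, not escaped.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n", **params).writerow(["y,x", "z"])
written = buf.getvalue()

# Reading: an escaped delimiter is nevertheless accepted as part of the cell.
parsed = next(csv.reader(["y\\,x,z"], **params))

print(repr(written))  # '"y,x",z\n'
print(parsed)         # ['y,x', 'z']
```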