cldf / csvw

CSV on the web
Apache License 2.0
37 stars 6 forks source link

support py3.10 #59

Closed xrotwang closed 2 years ago

xrotwang commented 2 years ago

So this PR simply accepts that we can't guarantee consistent roundtrip behaviour across all supported Python versions.

closes #58

codecov-commenter commented 2 years ago

Codecov Report

Merging #59 (3eb452f) into master (45584ad) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##            master       #59   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         2534      2540    +6     
=========================================
+ Hits          2534      2540    +6     
Impacted Files Coverage Δ
tests/test_csv_escapechar.py 100.00% <100.00%> (ø)
tests/test_dsv.py 100.00% <100.00%> (ø)
tests/test_metadata.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 45584ad...3eb452f. Read the comment docs.

xrotwang commented 2 years ago

I really find it difficult to wrap my head around this. Fortunately, our default settings presumably save us from this problem most of the times:

>>> from csvw.dsv import Dialect
>>> Dialect().escape_character
'"'

Anyway, Python 3.10 seems to be "correct" in the sense that writing and reading CSV data with escapechar='\\' passed to both csv.writer and csv.reader does roundtrip data containing \\. But that seems semantically wrong to me. I'd say that by definition you cannot use the same setting for escapechar for writing and reading. Once data is written, it should be platform agnostic. Basically, escapechar shouldn't have any impact upon reading at all.

xrotwang commented 2 years ago

Just to exemplify what I mean above: I ran the following piece of code:

import sys
import csv

row = ['spam', '\\eggs']

print(sys.version)

with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, escapechar='\\')
    spamwriter.writerow(row)

with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, escapechar='\\')
    for r in spamreader:
        assert r == row, '{} vs {}'.format(r, row)
        break

with py3.9:

3.9.10 (main, Jan 15 2022, 18:17:56) 
[GCC 9.3.0]
Traceback (most recent call last):
  File "/home/robert_forkel/test2.py", line 16, in <module>
    assert r == row, '{} vs {}'.format(r, row)
AssertionError: ['spam', 'eggs'] vs ['spam', '\\eggs']

and py3.10:

3.10.2 (main, Jan 15 2022, 18:02:07) [GCC 9.3.0]

Naively, I'd say py3.10 is "more" correct. But upon reflection, I'd say py3.9 is, because it does the right thing when not specifying escapechar for reading.