PyFilesystem / pyfilesystem

Python filesystem abstraction layer
http://pyfilesystem.org/
BSD 3-Clause "New" or "Revised" License

Support opening files in universal-newline mode inside ZipFS #174

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. take a zip file containing a CSV file with arbitrary newlines in it
2. fs = ZipFS('that.zip', 'r')
3. f = fs.open('fileinside.csv', 'rU')
4. list(csv.reader(f))

What is the expected output? What do you see instead?

I expect to be able to read the file. Instead, this fails if the CSV file 
contains lines ending in '\r', because the 'U' flag in the mode is ignored and 
a hardcoded "r" is passed to ZipFile.open.

Issue #160 discussion concluded that “leaving out the deprecated U option 
does make sense” ... that would be fair enough, except ZipFS.open also 
doesn't support the new API replacing that deprecated one: passing newline=None 
is ignored (and actually ZipFile.open doesn't support it anyway).
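For illustration only, here is a minimal sketch of what universal-newline reading of a zip member looks like. It assumes Python 3 and uses only the standard library (zipfile plus io.TextIOWrapper), not ZipFS; the in-memory archive and its contents are made up for the example.

```python
import csv
import io
import zipfile

# Build a small in-memory archive whose CSV member uses bare '\r' line endings.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("fileinside.csv", b"a,b\r1,2\r")

with zipfile.ZipFile(buffer) as archive:
    # newline="\n" disables translation, so '\r' is not treated as a line break.
    with archive.open("fileinside.csv") as raw:
        untranslated = io.TextIOWrapper(raw, encoding="ascii", newline="\n").readlines()

    # newline=None enables universal newlines: '\r', '\n' and '\r\n' all become '\n'.
    with archive.open("fileinside.csv") as raw:
        translated = io.TextIOWrapper(raw, encoding="ascii", newline=None).readlines()

# With translated lines, csv sees one record per line.
rows = list(csv.reader(translated))
```

Without translation the whole file comes back as a single line, which is the situation csv.reader cannot cope with.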

In Python 2.7 there seem to be only two documented ways to read a file in 
universal-newline mode:

(a) pass either mode='rU' or newline=None to io.open (ZipFile.open doesn't 
support the latter; it does support the former, but ZipFS blocks use of it)

(b) or use TextIOWrapper. But that unconditionally decodes to unicode too ... 
and the csv module in Python 2.7 does not support unicode. So to use 
TextIOWrapper with csv one would have to do something terrible:

import csv
from io import TextIOWrapper

stream = fs.open('fileinside.csv', 'r')

# decode universal newlines, which unavoidably also decodes to unicode
universal_newlines_stream = TextIOWrapper(stream, encoding=some_encoding)

# encode back to bytes because csv requires this
encoded_byte_stream = (
    line.encode(some_encoding) for line in universal_newlines_stream
)

# now parse the csv
csv_rows = csv.reader(encoded_byte_stream, some_csv_dialect)

# finally decode unicode *again*, because that's actually what we need!
unicode_csv_rows = (
    tuple(col.decode(some_encoding) for col in row)
    for row in csv_rows
)

Aaagh! : )

Rather than have us go through these contortions, ZipFS.open should just pass 
mode to ZipFile.open in the case where mode contains 'r' (ZipFile.open does not 
support the newline parameter).

What version of the product are you using? On what operating system?
This problem is present in both version 0.4.0 and at the SVN HEAD.

Original issue reported on code.google.com by gunnlau...@gmail.com on 3 Mar 2014 at 11:22

GoogleCodeExporter commented 9 years ago
Patch against HEAD, to pass mode to ZipFile.open()

Original comment by gunnlau...@gmail.com on 3 Mar 2014 at 11:30

Attachments:

GoogleCodeExporter commented 9 years ago
Oh. Actually some more care is needed, because as of r854 ZipFS.open automatically 
wraps the file in a TextIOWrapper if we don't pass 'b' in the mode ... but 
ZipFile.open raises RuntimeError if we *do* pass 'b' in the mode. : ) So, new 
patch.
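The attached patch is not reproduced in this thread, so the following is only a sketch of the mode handling described above, not the code that was applied. The helper name split_zip_mode is hypothetical. It assumes the Python 2.7 rule that ZipFile.open accepts only 'r', 'U' or 'rU'.

```python
def split_zip_mode(mode):
    """Hypothetical helper: return (mode for ZipFile.open, wants_binary)."""
    if "r" not in mode:
        raise ValueError("only read modes are handled by this sketch: %r" % (mode,))
    # Remember whether the caller asked for bytes, so that a caller can skip
    # the TextIOWrapper step in that case.
    wants_binary = "b" in mode
    # 'b' must never reach ZipFile.open; 'U' is passed through when requested.
    zip_mode = "rU" if "U" in mode else "r"
    return zip_mode, wants_binary
```

For example, split_zip_mode('rb') gives ('r', True) and split_zip_mode('rU') gives ('rU', False). Note that newer Python 3 releases dropped 'U' from ZipFile.open, so this mapping only makes sense for the Python 2.7 setting discussed here.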

Original comment by gunnlau...@gmail.com on 3 Mar 2014 at 11:55

Attachments:

GoogleCodeExporter commented 9 years ago
Let's try that patch upload again.

Original comment by gunnlau...@gmail.com on 3 Mar 2014 at 11:56

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks. Applied your patch. It's in trunk now.

Original comment by willmcgugan on 13 Mar 2014 at 6:55