Closed: cschloer closed this issue 3 years ago.
Hi @cschloer,
I can't reproduce it:
from tabulator import Stream

with Stream('tmp/issue338.tsv', headers=1, format='csv', skip_rows=['#'], delimiter='\t') as stream:
    print(stream.headers)
    print(stream.read())
# ['Lat', 'Lon']
# [['33.6062', '-117.9312'], ['33.6062', '-117.9312'], ['33.6062', '-117.9312']]
So the original issue (not the workaround) would be reproduced like this, using format='tsv':
from tabulator import Stream

with Stream('tmp/issue338.tsv', headers=1, format='tsv', skip_rows=['#']) as stream:
    print(stream.headers)
    print(stream.read())
I'm unable to reproduce my own issue with the "\t" delimiter using dataflows and the standard load processor, but I think this bug (with the tsv parser) still exists.
The 1-character string error might have been caused by my upgrade to Python 3.8, or something similar...
Looking at the docs, they actually do specify that the delimiter should be a 1-character string:
https://docs.python.org/3/library/csv.html#csv.Dialect.delimiter
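As a minimal sketch using only the standard library csv module, the 1-character rule means a real tab character is accepted, while the two-character string "\\t" (a backslash followed by "t") is rejected as soon as the reader is created. The helper names parse_tsv and delimiter_is_accepted are hypothetical, for illustration only; since the exact exception type and message may differ between Python versions, the sketch catches both TypeError and ValueError.

```python
import csv
import io


def parse_tsv(text, delimiter="\t"):
    """Parse tab-separated text with the standard csv module (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def delimiter_is_accepted(delimiter):
    """Return True if csv.reader accepts this delimiter (hypothetical helper)."""
    try:
        # Dialect parameters are validated when the reader is constructed.
        csv.reader(io.StringIO(""), delimiter=delimiter)
    except (TypeError, ValueError):
        return False
    return True


print(delimiter_is_accepted("\t"))   # True: a real tab is one character
print(delimiter_is_accepted("\\t"))  # False: backslash + "t" is two characters
print(parse_tsv("Lat\tLon\n33.6062\t-117.9312\n"))
# [['Lat', 'Lon'], ['33.6062', '-117.9312']]
```

This is why a value that arrives as the literal text \t (two characters) triggers the error even though a tab delimiter is otherwise valid.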
@cschloer I see. The underlying TSV library is not actively developed, so I think we need to switch TSV to Python CSV parsing. For now, I would recommend using the csv format.
Just to follow up on this: I realized that a front-end library I was using was changing "\t" to "\\t" before making the request to the server. \t is now working, but it is still not possible to delimit on a multicharacter string.
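Since the csv module only accepts single-character delimiters, one possible workaround for a multicharacter delimiter is plain string splitting. The sketch below is a hypothetical helper (split_multichar is not part of tabulator or the csv module) and it deliberately gives up csv's quoting support: a delimiter that appears inside a value will split that value.

```python
def split_multichar(text, delimiter):
    """Split each non-empty line of text on a (possibly multi-character) delimiter.

    Simplified sketch: unlike csv, there is no support for quoted fields.
    """
    if not delimiter:
        raise ValueError("delimiter must be a non-empty string")
    return [line.split(delimiter) for line in text.splitlines() if line]


rows = split_multichar("Lat||Lon\n33.6062||-117.9312\n", "||")
print(rows)  # [['Lat', 'Lon'], ['33.6062', '-117.9312']]
```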
Overview
test.py
with file test.tsv:
I get the error:
It seems like the TSV parser strictly sets the number of fields allowed when it is initialized (https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/parsers/tsv.py#L63). Since the first line in this file is a comment with no tabs, the parser errors as soon as a line with more fields appears.
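To illustrate the failure mode described above, here is a simplified, hypothetical model of a strict parser whose expected field count is locked in by the first line it sees. It is not tabulator's or the tsv library's actual code, just a sketch of the behavior: a tab-free comment line fixes the count at 1, so the two-column header on the next line is rejected.

```python
class FieldCountError(ValueError):
    """Raised when a row's field count differs from the locked-in count."""


def strict_tsv_rows(lines):
    """Hypothetical strict parser: the first line fixes the expected field count."""
    expected = None
    rows = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if expected is None:
            # The first line decides how many fields every later line must have.
            expected = len(fields)
        elif len(fields) != expected:
            raise FieldCountError(
                f"line {lineno}: expected {expected} field(s), got {len(fields)}"
            )
        rows.append(fields)
    return rows


lines = ["# a comment with no tabs\n", "Lat\tLon\n", "33.6062\t-117.9312\n"]
try:
    strict_tsv_rows(lines)
except FieldCountError as exc:
    print(exc)  # line 2: expected 1 field(s), got 2
```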
I would fall back to just using the CSV module with \t as the delimiter (https://stackoverflow.com/questions/42358259/how-to-parse-tsv-file-with-python), but I keep getting the error "delimiter" must be a 1-character string. I'm not sure if that is a result of custom code or not.
Please preserve this line to notify @roll (lead of this repository)
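A minimal sketch of that fallback, assuming the input is text and that comment lines start with "#" (mirroring the skip_rows=['#'] option in the workaround earlier in this thread). The helper name read_tsv_skipping_comments is hypothetical; it filters out comment lines before the csv module sees them and passes a real, single tab character as the delimiter.

```python
import csv
import io


def read_tsv_skipping_comments(text, comment_prefix="#"):
    """Return (header, rows) from tab-separated text, ignoring comment lines."""
    # Drop comment lines first so they can never be mistaken for the header.
    lines = (line for line in io.StringIO(text) if not line.startswith(comment_prefix))
    reader = csv.reader(lines, delimiter="\t")  # "\t" here is one tab character
    header = next(reader, [])
    return header, list(reader)


sample = "# a comment with no tabs\nLat\tLon\n33.6062\t-117.9312\n"
header, rows = read_tsv_skipping_comments(sample)
print(header)  # ['Lat', 'Lon']
print(rows)    # [['33.6062', '-117.9312']]
```

Note that csv.reader does not enforce a fixed field count per row, so the comment line would not cause an error even without filtering; it would simply be returned as the first row instead of the real header.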