@fccoelho Are you running against master?
@fccoelho Can you post the traceback?
@fccoelho Can you also show the first few lines of your CSV or ping me with the file?
I am running the version on PyPI (0.3.2) with Python 3
the head of the file looks like this:
"COD_PROC","NUM_SEQ","COD_TIP_RELAC","COMPL","COD_ASSUNTO" "1958.001.500131-1A",1,,"",899 "1958.001.500156-6",1,,"",899 "1958.001.500162-1",1,,"",899 "1958.001.500204-2",1,,"",899 "1958.001.500204-2A",1,,"",899 "1958.001.500204-2B",1,,"",899 "1958.001.500223-6",1,,"",9610 "1958.001.500233-9",1,,"",4703 "1909.017.000018-3",1,30,"sumaria",899
I "upgraded" to master and now I am hitting an UnicodeDecodeError which I didn't have before. My csv is in ISO-8859-1.
c = odo.CSV(tables[0], encoding="iso-8859-1", has_header=True, sep=',')
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-14-35b3fa82c8f3> in <module>()
----> 1 c= odo.CSV(tables[0], encoding="iso-8859-1", has_header=True, sep=',')
/usr/local/lib/python3.4/dist-packages/odo/backends/csv.py in __init__(self, path, has_header, encoding, sniff_nbytes, **kwargs)
101 self.has_header = has_header
102 self.encoding = encoding
--> 103 kwargs = merge(sniff_dialect(path, sniff_nbytes),
104 keymap(alias, kwargs))
105 self.dialect = valfilter(bool,
/usr/local/lib/python3.4/dist-packages/odo/backends/csv.py in sniff_dialect(path, nbytes, encoding)
64 with open_file(path, 'rb') as f:
65 raw = f.read(nbytes)
---> 66 dialect = csv.Sniffer().sniff(raw.decode(encoding))
67 dialect.lineterminator = '\r\n' if b'\r\n' in raw else '\n'
68 return dialect_to_dict(dialect)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 114: invalid continuation byte
hm ok, marked as a bug. fix coming tomorrow
looks like it's ignoring your encoding, which is also a bug
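A minimal sketch of the failure, pieced together from the traceback above (open_file and dialect_to_dict from odo's internals are replaced with plain equivalents here; this is not the actual patch): CSV.__init__ calls sniff_dialect(path, sniff_nbytes) without forwarding the user's encoding, so the sniffer decodes the raw bytes with the default 'utf-8' and fails on ISO-8859-1 input.

import csv

def sniff_dialect(path, nbytes, encoding='utf-8'):
    # Read the first nbytes of the file as raw bytes.
    with open(path, 'rb') as f:
        raw = f.read(nbytes)
    # Decode with the caller's encoding rather than the implicit
    # utf-8 default; this is what the fix needs to thread through.
    dialect = csv.Sniffer().sniff(raw.decode(encoding))
    dialect.lineterminator = '\r\n' if b'\r\n' in raw else '\n'
    return dialect

The call site in CSV.__init__ would then become, roughly:

kwargs = merge(sniff_dialect(path, sniff_nbytes, encoding=self.encoding),
               keymap(alias, kwargs))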
I switched back to 0.3.2 and no longer get the old error. Now I get this, which I think is related to a problem in my CSV:
/usr/local/lib/python3.4/dist-packages/odo/backends/sql_csv.py in execute_copy_all(dialect, engine, statement)
162 conn = engine.raw_connection()
163 cursor = conn.cursor()
--> 164 cursor.execute(statement)
165 conn.commit()
166 conn.close()
OperationalError: out of memory
DETAIL: String of 472261105 bytes is too long for encoding conversion.
CONTEXT: COPY acaoprocesso, line 16088860
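For what it's worth, an error like "String of 472261105 bytes is too long for encoding conversion" usually means PostgreSQL read millions of physical lines as a single logical value, typically because a quoted field was never closed. A hedged diagnostic sketch (the filename and encoding are assumptions taken from earlier in this thread):

# Scan for lines with an odd number of double quotes; an unbalanced
# quote is the usual cause of a runaway multi-line field in COPY.
with open('original_file.csv', encoding='iso-8859-1') as f:
    for lineno, line in enumerate(f, start=1):
        if line.count('"') % 2 != 0:
            print(lineno, line[:80])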
@fccoelho Can you send me the lines from the actual file that you posted on github? I want to make sure I'm testing the encoding properly. Thanks
my email: github handle @ google's mail service
This particular CSV is huge; I'll try to cook up a CSV that triggers the same error.
that's ok, I just wanted the lines from the file that you pasted here, something like this:
head -n 10 original_file.csv > send_this_file.csv
@fccoelho is this still an issue for you?
No. Thanks for looking into it.
great! thanks.
When I try to convert straight from CSV to PostgreSQL, I get an OperationalError saying that odo can't open the file.
But if I first convert from CSV to a dataframe and then to PostgreSQL, everything works perfectly, just far too slowly.
any workarounds?
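For context, a hedged sketch of the two paths being compared (the URI, credentials, and filename are placeholders; the table name is taken from the COPY error above). The direct path has odo issue a COPY statement, as in the execute_copy_all traceback earlier; PostgreSQL's COPY FROM reads the file with the database server's permissions, so a path the client can see but the server cannot open fails in exactly this way. The pandas detour avoids that, but inserts rows through the client connection, which is far slower.

from odo import odo
import pandas as pd

# Direct path: odo emits a COPY statement, so the *server* must be able
# to open the file; otherwise it fails with "can't open the file".
odo('original_file.csv', 'postgresql://user:pass@localhost/db::acaoprocesso')

# Indirect path: read client-side, then insert through the connection.
# Works regardless of server-side file visibility, but is much slower.
df = pd.read_csv('original_file.csv', encoding='iso-8859-1')
odo(df, 'postgresql://user:pass@localhost/db::acaoprocesso')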