blaze / odo

Data Migration for the Blaze Project
http://odo.readthedocs.org/
BSD 3-Clause "New" or "Revised" License

CSV to PostgreSQL fails #201

Closed fccoelho closed 9 years ago

fccoelho commented 9 years ago

When I try to convert straight from CSV to PostgreSQL:

odo.odo("my.csv", "postgresql://user:pwd@localhost/database::table")

I get an OperationalError saying that odo can't open the file.

If I first convert the CSV to a DataFrame and then load that into PostgreSQL, it all works perfectly, but it is too darn slow.

Any workarounds?

cpcloud commented 9 years ago

@fccoelho Are you running against master?

cpcloud commented 9 years ago

@fccoelho Can you post the traceback?

cpcloud commented 9 years ago

@fccoelho Can you also show the first few lines of your CSV or ping me with the file?

fccoelho commented 9 years ago

I am running the version on PyPI (0.3.2) with Python 3.

The head of the file looks like this:

"COD_PROC","NUM_SEQ","COD_TIP_RELAC","COMPL","COD_ASSUNTO"
"1958.001.500131-1A",1,,"",899
"1958.001.500156-6",1,,"",899
"1958.001.500162-1",1,,"",899
"1958.001.500204-2",1,,"",899
"1958.001.500204-2A",1,,"",899
"1958.001.500204-2B",1,,"",899
"1958.001.500223-6",1,,"",9610
"1958.001.500233-9",1,,"",4703
"1909.017.000018-3",1,30,"sumaria",899

fccoelho commented 9 years ago

I "upgraded" to master and now I am hitting a UnicodeDecodeError which I didn't have before. My CSV is in ISO-8859-1.

c = odo.CSV(tables[0], encoding="iso-8859-1", has_header=True, sep=',')
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-14-35b3fa82c8f3> in <module>()
----> 1 c= odo.CSV(tables[0], encoding="iso-8859-1", has_header=True, sep=',')

/usr/local/lib/python3.4/dist-packages/odo/backends/csv.py in __init__(self, path, has_header, encoding, sniff_nbytes, **kwargs)
    101             self.has_header = has_header
    102         self.encoding = encoding
--> 103         kwargs = merge(sniff_dialect(path, sniff_nbytes),
    104                        keymap(alias, kwargs))
    105         self.dialect = valfilter(bool,

/usr/local/lib/python3.4/dist-packages/odo/backends/csv.py in sniff_dialect(path, nbytes, encoding)
     64     with open_file(path, 'rb') as f:
     65         raw = f.read(nbytes)
---> 66     dialect = csv.Sniffer().sniff(raw.decode(encoding))
     67     dialect.lineterminator = '\r\n' if b'\r\n' in raw else '\n'
     68     return dialect_to_dict(dialect)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 114: invalid continuation byte
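
In the traceback above, `sniff_dialect` is called without an encoding argument, so the byte sample is apparently decoded as UTF-8 even though the file is ISO-8859-1. Below is a minimal standalone sketch of encoding-aware sniffing that uses only the standard library `csv` module. The function name `sniff_dialect_from_bytes` and the sample rows are hypothetical; this illustrates the idea and is not odo's actual code or fix.

```python
import csv


def sniff_dialect_from_bytes(raw, encoding="utf-8"):
    """Decode a raw byte sample with the caller's encoding, then sniff it.

    Hypothetical helper: it mirrors the shape of the function in the
    traceback, but it is not part of odo.
    """
    text = raw.decode(encoding)
    dialect = csv.Sniffer().sniff(text)
    # Keep the line terminator that is actually present in the bytes.
    dialect.lineterminator = "\r\n" if b"\r\n" in raw else "\n"
    return dialect


# Made-up sample in ISO-8859-1; "\xe1" is an accented "a" in that encoding.
sample = (
    '"COD_PROC","NUM_SEQ","COD_TIP_RELAC","COMPL","COD_ASSUNTO"\n'
    '"1958.001.500131-1A",1,,"",899\n'
    '"1909.017.000018-3",1,30,"sum\xe1ria",899\n'
).encode("iso-8859-1")

# Decoding with the default (UTF-8) fails on the non-ASCII byte.
try:
    sniff_dialect_from_bytes(sample)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Passing the real encoding through lets the sniffer see valid text.
dialect = sniff_dialect_from_bytes(sample, encoding="iso-8859-1")
```

The design point is simply that whatever encoding the caller supplies has to reach the `decode` call; otherwise any non-ASCII byte in the sniffed sample can raise before the dialect is even inspected.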

cpcloud commented 9 years ago

hm ok, marked as a bug. fix coming tomorrow

cpcloud commented 9 years ago

looks like it's ignoring your encoding, which is also a bug
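
Until the encoding argument is honored, one possible interim workaround is to re-encode the file as UTF-8 before handing it to odo. The sketch below streams the file in chunks so that a very large CSV never has to fit in memory. The helper name `transcode_file` and the paths in the usage comment are hypothetical, and whether this sidesteps the bug in a given odo version is an assumption rather than something confirmed in this thread.

```python
def transcode_file(src_path, dst_path, src_encoding="iso-8859-1",
                   dst_encoding="utf-8", chunk_chars=1 << 20):
    """Stream src_path into dst_path, converting the text encoding.

    newline="" disables newline translation on both sides, so the
    original line endings are preserved.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


# Hypothetical usage:
# transcode_file("my.csv", "my_utf8.csv")
```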

fccoelho commented 9 years ago

I switched back to 0.3.2 and I no longer get the old error. Now I get this, which I think is related to a problem in my CSV:

/usr/local/lib/python3.4/dist-packages/odo/backends/sql_csv.py in execute_copy_all(dialect, engine, statement)
    162     conn = engine.raw_connection()
    163     cursor = conn.cursor()
--> 164     cursor.execute(statement)
    165     conn.commit()
    166     conn.close()

OperationalError: out of memory
DETAIL:  String of 472261105 bytes is too long for encoding conversion.
CONTEXT:  COPY acaoprocesso, line 16088860
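
A single "string" of roughly 470 MB at one line number is often a sign that a quoted field was opened and never closed, so the loader keeps reading following rows as one field. That cause is not confirmed in this thread. A minimal sketch for locating suspicious rows is to flag lines with an odd number of quote characters; `find_odd_quote_lines` and the sample rows are hypothetical.

```python
def find_odd_quote_lines(lines, quotechar='"', limit=10):
    """Return up to `limit` (line_number, text) pairs with an odd quote count.

    A legitimate quoted field that spans several lines is also flagged,
    so the hits are candidates to inspect, not proven errors.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if line.count(quotechar) % 2 == 1:
            hits.append((lineno, line.rstrip("\r\n")))
            if len(hits) >= limit:
                break
    return hits


# Made-up rows; the third one is missing a closing quote.
rows = [
    '"COD_PROC","NUM_SEQ","COD_TIP_RELAC","COMPL","COD_ASSUNTO"\n',
    '"1958.001.500131-1A",1,,"",899\n',
    '"1909.017.000018-3",1,30,"sumaria,899\n',
]
hits = find_odd_quote_lines(rows)

# For a real file, pass the open file object so it is read lazily:
# with open("my.csv", encoding="iso-8859-1", newline="") as f:
#     hits = find_odd_quote_lines(f)
```

Because the function iterates line by line, it can be pointed at a file with millions of rows without loading it into memory.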

cpcloud commented 9 years ago

@fccoelho Can you send me the lines from the actual file that you posted on github? I want to make sure I'm testing the encoding properly. Thanks

my email: github handle @ google's mail service

fccoelho commented 9 years ago

This particular CSV is huge; I'll try to cook up a smaller CSV that triggers the same error.

On Thu, May 21, 2015 at 11:55 AM, Phillip Cloud notifications@github.com wrote:

@fccoelho Can you send me the lines from the actual file that you posted on github? I want to make sure I'm testing the encoding properly. Thanks

my email: github handle @ google's mail service

cpcloud commented 9 years ago

that's ok, i just wanted the lines from the file that you pasted here, something like this:

head -n 10 original_file.csv > send_this_file.csv

cpcloud commented 9 years ago

@fccoelho is this still an issue for you?

fccoelho commented 9 years ago

No. Thanks for looking into it.

cpcloud commented 9 years ago

great! thanks.