d6t / d6tstack

Quickly ingest messy CSV and XLS files. Export to clean pandas, SQL, parquet
MIT License

Upgraded to 0.1.4, still same error #5

Closed by danielpoon 5 years ago

danielpoon commented 5 years ago

I see that you changed the separator from tab. Is there something else?

Originally posted by @danielpoon in https://github.com/d6t/d6tstack/issues/4#issuecomment-448243092

d6tdev commented 5 years ago

Did you install from GitHub? I haven't updated the PyPI version yet.

pip install git+https://github.com/d6t/d6tstack.git -U --no-deps

danielpoon commented 5 years ago

python3 -m pip install git+https://github.com/d6t/d6tstack.git -U --no-deps

Collecting git+https://github.com/d6t/d6tstack.git
  Cloning https://github.com/d6t/d6tstack.git to /private/var/folders/rn/gqmf067x2d79tywhfxqc02580000gn/T/pip-req-build-anj4jwv4
Building wheels for collected packages: d6tstack
  Running setup.py bdist_wheel for d6tstack ... done
  Stored in directory: /private/var/folders/rn/gqmf067x2d79tywhfxqc02580000gn/T/pip-ephem-wheel-cache-15kmdn1l/wheels/aa/7f/1c/45e6697d3af05ba5c6f9988a919bd1cf80c0ccfe74b07109c7
Successfully built d6tstack
Installing collected packages: d6tstack
  Found existing installation: d6tstack 0.1.4
    Uninstalling d6tstack-0.1.4:
      Successfully uninstalled d6tstack-0.1.4
Successfully installed d6tstack-0.1.4

d6tdev commented 5 years ago

The code below works for me. Does it work for you?


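import yaml
# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
with open('.test-cred.yaml') as f:
    config = yaml.safe_load(f)
cfg_uri_psql = config['wlo'] # use your own

import pandas as pd
# the 'name' column deliberately contains a comma
df = pd.DataFrame({'a':range(10),'b':range(10),'name':['name,first name']*10})

import d6tstack.utils
# the tab separator keeps the embedded comma inside a single column
d6tstack.utils.pd_to_psql(df,cfg_uri_psql,'quick',sep='\t',if_exists='replace')

import sqlalchemy
sqlengine = sqlalchemy.create_engine(cfg_uri_psql)
print(pd.read_sql_table('quick',sqlengine))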

   a  b             name
0  0  0  name,first name
1  1  1  name,first name
2  2  2  name,first name

$ pip freeze
d6tstack==0.1.4

danielpoon commented 5 years ago

DataError                                 Traceback (most recent call last)
in
----> 5 d6tstack.utils.pd_to_psql(df, fast_engine, 'xxx', 'xxx', if_exists='append')

/usr/local/lib/python3.7/site-packages/d6tstack/utils.py in pd_to_psql(df, uri, table_name, schema_name, if_exists, sep)
     96     df.to_csv(fbuf, index=False, header=False, sep=sep)
     97     fbuf.seek(0)
---> 98     cursor.copy_from(fbuf, table_name, sep=sep, null='')
     99     sql_cnxn.commit()
    100     cursor.close()

DataError: extra data after last expected column

danielpoon commented 5 years ago

Hey I appreciated your example. I found my problem.

d6tstack.utils.pd_to_psql(df,cfg_uri_psql,'quick',sep='\t',if_exists='replace')

I did not see that sep = '\t' is a parameter that I need. But once I put it in, it worked. Case closed and it's blazingly fast. Now I could really use this. Thanks so much!!! You're a champ.
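
For anyone who hits the same DataError, here is a rough sketch of why the separator likely matters. It needs no database: it serializes a small frame the way line 96 of the traceback does (`df.to_csv(..., sep=sep)`) and then splits each row on the separator, which approximates how a text-format COPY counts columns. The helper name `fields_per_row` is made up for illustration, and the assumption that the failing call ended up using a comma as the separator is mine, not something confirmed in this thread.

```python
import io

import pandas as pd

# Same shape as the example above: the 'name' values contain a comma.
df = pd.DataFrame({"a": range(3), "b": range(3), "name": ["name,first name"] * 3})


def fields_per_row(frame, sep):
    """Hypothetical helper: write the frame like the traceback's to_csv call,
    then split every row on `sep` without honoring CSV quotes."""
    buf = io.StringIO()
    frame.to_csv(buf, index=False, header=False, sep=sep)
    buf.seek(0)
    return [len(line.rstrip("\r\n").split(sep)) for line in buf]


# With a comma, the embedded comma adds a field the 3-column table does not expect.
comma_counts = fields_per_row(df, ",")
# With a tab, each row splits into exactly the three expected fields.
tab_counts = fields_per_row(df, "\t")

print("comma:", comma_counts)
print("tab:  ", tab_counts)
```

If the real data can contain tabs as well, the same problem would presumably come back with `sep='\t'`, so it may be worth picking a separator that never appears in the values.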