Closed elsander closed 5 years ago
Do you have a minimal working example of this failure? The client is relying on Platform to auto-detect headers, so there might be special conditions when it fails. This code below worked for me:
import pandas as pd
from civis.io import dataframe_to_civis, read_civis
table = TABLE
database = DATABASE
df = pd.DataFrame({'a': [1,2,3], 'b': ['hi', 'there', 'world']})
dataframe_to_civis(df, table=table, database=database).result()
df2 = read_civis(table=table, database=database, use_pandas=True)
assert all(df.columns == df2.columns)
Here's the code I ran where I got this result:
In [1]: import civis
In [2]: import pandas as pd
In [3]: reg = pd.read_csv('region_state.csv', header=0)
In [4]: reg
Out[4]:
region state
0 West AK
1 West HI
2 West WA
3 West OR
4 West CA
5 West NV
6 West ID
7 West MT
8 West WY
9 West UT
10 West CO
11 West AZ
12 West NM
13 Midwest ND
14 Midwest SD
15 Midwest NE
16 Midwest KS
17 Midwest MN
18 Midwest IA
19 Midwest MO
20 Midwest WI
21 Midwest IL
22 Midwest IN
23 Midwest MI
24 Midwest OH
25 South TX
26 South OK
27 South AR
28 South LA
29 South MS
30 South AL
31 South TN
32 South KY
33 South GA
34 South FL
35 South SC
36 South NC
37 South VA
38 South MD
39 South DE
40 South WV
41 South DC
42 Northeast PA
43 Northeast NJ
44 Northeast CT
45 Northeast RI
46 Northeast NY
47 Northeast MA
48 Northeast VT
49 Northeast NH
50 Northeast ME
51 South NC-SC
52 South WV-OH
53 South WV-KY
54 Midwest ND-MN
55 Midwest NE-IA
56 Midwest IA-IL
In [5]: civis.io.dataframe_to_civis(reg, database='xyz', table='test.test')
Out[5]: <CivisFuture at 0x112fd5438 state=running>
FWIW, I successfully ran this before. The new things in this run vs old successful run are the multi-state rows at the bottom, and possibly a different version of the civis client? I think I did run this successfully on 1.9.0 before though.
Let me know if you have trouble reproducing, and I can send you a copy of the file.
I think that regardless of whether the import is buggy or not, we should deprecate the headers
parameter of dataframe_to_civis
and always pass True
in the civis_file_to_table
call. A user modifying that parameter would either have no effect or cause the import to break.
Let me know if you have trouble reproducing, and I can send you a copy of the file.
Thanks, that'd be helpful. The example above worked for me on v1.9.0
.
I think that regardless of whether the import is buggy or not, we should deprecate the headers parameter
That seems reasonable to me.
I used
dataframe_to_civis
on a dataframe with column names. The resulting table on Redshift added the header as an additional row in the table, and gave the table deafult column names (column_0
,column_1
). I think the expected behavior would be for the function to auto-detect the header and add it as the column names in the table. The real bug here is the fact that the function writes the header to file by default, but does not enforce the header being read as column names in Platform. In case it's relevant, the R client did automatically handle the header correctly for this same dataset, so there's also a discrepancy between the API clients.