dssg / aequitas

Bias Auditing & Fair ML Toolkit
http://www.datasciencepublicpolicy.org/aequitas/
MIT License
694 stars 114 forks source link

create tables with replace #66

Open alenastern opened 5 years ago

alenastern commented 5 years ago

Issue with writing new rows to table when previous rows in column had null value. Postgres expects double precision when col should be boolean per error message below.

  File "/home/ubuntu/.local/lib/python3.6/site-packages/ohio/ext/pandas.py", line 96, in to_sql_method_pg_copy_to
    cursor.copy_expert(sql, csv_buffer)
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: "True"
CONTEXT:  COPY aequitas_group, line 1, column FOR Parity: "True"
jesteria commented 5 years ago

I don't see the full trace, but I'm guessing this is because aequitas relies on pandas to create the table, and pandas imputes column types?

I would guess the easiest way to ensure these are proper is to pass dtype={…}, e.g.:

data_frame.pg_copy_to(
    'aequitas_group',
    engine,
    dtype={
        'col0': sqlalchemy.Boolean,
        …
    }
)
ksoleman commented 3 years ago

Alternatively, you could remove columns that have null values by doing something like

data_frame.drop(data_frame.columns[[#fill with col numbers that need to be removed]], axis=1, inplace=True)

Another way to remove null values is to use Pandas dataframe dropna( ) function

new_data_frame = data_frame.dropna()

doing this will return a new dataframe with the rows/cols with Null/NaN values removed. More info on pandas dropna() function can be found here