HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

translation of simple DCs with a constant to SQL queries not working #44

Closed: pmaetzig closed this issue 4 years ago

pmaetzig commented 5 years ago

Hi,

when I define simple denial constraints (DCs) that compare an attribute to a constant, HoloClean does not generate a syntactically correct SQL query from the DC. I reproduced this behaviour on the iris dataset because I can't share my own data.

The data

I use the iris dataset and a single denial constraint (semantically nonsensical, but syntactically valid) that looks like this:

t1&EQ(t1.sepal_length,"6.9")
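To make the structure of this constraint explicit, here is a minimal sketch of how the string splits into tuple aliases and predicates. This is not HoloClean's parser: `parse_dc` is a hypothetical helper written only for illustration, and it assumes that `&` never appears inside a constant.

```python
import re

def parse_dc(dc):
    """Split a DC string into tuple aliases and (op, left, right) predicates.

    Hypothetical helper for illustration; not part of HoloClean.
    """
    tuples, predicates = [], []
    for part in dc.split("&"):
        part = part.strip()
        # A predicate has the shape OP(left,right); anything else is a tuple alias.
        match = re.fullmatch(r"(\w+)\((.+?),(.+)\)", part)
        if match:
            op, left, right = match.groups()
            predicates.append((op, left.strip(), right.strip()))
        else:
            tuples.append(part)
    return tuples, predicates

tuples, predicates = parse_dc('t1&EQ(t1.sepal_length,"6.9")')
```

The constraint refers to a single tuple alias (`t1`) and has one predicate whose right-hand side is a quoted constant rather than an attribute of a second tuple.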

The error

Error detection with the DC works fine (the constraint detects 4 cells, which is what I expect from the data).

However, during the call of repair_errors, the following error occurs:

ProgrammingError: syntax error at or near "AND"
LINE 1: ...= t2._tid_ AND t2.attribute = 'sepal_length' AND  AND t2.rv_...

The query cannot be executed because of the doubled "AND  AND". Some digging makes me think that the error occurs while the featurizers are set up, more specifically when the tensor for the dcfeaturizer is created (line 17 in repair/featurize/featurize.py).

Any ideas on why that might happen? Perhaps a faulty template for the SQL queries?
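To illustrate the kind of template problem I have in mind, here is a minimal sketch. It is an assumption about the failure mode, not HoloClean's actual code: `naive_where` and `safe_where` are hypothetical names, and the condition strings are placeholders. If a template always puts `AND` around a slot and that slot ends up empty, the result contains the same doubled `AND` as in the error above.

```python
def naive_where(before, dc_condition, after):
    # Hypothetical template: the DC slot is always wrapped in ANDs,
    # even when no condition was generated for it.
    return f"{before} AND {dc_condition} AND {after}"

def safe_where(*conditions):
    # Drop empty fragments before joining, so no dangling AND is produced.
    return " AND ".join(c for c in conditions if c and c.strip())

# Placeholder fragments; the empty string stands for a DC condition
# that was not generated.
broken = naive_where("cond_before", "", "cond_after")
fixed = safe_where("cond_before", "", "cond_after")

print(broken)
print(fixed)
```

Filtering out empty fragments before joining is one possible guard. Whether the right fix is to skip the empty slot or to generate a proper condition for constant predicates depends on what the query is supposed to compute.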