HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0
514 stars, 129 forks

Store domain as list/array in memory and in Postgres. Remove domain_size column. #76

Open richardwu opened 5 years ago

richardwu commented 5 years ago

We have issues with datasets whose values contain the symbol |, since we use ||| as the separator for the list of domain values.

We should keep each list of domain values as a collection (list/array) both in memory and in Postgres. Rather than doing our own array serialization, we should let Pandas + Postgres handle serializing the array into Postgres.
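To make the failure concrete, here is a minimal sketch of how a `|||`-joined string can lose information when a value itself ends in `|`. The helper names `serialize` and `deserialize` and the sample values are hypothetical and not taken from the HoloClean code base; only the `|||` separator comes from the description above.

```python
SEP = "|||"  # separator mentioned above

def serialize(domain):
    """Hypothetical helper: join domain values into one string."""
    return SEP.join(domain)

def deserialize(text):
    """Hypothetical helper: split the string back into domain values."""
    return text.split(SEP)

# A domain whose first value ends with the pipe symbol (made-up data).
domain = ["a|", "b"]

encoded = serialize(domain)           # "a||||b": the value boundary is now ambiguous
round_tripped = deserialize(encoded)  # split at the first "|||", so the pipe moves

print(round_tripped == domain)        # False: the round trip corrupts the values

# Keeping the domain as a real list avoids the problem, and the size is
# just len(domain), so a separate domain_size column becomes redundant.
record = {"attribute": "city", "domain": domain}
print(len(record["domain"]))
```

On the database side, psycopg2 adapts Python lists to Postgres arrays, so a list-valued column could plausibly be written to an array column without any custom separator. How that would be wired into the existing Pandas loading path is left open here.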