HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0
518 stars 130 forks

Replace to_sql() with copy_expert() to improve performance while savi… #101

Open fatangare opened 5 years ago

fatangare commented 5 years ago

Improves performance when saving large datasets to PostgreSQL.

The to_sql() method is slow and takes a long time to save data in Postgres. This change replaces it with copy_expert(), which loads data into Postgres tables much faster.
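For readers unfamiliar with the approach, here is a minimal sketch of a COPY-based bulk load, not the code from this PR. It serializes rows to an in-memory CSV buffer and streams them through one `COPY ... FROM STDIN` statement using psycopg2's `cursor.copy_expert(sql, file)`. The helper name `copy_rows`, the table and column names, and the simple double-quote identifier quoting are assumptions made for illustration.

```python
import csv
import io


def copy_rows(cursor, table, columns, rows):
    """Bulk-load rows through a single COPY ... FROM STDIN statement.

    `cursor` is expected to be a psycopg2 cursor (anything exposing
    copy_expert(sql, file) works). `copy_rows` is a hypothetical helper,
    not part of HoloClean or psycopg2.
    """
    # Serialize all rows to CSV in memory; None is written as an empty field.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)

    # Naive identifier quoting, for illustration only.
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    sql = 'COPY "{}" ({}) FROM STDIN WITH (FORMAT csv)'.format(table, column_list)

    # One round trip for the whole batch instead of one INSERT per row.
    cursor.copy_expert(sql, buf)
```

With a pandas DataFrame `df`, the rows could be supplied as `df.itertuples(index=False)` and the columns as `list(df.columns)`.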

fatangare commented 5 years ago

I also added one commit to parallelize the compute_norm_cond_entropy_corr() method.

With a single thread on the hospital data, it takes 3.11 sec. With 6 threads, it takes 1.05 sec on my Mac (16 GB RAM).
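As a rough illustration of the idea, and not the commit itself, the sketch below spreads a pairwise conditional-entropy computation over a thread pool. The normalization (dividing H(X|Y) by the log of the number of distinct X values, then taking 1 minus that) and the names `norm_cond_entropy` and `pairwise_corr` are assumptions. In pure Python the GIL limits how much threads help, so the timings quoted above apply only to the PR's own implementation.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def norm_cond_entropy(xs, ys):
    """Return H(X|Y) / log(|dom(X)|); 0.0 when X has at most one distinct value."""
    n = len(xs)
    domain_size = len(set(xs))
    if domain_size <= 1:
        return 0.0
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    # H(X|Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))
    h = -sum(c / n * math.log(c / y_counts[y]) for (_, y), c in joint.items())
    return h / math.log(domain_size)


def pairwise_corr(columns, workers=6):
    """columns maps attribute name -> list of values.

    Returns {(x, y): 1 - normalized H(x|y)} for every ordered pair,
    since conditional entropy is asymmetric.
    """
    pairs = list(permutations(columns, 2))

    def score(pair):
        x, y = pair
        return 1.0 - norm_cond_entropy(columns[x], columns[y])

    # Each ordered pair is independent, so the pairs can be scored concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(score, pairs)))
```

A score near 1 means the second attribute almost fully determines the first; a score near 0 means it carries little information about it.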