HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Using HoloClean for creating labels on tabular numerical datasets #45

Closed: asstergi closed this issue 4 years ago

asstergi commented 5 years ago

@thodrek Following up on this issue from snorkel (https://github.com/HazyResearch/snorkel/issues/803), I was wondering if there are any examples of how to use HoloClean to create labels for tabular numerical datasets with the help of labelling functions.

Any guidance would be really appreciated.

asstergi commented 5 years ago

@thodrek could you please provide any guidance on the above question?

jondoering commented 5 years ago

Any update on that? Would be really interested, too.

DataDoctorNG commented 5 years ago

@thodrek Any update on using HoloClean for tabular data, with examples? I have a project I would like to do with HoloClean and would be very interested in some examples of how to use it.

thodrek commented 5 years ago

We are preparing a release that handles mixed categorical and numerical data. It is currently in dev and will soon be pushed to master.

thodrek commented 4 years ago

The latest version on master handles both continuous and discrete values.

asstergi commented 4 years ago

@thodrek Thank you for the reply.

One more question, though: how should we approach a data labeling problem? If my understanding is correct, the initial value of a sample's label acts as a loose 'prior' for its final value (after error correction). Is that correct?

If so, how should I set the initial values of the labels? Could I just set them all to the same value and let HoloClean resolve them using the constraints?
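
To make the two initialization options in the question concrete, here is a minimal sketch that only prepares the input table. It does not call HoloClean, and it does not settle whether HoloClean treats the initial value as a prior, which remains the open question here. The column names, thresholds, and labelling functions are all hypothetical and chosen purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical numerical table; column names and values are illustrative only.
ROWS = [
    {"age": 23, "income": 18000},
    {"age": 41, "income": 72000},
    {"age": 35, "income": 39000},
]

# Hypothetical labelling functions: each returns "1", "0", or None (abstain).
def lf_high_income(row):
    return "1" if row["income"] > 50000 else None

def lf_low_income(row):
    return "0" if row["income"] < 25000 else None

def lf_young(row):
    return "0" if row["age"] < 25 else None

LABELLING_FUNCTIONS = [lf_high_income, lf_low_income, lf_young]

def constant_init(rows, value="0"):
    """Option A: give every row the same placeholder label."""
    return [dict(row, label=value) for row in rows]

def vote_init(rows, lfs, missing=""):
    """Option B: majority vote of the labelling functions.

    Rows where every function abstains get an empty label;
    ties resolve to the first vote seen.
    """
    out = []
    for row in rows:
        votes = [v for v in (lf(row) for lf in lfs) if v is not None]
        label = Counter(votes).most_common(1)[0][0] if votes else missing
        out.append(dict(row, label=label))
    return out

def to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["age", "income", "label"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

constant_rows = constant_init(ROWS)
voted_rows = vote_init(ROWS, LABELLING_FUNCTIONS)
print(to_csv(voted_rows))
```

With option A every row starts from the same value, so any signal would have to come entirely from the constraints. With option B the rows that the labelling functions agree on start from an informed value, and the rows where all of them abstain are left empty.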