HoloClean / holoclean

A Machine Learning System for Data Enrichment.
http://www.holoclean.io
Apache License 2.0

Holoclean error on one dataset #117

Open DaliGaoudi opened 4 months ago

DaliGaoudi commented 4 months ago

Hello everyone, I am getting the error below when running HoloClean on a dataset. A copy of the console log follows. What is causing the error, and how can I fix it?

03:34:40 - [DEBUG] - Starting to execute query SELECT t1.tid FROM "eudract" as t1 WHERE EXISTS (SELECT t2.tid FROM "eudract" as t2 WHERE t1."eudract_number"=t2."eudract_number" AND t1."double_blind"<>t2."double_blind") with id 18
03:34:40 - [DEBUG] - Time to execute query with id 18: 0.00 secs
03:34:40 - [DEBUG] - Time to execute 19 queries: 0.04 secs
03:34:40 - [DEBUG] - DONE with Error Detector: ViolationDetector in 0.05 secs
Traceback (most recent call last):
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/frame.py", line 3424, in _ensure_valid_index
    value = Series(value)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/series.py", line 264, in __init__
    data = SingleBlockManager(data, index, fastpath=True)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1481, in __init__
    block = make_block(block, placement=slice(0, len(axis)), ndim=1)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3095, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2631, in __init__
    placement=placement)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 87, in __init__
    '{mgr}'.format(val=len(self.values), mgr=len(self.mgr_locs)))
ValueError: Wrong number of items passed 2, placement implies 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "eudract.py", line 37, in <module>
    hc.detect_errors(detectors)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/holoclean.py", line 308, in detect_errors
    status, detect_time = self.detect_engine.detect_errors(detect_list)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/detect/detect.py", line 36, in detect_errors
    errors_df['cid'] = errors_df.apply(lambda x: self.ds.get_cell_id(x['tid'], x['attribute']), axis=1)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/frame.py", line 3370, in __setitem__
    self._set_item(key, value)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/frame.py", line 3444, in _set_item
    self._ensure_valid_index(value)
  File "/home/dalig/Documents/BPC/HoloClean/holoclean/hc36/lib/python3.7/site-packages/pandas/core/frame.py", line 3426, in _ensure_valid_index
    raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series