Ji-Zhang / datacleanbot

MIT License

Cannot import package due to sklearn dependency #10

Open kellieotto opened 4 years ago

kellieotto commented 4 years ago

This issue is part of your JOSS review.

I was able to install the package but can't import it. I get the following error:

ImportError: cannot import name 'Imputer' from 'sklearn.preprocessing' (/opt/conda/envs/py3-primary/lib/python3.7/site-packages/sklearn/preprocessing/__init__.py)

This seems related to: https://stackoverflow.com/questions/59439096/importerror-cannnot-import-name-imputer-from-sklearn-preprocessing
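For reference, Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; a minimal sketch of the replacement API (assuming scikit-learn >= 0.20, where SimpleImputer lives in sklearn.impute):

    # Old, removed API:
    #   from sklearn.preprocessing import Imputer
    # Replacement:
    import numpy as np
    from sklearn.impute import SimpleImputer

    imp = SimpleImputer(strategy="mean")
    X = [[1, 2], [np.nan, 3], [7, 6]]
    print(imp.fit_transform(X))  # the missing value is filled with the column mean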

Ji-Zhang commented 4 years ago

Hi @kellieotto, this bug has been fixed. Please let me know if you run into any other problems. Thanks.

kellieotto commented 4 years ago

@Ji-Zhang That error seems to be fixed, great! I'm still running into import issues.

ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

Seems to be coming from tensorflow.

Ji-Zhang commented 4 years ago

Hi @kellieotto, may I ask in which environment you are using TensorFlow? There is not much I can do in the package itself to fix this.

If you are using TensorFlow with a GPU, you need to install CUDA and cuDNN. Please follow the instructions at https://www.tensorflow.org/install/

If you have already installed CUDA and cuDNN but still get this error, then you probably forgot to export your libraries: on Linux, you may need to set LD_LIBRARY_PATH to include the CUDA library directories.
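For example, a quick way to check from Python whether libcuda.so.1 can be found on the loader path (an illustrative snippet, not part of datacleanbot):

    # Try to load the CUDA driver library; OSError means it is not on the loader path.
    import ctypes
    try:
        ctypes.CDLL("libcuda.so.1")
        print("libcuda.so.1 found")
    except OSError as err:
        print("libcuda.so.1 not found:", err)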

If the above does not fix the problem, please let me know so I can help further. Thanks.

kellieotto commented 4 years ago

Hi @Ji-Zhang, sorry for the huge time delay between responses on this.

I have installed everything now. I am running the code example you have in the README and get this error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-15-b513b18ad249> in <module>
      1 import datacleanbot.dataclean as dc
----> 2 Xy = dc.autoclean(Xy, data.name, features)

~/miniconda3/lib/python3.7/site-packages/datacleanbot/dataclean.py in autoclean(Xy, dataset_name, features)
   1345     features = unify_name_consistency(features)
   1346     features_new, Xy_filled = handle_missing(features, Xy)
-> 1347     Xy_cleaned = handle_outlier(features_new, Xy_filled)
   1348     return Xy_cleaned

~/miniconda3/lib/python3.7/site-packages/datacleanbot/dataclean.py in handle_outlier(features, Xy)
   1286     X = Xy[:,:-1]
   1287     y = Xy[:,-1]
-> 1288     best = predict_best_anomaly_algorithm(X, y)
   1289     df = pd.DataFrame(Xy)
   1290     display(HTML('<h4>Visualize Outliers ... </h4>'))

~/miniconda3/lib/python3.7/site-packages/datacleanbot/dataclean.py in predict_best_anomaly_algorithm(X, y)
   1050 
   1051     # load meta learner
-> 1052     metalearner = joblib.load(urlopen("https://github.com/Ji-Zhang/datacleanbot/blob/master/process/AutomaticOutlierDetection/metalearner_rf.pkl?raw=true"))
   1053     best_anomaly_algorithm = metalearner.predict(mf)
   1054     if best_anomaly_algorithm[0] == 0:

~/miniconda3/lib/python3.7/site-packages/sklearn/externals/joblib/numpy_pickle.py in load(filename, mmap_mode)
    586         filename = getattr(fobj, 'name', '')
    587         with _read_fileobject(fobj, filename, mmap_mode) as fobj:
--> 588             obj = _unpickle(fobj)
    589     else:
    590         with open(filename, 'rb') as f:

~/miniconda3/lib/python3.7/site-packages/sklearn/externals/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
    524     obj = None
    525     try:
--> 526         obj = unpickler.load()
    527         if unpickler.compat_mode:
    528             warnings.warn("The file '%s' has been generated with a "

~/miniconda3/lib/python3.7/pickle.py in load(self)
   1083                     raise EOFError
   1084                 assert isinstance(key, bytes_types)
-> 1085                 dispatch[key[0]](self)
   1086         except _Stop as stopinst:
   1087             return stopinst.value

~/miniconda3/lib/python3.7/pickle.py in load_global(self)
   1371         module = self.readline()[:-1].decode("utf-8")
   1372         name = self.readline()[:-1].decode("utf-8")
-> 1373         klass = self.find_class(module, name)
   1374         self.append(klass)
   1375     dispatch[GLOBAL[0]] = load_global

~/miniconda3/lib/python3.7/pickle.py in find_class(self, module, name)
   1421             elif module in _compat_pickle.IMPORT_MAPPING:
   1422                 module = _compat_pickle.IMPORT_MAPPING[module]
-> 1423         __import__(module, level=0)
   1424         if self.proto >= 4:
   1425             return _getattribute(sys.modules[module], name)[0]

ModuleNotFoundError: No module named 'sklearn.ensemble._forest'

I think it's related to package versions + pickling. I found this issue that seems related.
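One quick sanity check (an illustrative snippet, not from the package) is to print the locally installed versions and compare them with whatever was used to train the pickled model:

    import sklearn
    import joblib
    print("scikit-learn:", sklearn.__version__)
    print("joblib:", joblib.__version__)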

Ji-Zhang commented 4 years ago

Hi @kellieotto, this bug should be fixed now. Could you please test it again? Thanks in advance.

kellieotto commented 4 years ago

Sorry @Ji-Zhang, it's still not working when I run dc.autoclean. I see Important Features, Statistical Information, Discover Data Types, etc., but when it gets to Outliers, the error posted above appears.

I ran pip install datacleanbot==0.8 and pip install joblib, but I still get the error posted above.

Ji-Zhang commented 4 years ago

Hi @kellieotto, sorry for the inconvenience. I changed the way the trained model is loaded. Could you please try again? pip install datacleanbot==0.9
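For anyone hitting the same unpickling error, a hypothetical sketch of one way to load the remote model through a local temporary file with the standalone joblib package (the actual change in 0.9 may differ, and unpickling still requires a scikit-learn version compatible with the one the model was trained with):

    import tempfile
    import urllib.request
    import joblib  # standalone joblib, not sklearn.externals.joblib

    MODEL_URL = ("https://github.com/Ji-Zhang/datacleanbot/blob/master/"
                 "process/AutomaticOutlierDetection/metalearner_rf.pkl?raw=true")

    # Download the pickle to a temporary file, then load it from disk.
    with tempfile.NamedTemporaryFile(suffix=".pkl") as tmp:
        urllib.request.urlretrieve(MODEL_URL, tmp.name)
        metalearner = joblib.load(tmp.name)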