AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

TypeError: object of type 'NoneType' has no len() when reading files #117

Open lijie97 opened 3 years ago

lijie97 commented 3 years ago

I had used MLBox before without problems, but when I reran my code I got the error below.

I tried it on both Colab and Kaggle.

It fails even with this simple dataset.

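from mlbox.preprocessing import Reader, Drift_thresholder

# `path` is the directory that contains both CSV files (with a trailing slash)
paths = [path + "train_set.csv", path + "pred_set.csv"]
data = Reader(sep=",").train_test_split(paths, "flag")
data = Drift_thresholder().fit_transform(data)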

train_set.csv

flag,1,2,3
1,4,-2,9
-1,0,-3,0
-1,-6,2,-1
-1,0,0,-4
-1,-9,-6,6
-1,-7,3,2
1,1,9,4
-1,5,-6,-3
1,8,-9,4
-1,-6,0,4
1,1,-5,7

pred_set.csv

1,2,3
0,9,0
-9,5,0
1,-5,2
-5,-6,-6
-8,2,-3
-6,6,2
5,-6,-4
-5,0,0
8,6,-3
-3,7,-7
-2,-5,-1

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.6/dist-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/dist-packages/joblib/_parallel_backends.py", line 608, in __call__
    return self.func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/joblib/parallel.py", line 256, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.6/dist-packages/joblib/parallel.py", line 256, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.6/dist-packages/mlbox/preprocessing/reader.py", line 34, in convert_list
    if (serie.apply(lambda x: type(x) == list).sum() > 0):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 4014, in apply
    if len(self) == 0:
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 727, in __len__
    return len(self._data)
TypeError: object of type 'NoneType' has no len()
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)

<ipython-input-10-14807383dd3c> in <module>()
      1 paths = [path + "train_set.csv", path + "pred_set.csv"]
----> 2 data = Reader(sep=",").train_test_split(paths, "flag")
      3 data = Drift_thresholder().fit_transform(data)
      4 data

6 frames

/usr/local/lib/python3.6/dist-packages/mlbox/preprocessing/reader.py in train_test_split(self, Lpath, target_name)
    361                 # Reading each file
    362 
--> 363                 df = self.clean(path, drop_duplicate=False)
    364 
    365                 # Checking if the target exists to split into test and train

/usr/local/lib/python3.6/dist-packages/mlbox/preprocessing/reader.py in clean(self, path, drop_duplicate)
    283             df = pd.concat([convert_float_and_dates(df[col]) for col in df.columns], axis=1)
    284         else:
--> 285             df = pd.concat(Parallel(n_jobs=-1)(delayed(convert_list)(df[col]) for col in df.columns),
    286                            axis=1)
    287 

/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in __call__(self, iterable)
   1015 
   1016             with self._backend.retrieval_context():
-> 1017                 self.retrieve()
   1018             # Make sure that we get a last message telling us we are done
   1019             elapsed_time = time.time() - self._start_time

/usr/local/lib/python3.6/dist-packages/joblib/parallel.py in retrieve(self)
    907             try:
    908                 if getattr(self._backend, 'supports_timeout', False):
--> 909                     self._output.extend(job.get(timeout=self.timeout))
    910                 else:
    911                     self._output.extend(job.get())

/usr/local/lib/python3.6/dist-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    560         AsyncResults.get from multiprocessing."""
    561         try:
--> 562             return future.result(timeout=timeout)
    563         except LokyTimeoutError:
    564             raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TypeError: object of type 'NoneType' has no len()
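
For anyone who wants to reproduce this, here is a minimal sketch that writes the two CSV files listed above into a temporary directory and then runs the same two MLBox calls. The helper names `build_dataset` and `run_mlbox` are made up for illustration; only `Reader`, `Drift_thresholder`, `train_test_split` and `fit_transform` come from the snippet in this report.

```python
import tempfile
from pathlib import Path

# Contents copied from the train_set.csv / pred_set.csv listings in this report.
TRAIN_CSV = """flag,1,2,3
1,4,-2,9
-1,0,-3,0
-1,-6,2,-1
-1,0,0,-4
-1,-9,-6,6
-1,-7,3,2
1,1,9,4
-1,5,-6,-3
1,8,-9,4
-1,-6,0,4
1,1,-5,7
"""

PRED_CSV = """1,2,3
0,9,0
-9,5,0
1,-5,2
-5,-6,-6
-8,2,-3
-6,6,2
5,-6,-4
-5,0,0
8,6,-3
-3,7,-7
-2,-5,-1
"""


def build_dataset(directory):
    """Write both files into `directory` and return their paths (hypothetical helper)."""
    directory = Path(directory)
    train_path = directory / "train_set.csv"
    pred_path = directory / "pred_set.csv"
    train_path.write_text(TRAIN_CSV)
    pred_path.write_text(PRED_CSV)
    return [str(train_path), str(pred_path)]


def run_mlbox(paths):
    """Run the two calls from the report (hypothetical helper)."""
    # Imported here so that the file-writing part works without MLBox installed.
    from mlbox.preprocessing import Reader, Drift_thresholder

    data = Reader(sep=",").train_test_split(paths, "flag")
    return Drift_thresholder().fit_transform(data)


# Usage, in an environment where MLBox is installed:
#   paths = build_dataset(tempfile.mkdtemp())
#   run_mlbox(paths)
```

Writing the files from the script keeps the report self-contained, so the failure does not depend on any local file layout.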

lijie97 commented 3 years ago

Update: even the examples in this repo fail to run.

AxeldeRomblay commented 3 years ago

Thank you @lijie97, I will have a look and fix it.

AxeldeRomblay commented 3 years ago

Sounds weird... Can you please try renaming your columns with strings (instead of 1, 2, 3, ...), or try another dataset? On my side it works on Colab with the California housing dataset.
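
A rough sketch of the column-renaming suggestion, using only the standard library: it copies a CSV file and prefixes every purely numeric header name with a string. The function name `rename_numeric_headers` and the `feat_` prefix are invented for this example and are not part of MLBox. Whether renaming actually avoids the error is not established here; this only makes the test easy to run.

```python
import csv


def rename_numeric_headers(src, dst, prefix="feat_"):
    """Copy `src` to `dst`, prefixing header names that are plain integers."""
    with open(src, newline="") as fin:
        rows = list(csv.reader(fin))
    if not rows:
        raise ValueError("empty CSV file: " + str(src))

    # Only rename headers such as "1" or "-2"; names like "flag" are kept as-is.
    header = [
        prefix + name if name.strip().lstrip("-").isdigit() else name
        for name in rows[0]
    ]

    with open(dst, "w", newline="") as fout:
        csv.writer(fout).writerows([header] + rows[1:])
    return header
```

Applied to the train_set.csv above, the header `flag,1,2,3` would become `flag,feat_1,feat_2,feat_3`, and the data rows are copied unchanged. The same call has to be applied to pred_set.csv so that both files keep matching column names.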

lijie97 commented 3 years ago

Yes, I also tried the code from the examples, but it fails for the same reason.

nikkisingh111333 commented 2 years ago

Hello, I'm having exactly the same issue. I have tried three house-price prediction datasets and all of them raise the same error: TypeError: object of type 'NoneType' has no len(). How can this be fixed, or is this a bug in the released version? I have spent hours trying to figure it out, with no luck.

viveks-codes commented 1 year ago

Hey everyone, if you are on Colab, have just installed MLBox, and get the error TypeError: object of type 'NoneType' has no len(), try the following solution:

Runtime -> Restart and Run All

Hope this helps :) 👍 Thanks,

Vivek Patel
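
One possible reason a restart helps (an assumption, not something confirmed in this thread) is that installing MLBox changes the version of a dependency such as pandas on disk while the running kernel still holds the previously imported version. The sketch below checks for that kind of mismatch. The function name `stale_imports` is hypothetical, `importlib.metadata` needs Python 3.8 or newer (the traceback above is from Python 3.6), and the check assumes the import name matches the distribution name.

```python
import sys
from importlib import metadata


def stale_imports(package_names, installed_version=metadata.version):
    """Return {name: (loaded, installed)} for imported packages whose
    in-memory version differs from the version installed on disk."""
    stale = {}
    for name in package_names:
        module = sys.modules.get(name)
        if module is None:
            continue  # not imported yet, so nothing outdated is held in memory
        loaded = getattr(module, "__version__", None)
        if loaded is None:
            continue  # the module does not expose a version string
        try:
            installed = installed_version(name)
        except metadata.PackageNotFoundError:
            continue  # no installed distribution found under this name
        if loaded != installed:
            stale[name] = (loaded, installed)
    return stale


# Usage in a notebook cell, right after installing MLBox:
#   print(stale_imports(["pandas", "numpy", "joblib"]))
# A non-empty result suggests the runtime should be restarted.
```

An empty result does not prove the environment is consistent; it only means this particular check found no version mismatch.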