Closed: abdul0807 closed this issue 4 years ago
Hi Abdul: Thanks for using Auto_NLP on your data set. I have found the problem and fixed it on GitHub. You should see the message "Modified Sparse array to dense array after vectorizer since it was erroring" in the Auto_NLP code on GitHub, which means you have the modified version. To use it, you must pip install from the Git version as follows in your Jupyter Notebook or wherever you run it:
!python3 -m pip install git+https://github.com/AutoViML/Auto_ViML.git
If you have any further errors, please let me know. Thanks
@AutoViML Thanks a lot for the prompt reply. It looks like the issue still exists, and this time the error is raised at a different line number. Please check the error below for details.
I looked at the code, and it seems the issue is caused by using the LassoLars model. We could use other models such as Lasso or Ridge instead, or add an extra step to the pipeline that converts the sparse array to a dense array.
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-34b02f496bd4> in <module>
4 nlp_column, train, test, target, score_type='mean_squared_error',
5 modeltype='Regression',top_num_features=50, verbose=2,
----> 6 build_model=True)
/opt/conda/lib/python3.7/site-packages/autoviml/Auto_NLP.py in Auto_NLP(nlp_column, train, test, target, score_type, modeltype, top_num_features, verbose, build_model)
1217 ##### Now AFTER TRAINING, make predictions on the given test data set!
1218 start_time = time.time()
-> 1219 pipe.fit(X,y)
1220 print('Training completed. Time taken for Auto_NLP = %0.1f minutes' %((time.time()-start_time4)/60))
1221 print('######### A U T O N L P C O M P L E T E D ###############################')
/opt/conda/lib/python3.7/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
352 self._log_message(len(self.steps) - 1)):
353 if self._final_estimator != 'passthrough':
--> 354 self._final_estimator.fit(Xt, y, **fit_params)
355 return self
356
/opt/conda/lib/python3.7/site-packages/sklearn/linear_model/_least_angle.py in fit(self, X, y, Xy)
955 returns an instance of self.
956 """
--> 957 X, y = check_X_y(X, y, y_numeric=True, multi_output=True)
958
959 alpha = getattr(self, 'alpha', 0.)
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
753 ensure_min_features=ensure_min_features,
754 warn_on_dtype=warn_on_dtype,
--> 755 estimator=estimator)
756 if multi_output:
757 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
509 dtype=dtype, copy=copy,
510 force_all_finite=force_all_finite,
--> 511 accept_large_sparse=accept_large_sparse)
512 else:
513 # If np.array(..) gives ComplexWarning, then we convert the warning
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
304
305 if accept_sparse is False:
--> 306 raise TypeError('A sparse matrix was passed, but dense '
307 'data is required. Use X.toarray() to '
308 'convert to a dense numpy array.')
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
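The extra densifying pipeline step suggested above could look like the following sketch. This is not the Auto_NLP code, just a minimal toy example: a `FunctionTransformer` that calls `.toarray()` on the sparse TF-IDF output before it reaches LassoLars, which only accepts dense input. The documents, target values, and `alpha` are made up for illustration.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LassoLars
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# LassoLars rejects scipy sparse matrices, so insert a densifying
# step between the vectorizer and the final estimator.
to_dense = FunctionTransformer(
    lambda X: X.toarray() if sparse.issparse(X) else X
)

pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('to_dense', to_dense),       # sparse -> dense before LassoLars
    ('model', LassoLars(alpha=0.1)),
])

# Toy regression data, for illustration only.
docs = ["good product", "bad product", "great value", "poor value"]
y = [4.0, 1.0, 5.0, 2.0]

pipe.fit(docs, y)                 # no TypeError with the dense step
print(pipe.predict(["good value"]))
```

The same pipeline without the `to_dense` step reproduces the `TypeError` above, since the vectorizer's CSR matrix would be passed straight into `LassoLars.fit`.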
Ok, you are right. I changed the model from LassoLars to LinearSVR. It works faster and better. It should be good to go. Test it and let me know; I will keep the issue open until you confirm it works.
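A quick sketch of why the swap works, with toy data rather than the Auto_NLP pipeline: `LinearSVR.fit` accepts scipy sparse input natively, so the vectorizer's CSR matrix can flow into it directly with no densifying step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVR
from sklearn.pipeline import Pipeline

# LinearSVR handles sparse matrices natively, so the TF-IDF
# output can be passed to it without conversion.
pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('model', LinearSVR(C=1.0, max_iter=5000)),
])

# Toy regression data, for illustration only.
docs = ["good product", "bad product", "great value", "poor value"]
y = [4.0, 1.0, 5.0, 2.0]

pipe.fit(docs, y)                 # sparse input is fine here
print(pipe.predict(["good value"]))
```

This avoids the memory cost of densifying a large TF-IDF matrix, which is why it can also run faster on text data.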
Thanks! This is working. Please close the issue.
Ok great all the best!
Hi AutoViML community,
Thank you for providing this amazing package.
I am trying my hand at a regression problem and ended up with a TypeError. Please note that the same error can be replicated in a Kaggle kernel. The code and error are shared below for your reference.
Thanks -