eubr-bigsea / citrus

Apache License 2.0

Check what is included in the One-vs-Rest classifier #82

Open alexgcsa opened 5 years ago

alexgcsa commented 5 years ago

Currently, it is not working.

It would be better if binary classifiers (e.g., the current version of Support Vector Machines) include this as a subroutine to support multiclass problems.
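To make the idea concrete, here is a minimal sketch of one-vs-rest used as a subroutine around a binary learner. It is plain Python, not the project's code: `fit_centroid_scorer`, `fit_one_vs_rest` and `predict_one_vs_rest` are hypothetical names, and the centroid-based learner is only a toy stand-in for a real binary classifier such as an SVM.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def fit_centroid_scorer(rows: List[Vector], targets: List[int]) -> Scorer:
    """Toy binary learner (assumption, stands in for e.g. an SVM).

    Returns a function whose score is positive when a point is closer to
    the centroid of the positive class (1) than to the negative class (0).
    """
    def centroid(points: List[Vector]) -> List[float]:
        return [sum(col) / len(points) for col in zip(*points)]

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    pos = centroid([r for r, t in zip(rows, targets) if t == 1])
    neg = centroid([r for r, t in zip(rows, targets) if t == 0])
    return lambda x: sq_dist(x, neg) - sq_dist(x, pos)


def fit_one_vs_rest(
    rows: List[Vector],
    labels: List[Hashable],
    fit_binary: Callable[[List[Vector], List[int]], Scorer] = fit_centroid_scorer,
) -> Dict[Hashable, Scorer]:
    """Train one binary scorer per class: that class (1) against all others (0)."""
    return {
        cls: fit_binary(rows, [1 if y == cls else 0 for y in labels])
        for cls in set(labels)
    }


def predict_one_vs_rest(scorers: Dict[Hashable, Scorer], x: Vector) -> Hashable:
    """Pick the class whose binary scorer reports the highest score."""
    return max(scorers, key=lambda cls: scorers[cls](x))
```

The multiclass decision compares the per-class scores, so the binary learner has to return a confidence value and not only a hard 0/1 label.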

waltersf commented 5 years ago

Including this in classifiers brings the same problem as the other features added to them: classifiers are now responsible for a large number of different tasks (train, apply, cross-validate, etc.).

waltersf commented 5 years ago

I'll try to fix the current implementation.

waltersf commented 5 years ago

Oh, man, there is no way to fix it besides the way you mentioned. It used to require an algorithm that is not available anymore.

waltersf commented 5 years ago

I changed the behavior: classification algorithms now have an option to enable this. Please test different situations, such as multi-label classifiers and input data with exactly 2 classes or with more than 2 classes.

alexgcsa commented 5 years ago

It does not fully work yet. I tested GBT with one-vs-all turned on for the Iris dataset.

id: 544 tested on: https://lemonade.ctweb.inweb.org.br

It returns an error:

Error details (advanced)

Traceback (most recent call last):
  File "/usr/local/juicer/juicer/spark/spark_minion.py", line 460, in _perform_execute
    self._emit_event(room=job_id, namespace='/stand'))
  File "/tmp/juicer_app_544_544_2699.py", line 758, in main
    task_futures['b7ca4c66-0a2c-44a1-b96e-6ddd7cb7c298'].result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/juicer_app_544_544_2699.py", line 757, in <lambda>
    lambda: evaluate_model_5(spark_session, cached_state, emit_event))
  File "/tmp/juicer_app_544_544_2699.py", line 496, in evaluate_model_5
    parent_result = task_futures['7e9b90d9-9e2a-4af0-b472-eb9d878f1c0f'].result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/juicer_app_544_544_2699.py", line 489, in <lambda>
    lambda: apply_model_4(spark_session, cached_state, emit_event))
  File "/tmp/juicer_app_544_544_2699.py", line 438, in apply_model_4
    parent_result = task_futures['531e4ad5-96be-4343-a46c-f5be3fae6ffd'].result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/_base.py", line 462, in result
    return self.__get_result()
  File "/usr/local/lib/python2.7/dist-packages/concurrent/futures/thread.py", line 63, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/juicer_app_544_544_2699.py", line 494, in <lambda>
    lambda: gbt_classifier_model_3(spark_session, cached_state, emit_event))
  File "/tmp/juicer_app_544_544_2699.py", line 351, in gbt_classifier_model_3
    model2 = pipeline.fit(s11)
  File "/usr/local/spark/python/pyspark/ml/base.py", line 132, in fit
    return self._fit(dataset)
  File "/usr/local/spark/python/pyspark/ml/pipeline.py", line 109, in _fit
    model = stage.fit(dataset)
  File "/usr/local/spark/python/pyspark/ml/base.py", line 132, in fit
    return self._fit(dataset)
  File "/usr/local/spark/python/pyspark/ml/classification.py", line 1802, in _fit
    "Classifier %s doesn't extend from HasRawPredictionCol." % type(classifier)
AssertionError: Classifier <class 'pyspark.ml.classification.GBTClassifier'> doesn't extend from HasRawPredictionCol.
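The assertion at the end of the traceback is a capability check: the one-vs-rest wrapper refuses a base classifier that does not declare a raw prediction column, presumably because it needs per-class scores to choose among the binary models. A rough sketch of the same pattern, in plain Python and not PySpark, is below. `HasRawScore`, `ScoringClassifier`, `LabelOnlyClassifier` and `validate_one_vs_rest` are hypothetical names used only for illustration; the idea is that the generated code could run such a check before fitting and report a clearer message.

```python
class HasRawScore:
    """Hypothetical marker mixin: the classifier can emit a confidence score."""


class ScoringClassifier(HasRawScore):
    """Stand-in for a classifier that exposes raw scores."""


class LabelOnlyClassifier:
    """Stand-in for a classifier that only emits hard labels."""


def supports_one_vs_rest(classifier: object) -> bool:
    """True when the classifier declares that it can produce raw scores."""
    return isinstance(classifier, HasRawScore)


def validate_one_vs_rest(classifier: object) -> None:
    """Fail early, with a readable message, before any training starts."""
    if not supports_one_vs_rest(classifier):
        raise TypeError(
            "%s cannot be used with one-vs-rest: it does not expose raw scores"
            % type(classifier).__name__
        )
```

With a guard like this, enabling one-vs-rest on an unsupported algorithm would be rejected when the workflow is validated, before the job reaches `pipeline.fit`.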

alexgcsa commented 5 years ago

It worked well for the linear SVM classifier, but not for GBT.

SVM workflow: 546


I also tested all the other multi-class classifiers (Decision Tree, Logistic Regression, Naïve Bayes, Multi-layer Perceptron and Random Forest) with one-vs-rest activated. All worked fine.


The only issue is with GBT.

alexgcsa commented 5 years ago

Tested. Waiting for new updates. Specifically, the GBT classifier is not working.

@zilton

raghuvarranvh commented 4 years ago

@alexgcsa - Hi, any updates on this issue? I am using Spark 2.4.0 and am getting the same error as you.