-
I just tested some toy code in Spark 2.1.1, and it reported:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(vFeatures)' due to data type mismatch: argument 1 requi…
-
Currently the MDLP code runs fine with Spark 1.6.x, but changes are needed for it to work with Spark 2.x. There probably needs to be a separate versioned release of MDLP for Spark 2.x.
-
I noticed that we see the following exception when trying to run MDLP with a date label specified.
I will add a unit test and try to debug.
`java.util.NoSuchElementException: nullscala.collection.…
-
Currently the feature selection and MDL techniques are built against Spark 1.6.1. Is there a plan to support Spark 2.0.0 or later versions?
-
With the new version of arules (1.5-0) I sometimes get the error "reached CPU time limit".
In my evaluations on 27 UCI datasets, I get it on waveform-5000 (MDLP-discretized) with the followin…
-
@hlin117
I am using the MDLP transformer to get discretized values of a continuous variable, but the MDLP output is an empty array. The data attributes are below.
E.g.
mdlp = MDLP()
mdlp.fit_transfo…
-
I hope that @sramirez or someone else familiar with Spark discretizers can tell me if this is a bug.
Other discretizers produce splits that have an initial cutpoint of -Infinity, and a final cutpoint of In…
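To illustrate the convention being described, here is a minimal Python sketch of bucket lookup when the split array is padded with -Infinity and Infinity. The split values and the `bucket_index` helper are hypothetical and are not taken from any of the discretizers discussed here. The half-open `[x, y)` bucket rule follows what Spark's `Bucketizer` documents, but treat this as an approximation rather than the library's code.

```python
import bisect
import math

# Hypothetical cutpoints, padded so that every real value falls in some bucket.
splits = [-math.inf, 0.5, 2.0, math.inf]

def bucket_index(value, splits):
    """Return the index of the half-open bucket [splits[i], splits[i+1]) holding value."""
    idx = bisect.bisect_right(splits, value) - 1
    # The last bucket also includes its upper bound.
    return min(idx, len(splits) - 2)

print([bucket_index(v, splits) for v in (-100.0, 0.5, 1.9, 2.0, 1e9)])
```

Without the infinite end points, values below the first cutpoint or above the last one would have no bucket to land in.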
-
I think we should consider adding a param that sets a minimum number of instances per bucket. I have seen cases where there is one huge bucket containing most of the data, as shown in this spinogram
![…
-
It would be nice to add some unit tests that would prove that it works as expected and act as documentation by way of examples. I will try to add some and do a PR.
-
Currently the stopping criterion appears to be quite conservative.
For example, if I set maxBins to 1000, it will usually just generate 2 - 5 bins. Why doesn't it generate 8 or 10 instead? I think ther…
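For reference, here is a minimal Python sketch of the MDLP acceptance test as it is usually stated for Fayyad and Irani's method. It assumes the textbook form of the criterion; the Spark implementation discussed here may differ in details, and the `entropy` and `mdlp_accepts` helpers are hypothetical names used only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mdlp_accepts(left, right):
    """Return True if splitting into (left, right) passes the MDLP criterion.

    Assumes the textbook rule: accept when
    gain > (log2(N - 1) + delta) / N, with N >= 2 samples in total.
    """
    whole = list(left) + list(right)
    n = len(whole)
    ent_s, ent_l, ent_r = entropy(whole), entropy(left), entropy(right)
    # Information gain of the candidate cutpoint.
    gain = ent_s - (len(left) / n) * ent_l - (len(right) / n) * ent_r
    # Number of distinct classes in the whole set and in each side.
    k, k1, k2 = len(set(whole)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * ent_l - k2 * ent_r)
    return gain > (math.log2(n - 1) + delta) / n

print(mdlp_accepts([0] * 10, [1] * 10))              # clean separation
print(mdlp_accepts([0, 0, 1, 1, 0], [1, 1, 0, 0, 1]))  # weak separation
```

Because the threshold grows with the number of classes and shrinks only slowly with the sample count, a candidate cutpoint needs a fairly large gain to be accepted, which is consistent with seeing only a few bins even when maxBins is large.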