The Spark feature org.apache.spark.ml.feature.OneHotEncoderModel mixes in two params for its input columns: inputCol and inputCols. We need to check which param is set and use that one to compute categorySizes.
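A minimal sketch of that check, for illustration only. EncoderParams and resolveInputCols are hypothetical stand-ins that model the two params as Options; they are not the Spark or MLeap API, and the actual change may be structured differently. The error on both params being set mirrors the "fails to instantiate" test cases listed below.

```scala
// Hypothetical stand-in for the model's two input-column params.
// This is NOT the real Spark or MLeap API; it only models "set" vs "unset".
final case class EncoderParams(
  inputCol: Option[String],       // defined when the single-column param is set
  inputCols: Option[Seq[String]]  // defined when the multi-column param is set
)

object InputColumnSelection {
  // Returns the input columns from whichever param is set.
  // Throws if both params are set (ambiguous) or if neither is set.
  def resolveInputCols(p: EncoderParams): Seq[String] =
    (p.inputCol, p.inputCols) match {
      case (Some(_), Some(_)) =>
        throw new IllegalArgumentException("inputCol and inputCols are both set")
      case (Some(col), None)  => Seq(col)
      case (None, Some(cols)) => cols
      case (None, None) =>
        throw new IllegalArgumentException("neither inputCol nor inputCols is set")
    }
}
```

With the columns resolved this way, one size per resolved column can be looked up regardless of which param the Spark model used.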
Tests pass locally:
$ sbt "mleap-spark/testOnly *OneHotEncoderParitySpec*"
[info] OneHotEncoderParitySpec:
[info] - has parity between Spark/MLeap
[info] - serializes/deserializes the Spark model properly
[info] - model input/output schema matches transformer UDF
[info] - serializes/deserializes the Spark model properly with one in/out column
[info] - fails to instantiate if the Spark model sets inputCol and inputCols
[info] - fails to instantiate if the Spark model sets outputCol and outputCols
[info] Run completed in 8 seconds, 315 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.