combust / mleap

MLeap: Deploy ML Pipelines to Production
https://combust.github.io/mleap-docs/
Apache License 2.0

MLeap Transformer issue. #839

Closed · drei34 closed this 1 year ago

drei34 commented 1 year ago

Hi, I am fitting a pipeline with pyspark=2.4.8, mleap=0.16.0, sparkxgb=0.9. I save the pipeline to a bundle, but when I try to load the bundle and apply transform I get a massive error. Is there a known issue with these versions? I pasted the pipeline stages and the error below.

[PairSimilarity_52c78d50078c (custom transformer; this one worked before), StringIndexer_4bbede0b494a, StringIndexer_b9252681dbc5, StringIndexer_7827544454b9, StringIndexer_5cf1dee70c96, StringIndexer_9e92895dd932, StringIndexer_6bf432ea8a5f, StringIndexer_17bf58e055be, OneHotEncoderEstimator_d57c3e588c63, OneHotEncoderEstimator_16e5920ce54b, OneHotEncoderEstimator_89aa22b3d19b, OneHotEncoderEstimator_6106e28a07e5, OneHotEncoderEstimator_892b21c17186, VectorAssembler_e66a4f3c2ef5, XGBoostClassifier_d66904329156, VectorSlicer_2cc5297bfff2, IsotonicRegression_7792d4379516, Binarizer_27d9d3c46611, Binarizer_381fd6eb9eb3, BinaryOperation_3aec28784af7, VectorAssembler_33b0110f3917]
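
For reference, the save and load steps follow the standard MLeap pyspark flow. A minimal sketch is below (the pipeline/DataFrame names and the bundle path are placeholders, not my exact code):

```python
# Minimal sketch of the serialize/deserialize flow (mleap 0.16.0 pyspark support).
# "pipeline", "train_df", "test_df" and the bundle path are placeholders.
import mleap.pyspark  # noqa: F401 -- attaches serializeToBundle/deserializeFromBundle to Spark ML classes
from mleap.pyspark.spark_support import SimpleSparkSerializer  # noqa: F401
from pyspark.ml import PipelineModel

fitted = pipeline.fit(train_df)

# MLeap needs a transformed DataFrame so it can record the output schema.
fitted.serializeToBundle("jar:file:/tmp/pipeline.zip", fitted.transform(train_df))

# Later: load the bundle back as a Spark PipelineModel and apply it.
loaded = PipelineModel.deserializeFromBundle("jar:file:/tmp/pipeline.zip")
scored = loaded.transform(test_df)
```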

Error looks like:

```
22/12/22 23:48:57 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 2.0 in stage 2.0 (TID 7, adtech-dev-ds-a0s0x1v-jxahn-hub-sw-dx99.c.wmt-customer-tech-adtech.internal, executor 1): ml.dmlc.xgboost4j.java.XGBoostError: [23:48:57] /xgboost/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:146: [23:48:57] /xgboost/jvm-packages/xgboost4j/src/native/xgboost4j.cpp:68: Check failed: jenv->ExceptionOccurred(): Stack trace: [bt] (0) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x22) [0x7f6edb3d1f82] [bt] (1) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(XGBoost4jCallbackDataIterNext+0xf89) [0x7f6edb3cfcd9] [bt] (2) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::NativeDataIter::Next()+0x15) [0x7f6edb3def15] [bt] (3) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::data::SimpleCSRSource::CopyFrom(dmlc::Parser<unsigned int, float>)+0x64) [0x7f6edb4195e4] [bt] (4) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::DMatrix::Create(dmlc::Parser<unsigned int, float>, std::string const&, unsigned long)+0x363) [0x7f6edb40e0b3] [bt] (5) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(XGDMatrixCreateFromDataIter+0x134) [0x7f6edb3d46e4] [bt] (6) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x94) [0x7f6edb3cdc84] [bt] (7) [0x7f6f25017de7]

Stack trace: [bt] (0) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x22) [0x7f6edb3d1f82] [bt] (1) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(XGBoost4jCallbackDataIterNext+0x11c1) [0x7f6edb3cff11] [bt] (2) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::NativeDataIter::Next()+0x15) [0x7f6edb3def15] [bt] (3) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::data::SimpleCSRSource::CopyFrom(dmlc::Parser<unsigned int, float>)+0x64) [0x7f6edb4195e4] [bt] (4) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(xgboost::DMatrix::Create(dmlc::Parser<unsigned int, float>, std::string const&, unsigned long)+0x363) [0x7f6edb40e0b3] [bt] (5) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(XGDMatrixCreateFromDataIter+0x134) [0x7f6edb3d46e4] [bt] (6) /hadoop/yarn/nm-local-dir/usercache/adtech-ds/appcache/application_1671749741546_0008/container_1671749741546_0008_01_000003/tmp/libxgboost4j5589502511762047561.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGDMatrixCreateFromDataIter+0x94) [0x7f6edb3cdc84] [bt] (7) [0x7f6f25017de7]

at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
at ml.dmlc.xgboost4j.java.DMatrix.<init>(DMatrix.java:53)
at ml.dmlc.xgboost4j.scala.DMatrix.<init>(DMatrix.scala:42)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$$anonfun$6$$anon$1$$anonfun$7.apply(XGBoostClassifier.scala:311)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$$anonfun$6$$anon$1$$anonfun$7.apply(XGBoostClassifier.scala:293)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$$anonfun$6$$anon$1.hasNext(XGBoostClassifier.scala:325)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:260)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

```

jsleight commented 1 year ago

There aren't any known issues with those versions that I'm aware of, though iirc I never personally used that exact combination of libraries.

The error you pasted is happening inside xgboost's C++ code though, so it probably isn't an issue in MLeap.
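
One way to confirm that is to apply the deserialized pipeline's stages one at a time and see exactly which stage trips the native error. A rough sketch (the DataFrame name and bundle path below are placeholders):

```python
# Rough debugging sketch: apply each deserialized stage separately so the failure
# surfaces at a specific stage. "df" and the bundle path are placeholders.
import mleap.pyspark  # noqa: F401 -- adds deserializeFromBundle to PipelineModel
from pyspark.ml import PipelineModel

loaded = PipelineModel.deserializeFromBundle("jar:file:/tmp/pipeline.zip")

current = df
for stage in loaded.stages:
    current = stage.transform(current)
    current.take(1)  # force evaluation; a failing stage will raise here
    print("stage OK:", stage.uid)
```

If it only blows up at the XGBoostClassificationModel stage, that points at xgboost4j's native code rather than the MLeap bundle itself.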

If you're only seeing errors when deserializing the bundle, you might consider swapping to the xgboost-predictor library instead, as detailed here: https://github.com/combust/mleap/blob/master/mleap-xgboost-runtime/README.md

Upgrading spark+mleap+xgboost is obviously another option.