4paradigm / OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
https://openmldb.ai
Apache License 2.0
1.59k stars, 320 forks

Failed to import data in offline mode because curator version conflict #2188

Closed uttinie closed 2 years ago

uttinie commented 2 years ago

Bug Description

22/07/18 14:13:20 INFO Compatibility: Running in ZooKeeper 3.4.x compatibility mode
22/07/18 14:13:20 INFO Compatibility: Using emulated InjectSessionExpiration
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.curator.CuratorZookeeperClient.<init>(Lorg/apache/curator/utils/ZookeeperFactory;Lorg/apache/curator/ensemble/EnsembleProvider;IIILorg/apache/zookeeper/Watcher;Lorg/apache/curator/RetryPolicy;ZLorg/apache/curator/connection/ConnectionHandlingPolicy;)V
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:131)
    at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:165)
    at com._4paradigm.openmldb.common.zk.ZKClient.connect(ZKClient.java:54)
    at com._4paradigm.openmldb.batch.catalog.OpenmldbCatalogService.<init>(OpenmldbCatalogService.scala:37)
    at com._4paradigm.openmldb.batch.api.OpenmldbSession.<init>(OpenmldbSession.scala:68)
    at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:180)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:972)
    at com._4paradigm.openmldb.batchjob.ImportOfflineData$.importOfflineData(ImportOfflineData.scala:34)
    at com._4paradigm.openmldb.batchjob.ImportOfflineData$.main(ImportOfflineData.scala:30)
    at com._4paradigm.openmldb.batchjob.ImportOfflineData.main(ImportOfflineData.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
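Editor's note: the NoSuchMethodError above names the missing constructor by its JVM method descriptor, which is hard to read at a glance. The following is a minimal sketch, not part of the original report, that decodes such a descriptor into plain type names so it can be compared with the constructors offered by whichever curator-client jar is on the classpath. The helper names (`decode_descriptor`, `_read_type`) are hypothetical; the descriptor string is copied from the log.

```python
# Sketch: decode a JVM method descriptor into readable Java type names.
# Helper names are illustrative; only the descriptor comes from the log above.

PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}

# Descriptor taken verbatim from the NoSuchMethodError message.
DESCRIPTOR = (
    "(Lorg/apache/curator/utils/ZookeeperFactory;"
    "Lorg/apache/curator/ensemble/EnsembleProvider;"
    "III"
    "Lorg/apache/zookeeper/Watcher;"
    "Lorg/apache/curator/RetryPolicy;"
    "Z"
    "Lorg/apache/curator/connection/ConnectionHandlingPolicy;)V"
)


def _read_type(desc, pos):
    """Return (type_name, next_pos) for the type that starts at desc[pos]."""
    dims = 0
    while desc[pos] == "[":  # each leading '[' adds one array dimension
        dims += 1
        pos += 1
    code = desc[pos]
    if code == "L":  # object type: L<internal/class/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:  # single-letter primitive (or V for void)
        name = PRIMITIVES[code]
        pos += 1
    return name + "[]" * dims, pos


def decode_descriptor(desc):
    """Split '(params)return' into (list of parameter types, return type)."""
    if not desc.startswith("("):
        raise ValueError("a method descriptor must start with '('")
    pos = 1
    params = []
    while desc[pos] != ")":
        type_name, pos = _read_type(desc, pos)
        params.append(type_name)
    return_type, _ = _read_type(desc, pos + 1)
    return params, return_type


if __name__ == "__main__":
    params, return_type = decode_descriptor(DESCRIPTOR)
    for index, type_name in enumerate(params, start=1):
        print(index, type_name)
    print("returns", return_type)
```

The decoded list ends with a `ConnectionHandlingPolicy` parameter; a curator-client jar whose `CuratorZookeeperClient` has no constructor taking that type would produce exactly this error, which is consistent with the version conflict identified later in the thread.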

Expected Behavior

Versions: ZooKeeper 3.4.14, Curator 4.2.0.

In offline mode, loading a file fails with the following statement:

LOAD DATA INFILE '/work/taxi-trip/data/taxi_tour_table_train_simple.snappy.parquet' INTO TABLE t1 options(format='parquet', header=true, mode='append');

Curator's official explanation of ZooKeeper 3.4.x compatibility: https://curator.apache.org/zk-compatibility-34.html

The OpenMLDB example does not add the ZooKeeper exclusions, so how do we get past this?

Steps to Reproduce

aceforeverd commented 2 years ago

OpenMLDB does not include Curator. What are the steps to reproduce?

@vagetablechicken @tobegit3hub any ideas about Curator?

uttinie commented 2 years ago

> OpenMLDB does not include Curator. What are the steps to reproduce?
>
> @vagetablechicken @tobegit3hub any ideas about Curator?

I deployed the cluster version following https://openmldb.ai/docs/en/main/deploy/install_deploy.html

Then I repeated the steps of the official example: https://openmldb.ai/docs/en/main/use_case/lightgbm_demo.html

The following line of the stack trace in openmldb-taskmanager-0.5.2/taskmanager/bin/logs/job_15_error.log shows that the error comes from OpenMLDB:

com._4paradigm.openmldb.common.zk.ZKClient.connect(ZKClient.java:54)

The full error is in the Bug Description above.

Version: 0.5.2, deployed on a physical machine rather than tested in Docker.

vagetablechicken commented 2 years ago

> OpenMLDB does not include Curator. What are the steps to reproduce?
>
> @vagetablechicken @tobegit3hub any ideas about Curator?

The jars under spark.home include Curator, and openmldb-batch also contains Curator class files. The three curator packages in the Spark jars directory (curator-framework and the other two) are unnecessary.

uttinie commented 2 years ago

4pdosc/openmldb:0.5.2

job_1_error.log

uttinie commented 2 years ago

root@570c55c47cc5:/work/openmldb/spark-3.0.0-bin-openmldbspark/jars# ls *.jar | xargs md5sum
9e9af6183a8ad44b01b4b76a34a56821 JLargeArrays-1.5.jar
2d8ae6ebfdefa987a7edfa718e8f7815 JTransforms-3.1.jar
27d1d944c1f540e8771b9eb9aead1efb RoaringBitmap-0.7.45.jar
46a37512971d8eca81c3fcf245bf07d2 activation-1.1.1.jar
f7530afc9741d3594cb9f86a2ab875c2 aircompressor-0.10.jar
da22bde6d9f36850719375f788c8c3dc algebra_2.12-2.0.0-M2.jar
0223e36b3a3fadd05a52221828a4fcf1 antlr4-runtime-4.7.1.jar
04177054e180d09e3998808efa0401c7 aopalliance-1.0.jar
0237846ebdaa7db36b356044a373ffba aopalliance-repackaged-2.6.1.jar
f5877c02fd56ade67713560e589c81b9 apacheds-i18n-2.0.0-M15.jar
3118e22eac44e150c383df1d417772f4 apacheds-kerberos-codec-2.0.0-M15.jar
cf4561832dab76e9f37461342ec18d17 api-asn1-api-1.0.0-M20.jar
2c5a6722666882024becdd64301be492 api-util-1.0.0-M20.jar
83d82dd480da2aeba6429e746453ec0b arpack_combined_all-0.1.jar
f86549acb117a858d1adfaf7d6fc0e31 arrow-format-0.15.1.jar
f44c7868c21731d375346d1c08f43fb6 arrow-memory-0.15.1.jar
1ca79cb56551375d9aebafc26cfea00c arrow-vector-0.15.1.jar
032788f0841d26b027957fe91f2cd696 audience-annotations-0.5.0.jar
10395e5a571e1a1f6113411f276d2fea avro-1.8.2.jar
d5068bf37b2a4072497bb1203522d104 avro-ipc-1.8.2.jar
1cfe4e66985b9d12a19255bb289719e6 avro-mapred-1.8.2-hadoop2.jar
d69b730b576483407ba7081968af4fbb breeze-macros_2.12-1.0.jar
12f32c9676c3a21a4294ea4dd2112b17 breeze_2.12-1.0.jar
5fb86d3f1845bc5ba892db28ae65c266 cats-kernel_2.12-2.0.0-M4.jar
be8318f5cfb32b5a0892e54a21ff2ac4 chill-java-0.9.5.jar
8892fe3db9f2455183bdc610f40d5f0a chill_2.12-0.9.5.jar
07dc532ee316fe1f2f0323e9bd2f8df4 commons-beanutils-1.9.4.jar
bfdcae1ff93f0c07d733f03bdce28c9e commons-cli-1.2.jar
353cf6a2bdba09595ccfa073b78c7fcb commons-codec-1.10.jar
f54a8510f834a1a57166970bfc982e94 commons-collections-3.2.2.jar
1259371bedcac8b367cb748812ee153a commons-compiler-3.0.16.jar
d862e30ff6b5d78264677dcd6507abb8 commons-compress-1.8.1.jar
b099d9f9b4b99071cc52b259308df69a commons-configuration-1.6.jar
981c95e38457b10d429090496b96f2d6 commons-crypto-1.0.0.jar
cf89c593f0378e9509a06fce7030aeba commons-digester-1.8.jar
8ad8c9229ef2d59ab9f59f7050e846a5 commons-httpclient-3.1.jar
7f97854dc04c119d461fed14f5d8bb96 commons-io-2.4.jar
4d5c1693079575b362edf41500630bbd commons-lang-2.6.jar
fa752c3cb5474b05e14bf2ed7e242020 commons-lang3-3.9.jar
14a218d0ee57907dd2c7ef944b6c0afd commons-math3-3.4.1.jar
23c94d51e72f341fb412d6a015e16313 commons-net-3.1.jar
a1fb840c3963ed43c78291b5e61d55ac commons-text-1.6.jar
dc55ed6fe0bbad93bbf38331768ba1b4 compress-lzf-1.0.3.jar
ab845840ad73fa2ec1a5025a7c48b97e core-1.1.2.jar
3b43933c18d1dcf15f88db73ee646396 curator-client-2.7.1.jar
35bff30d2a79a8b0731269604b1327ee curator-framework-2.7.1.jar
156ad30fb9995b072175ae60fbb352a5 curator-recipes-2.7.1.jar
7e3311ef466642fe47a5b203bc7e5d21 flatbuffers-java-1.9.0.jar
2f54fc24807a4cad7297012dd8cebf3d gson-2.2.4.jar
58553f87d83b9f8ec74bd3529083ee2f guava-14.0.1.jar
ca1c7ba366884cfcd2cfb48d2395c400 guice-3.0.jar
c9f66a5f6a0d840d9057b30853f25b85 guice-servlet-3.0.jar
6fe58898886aebb11e761f75bdc3f237 hadoop-annotations-2.7.4.jar
13dc9913ede3dfc6d95f3a7c5dffd659 hadoop-auth-2.7.4.jar
16b165f9f612e3670362cd2c81880d17 hadoop-client-2.7.4.jar
ac17600d1fb51ada7fd2e677ce708005 hadoop-common-2.7.4.jar
e18f429b60662b724cad080b834717a3 hadoop-hdfs-2.7.4.jar
15a8280f7ef0e899619e1327432bd2a0 hadoop-mapreduce-client-app-2.7.4.jar
4e693e98da332ce3bc5454bd5d5181ca hadoop-mapreduce-client-common-2.7.4.jar
7bf3a032acb82ce47d9708c18e32c40d hadoop-mapreduce-client-core-2.7.4.jar
a14b2c627a143967063eea7b5e661f47 hadoop-mapreduce-client-jobclient-2.7.4.jar
e196cced9f2bd3d84a2b784b3d875938 hadoop-mapreduce-client-shuffle-2.7.4.jar
f76ab1def6d8891d7c1afc3ab21029f7 hadoop-yarn-api-2.7.4.jar
368dc220f6a89d8c1fc2ecdd1cc1a1fd hadoop-yarn-client-2.7.4.jar
df580b6251cfe03488c7eeeaaf3c09a3 hadoop-yarn-common-2.7.4.jar
5026f93767c09db7e41894012343243c hadoop-yarn-server-common-2.7.4.jar
13ee93f1e496fb03fb3044beb919c478 hadoop-yarn-server-web-proxy-2.7.4.jar
65f5bc221acc58325182e2a7e75cdd9e hive-storage-api-2.7.1.jar
23e8c18dae0c7b776bed756763d5153f hk2-api-2.6.1.jar
dfd358720393d83b01747928db6e3912 hk2-locator-2.6.1.jar
75ccb55538a77bf878996497ffeb86f3 hk2-utils-2.6.1.jar
c49a4662d691a09eed10e0a35dd73299 htrace-core-3.1.0-incubating.jar
877aca56579fea38c6358d06408976ba httpclient-4.5.6.jar
c152f231bf2570eca354c49ef8756b41 httpcore-4.4.12.jar
d8555a2f242c55d6727b4d0e82ab8446 istack-commons-runtime-3.0.8.jar
8c88b943fcd643d5e592b86179c6fbeb ivy-2.4.0.jar
6f7312c46c6c9767b11c4aa192331510 jackson-annotations-2.10.0.jar
b109d8d9d0519111d5756389fa5bfd87 jackson-core-2.10.0.jar
319c49a4304e3fa9fe3cd8dcfc009d37 jackson-core-asl-1.9.13.jar
195bfa368ad502b05427d9fb0346735d jackson-databind-2.10.0.jar
8481e1904d9bfe974157a6af04b4445e jackson-jaxrs-1.9.13.jar
1750f9c339352fc4b728d61b57171613 jackson-mapper-asl-1.9.13.jar
e3076d5b57027a2ff197335bd3d743d4 jackson-module-paranamer-2.10.0.jar
bc0cf7caf8f0ed7d1cf2ae83a6dec46a jackson-module-scala_2.12-2.10.0.jar
49f6a735bae30745dcf5ecec27090720 jackson-xc-1.9.13.jar
8b165cf58df5f8c2a222f637c0a07c97 jakarta.annotation-api-1.3.5.jar
4d7c80a1e3cd54531af03bef4537f7af jakarta.inject-2.6.1.jar
77501d529c1928c9bac2500cc9f93fb0 jakarta.validation-api-2.0.2.jar
c3892382aeb5c54085b22b1890511d29 jakarta.ws.rs-api-2.1.6.jar
dabb40ba58199304c640b7bd8bb2fbac jakarta.xml.bind-api-2.3.2.jar
9f6fdb647f71e5cbe75abb7e48935f1f janino-3.0.16.jar
3a4267e01989478be188d127b7a39425 javassist-3.25.0-GA.jar
289075e48b909e9e74e6c915b3631d2e javax.inject-1.jar
79de69e9f5ed8c7fcb8342585732bbf7 javax.servlet-api-3.1.0.jar
a415e9a322984be1e1f8a023d09dca5f jaxb-api-2.2.2.jar
9c3bf13a58e56c1b955bf5a365ca10b2 jaxb-runtime-2.3.2.jar
69ad224b2feb6f86554fe8997b9c3d4b jcl-over-slf4j-1.7.30.jar
be259b786fd911b6c0340981752d1a13 jersey-client-2.30.jar
42daf3f78a45e21d5d676106355411be jersey-common-2.30.jar
23046e98517a2287c4b5965cde68628f jersey-container-servlet-2.30.jar
e65386a26fd98807fad9e2c2952db1a8 jersey-container-servlet-core-2.30.jar
956c339306e51d8ee20e5896138781a3 jersey-hk2-2.30.jar
98f3487bff43dc8ffb0e0f81bc670bfc jersey-media-jaxb-2.30.jar
38c8fff7964dce84e050ff90e49a0d85 jersey-server-2.30.jar
12b65438bbaf225102d0396c21236052 jetty-6.1.26.jar
d3bea45d6939e57fccf450a914fe4e1a jetty-sslengine-6.1.26.jar
450fedce4f7f8ad3761577b10a664200 jetty-util-6.1.26.jar
90444d099fc95ec327de5b102db330ca json4s-ast_2.12-3.6.6.jar
873e590373cd6826c1c637c9d428261b json4s-core_2.12-3.6.6.jar
a422fe637ff0ba559523c18a8fcb4863 json4s-jackson_2.12-3.6.6.jar
899d2b035bb236359ec51cde75a261f5 json4s-scalap_2.12-3.6.6.jar
b8a34113a3a1ce29c8c60d7141f5a704 jsp-api-2.1.jar
195d5db8981fbec5fa18d5df9fad95ed jsr305-3.0.0.jar
f2c78cb93d70dc5dea0c50f36ace09c1 jul-to-slf4j-1.7.30.jar
27717b481916c44eed34ea7a68782ed5 kryo-shaded-4.0.2.jar
6944e9bc03c7938868e53c96726ae914 leveldbjni-all-1.8.jar
04a41f0a068986f0f73485cf507c0f40 log4j-1.2.17.jar
d56d86823662a663a4d614dd5e117eff lz4-java-1.7.1.jar
a1797e977a9e0356e479446b210906ac machinist_2.12-0.6.8.jar
c6c8927e9d6b7e3e4f60c019f146d099 macro-compat_2.12-1.1.1.jar
be92eb2325787de133822d1fd447795a metrics-core-4.1.1.jar
cbc0dd9a7cebdddd302abd8248fcf10f metrics-graphite-4.1.1.jar
1bf9b20b456739f5d153db27488b6abd metrics-jmx-4.1.1.jar
80ff97aaef7e0cdc49572bd84e7b71fb metrics-json-4.1.1.jar
b0b33660b9b704fe998b9c97bf2b15eb metrics-jvm-4.1.1.jar
5ab0ee168b90e0ad7010b159e603d304 minlog-1.3.0.jar
51b8ee99538474a3c43aaa0dafbe1183 netty-all-4.1.47.Final.jar
84b9e3191629e53abbb05a92c683c617 objenesis-2.5.1.jar
9eebabaa007dc329845e5ab3c12b4e6b opencsv-2.3.jar
858958e32afbdb2066335c5c7fef4167 openmldb-batch-0.5.2.jar
3c65baec4c73c9d4210b02cf7353d3fc openmldb-native-0.5.2-allinone.jar
e38fa04f0a764afad8311c66be036ff9 orc-core-1.5.10.jar
86902e60ff029fbb3bdb0c399e1d8c79 orc-mapreduce-1.5.10.jar
5f77d51b3079ae13e18ea6167372ab6d orc-shims-1.5.10.jar
42e940d5d2d822f4dc04c65053e630ab oro-2.0.8.jar
e7e82b82118c5387ae45f7bf3892909b osgi-resource-locator-1.0.3.jar
f213c72b67d4850f17a4a3e9064904de paranamer-2.8.jar
1f9dd05a9c588c54bd6fb7512de28240 parquet-column-1.10.1.jar
150a1dd63e6ecc2773313b5b874739c8 parquet-common-1.10.1.jar
abe8be70da3436d72d97595470ec7d48 parquet-encoding-1.10.1.jar
694f51066294bd941a3f24fe870ec9f6 parquet-format-2.4.0.jar
9836550a739f2448169300e07489261b parquet-hadoop-1.10.1.jar
1d83df16a9306173069f2a36a99bfbfd parquet-jackson-1.10.1.jar
c4ceefed77d79affded2a1302e74606d protobuf-java-3.11.4.jar
dd330c65ed0c331ec028ab38eeaaae1f py4j-0.10.9.jar
259e0cd3de5f46b4b61cd2e9d08fb385 pyrolite-4.30.jar
d10083faaf87b115f5875c153ea52740 scala-collection-compat_2.12-2.1.1.jar
f8972c9a2919830dc47ffd26094c4882 scala-compiler-2.12.10.jar
9fcf8259fb239c6f2b148963cac03af2 scala-library-2.12.10.jar
b687df70f489bc911396c996a864f13b scala-parser-combinators_2.12-1.1.2.jar
034e8750598cc34716f98fe53bae4457 scala-reflect-2.12.10.jar
5daf691f15978092fc8424e1fe5245e4 scala-xml_2.12-1.2.0.jar
002995e4aa53f59d06e99734826cc960 shapeless_2.12-2.3.3.jar
3b98287c4745f90a9dda7aa77e4405f1 shims-0.7.45.jar
f8be00da99bc4ab64c79ab1e2be7cb7c slf4j-api-1.7.30.jar
78f1ff83b38c52a30a278dec6e023a6d slf4j-log4j12-1.7.30.jar
c887c852a1ccdecb7135d805d604e6f0 snappy-java-1.1.7.5.jar
1fcc9d0e96b4389ca5f9c48a86f9ca2b spark-catalyst_2.12-3.0.0.jar
1b6b18d1398e864af86e71ddeea33348 spark-core_2.12-3.0.0.jar
d55f1e5b190a08eebe191274e40dce8b spark-graphx_2.12-3.0.0.jar
cb85a3bdb47e460bdeea8b667adee6f2 spark-kvstore_2.12-3.0.0.jar
f348e217f0e1c004b4ea86f52cd1f226 spark-launcher_2.12-3.0.0.jar
886034b1d1ece1b56e14867c1f279076 spark-mllib-local_2.12-3.0.0.jar
e9bd26a2d6a7b3b1417f628a79ab9e20 spark-mllib_2.12-3.0.0.jar
446b4e2813f671ed803feaaa6eb8175c spark-network-common_2.12-3.0.0.jar
c70e895dd28b551f243d9a8b14a3cca1 spark-network-shuffle_2.12-3.0.0.jar
eaf8a008d9014792b26163b564ea0793 spark-repl_2.12-3.0.0.jar
419ff61668a074223c3143c1be0fbe00 spark-sketch_2.12-3.0.0.jar
c7cb5ec0a2c2e6bd5d21f0fa30fe53c5 spark-sql_2.12-3.0.0.jar
d5768c865b1249bfb8e0adde34ce5f90 spark-streaming_2.12-3.0.0.jar
057df387985ddea7a6fd962bcbf52883 spark-tags_2.12-3.0.0.jar
8c09f5e4ef3c63eed5c3f64ef8a98f60 spark-unsafe_2.12-3.0.0.jar
97d2543e37e38e6307f817a08ec6be4b spark-yarn_2.12-3.0.0.jar
4a24f85bbd05d7a9cf9c708a7926f564 spire-macros_2.12-0.17.0-M1.jar
7aaff30f900870cfe6c010fe2562caf4 spire-platform_2.12-0.17.0-M1.jar
3034b6182c2ae3d79d61e129d767308c spire-util_2.12-0.17.0-M1.jar
b4a8aaa12d2f14ebe958b6c603b55a8e spire_2.12-0.17.0-M1.jar
7d18b63063580284c3f5734081fdc99f stax-api-1.0-2.jar
298278ea2ea29ce1da1a956dd6fad4d4 stream-2.9.6.jar
25fcd93381bd0b0d2cf6b99c231e4bb4 threeten-extra-1.5.0.jar
7f2618691b423c92776e9ccfb8e3c07d univocity-parsers-2.8.3.jar
a4faf95f82df51512cd84b5dd08ee725 xbean-asm7-shaded-4.15.jar
b89632b53c4939a2982bcb52806f6dec xercesImpl-2.12.0.jar
7eaad6fea5925cca6c36ee8b3e02ac9d xml-apis-1.4.01.jar
c962b6bc3c8de46795b0ed94851fa9c7 xmlenc-0.52.jar
51050e595b308c4aec8ac314f66e18bc xz-1.5.jar
3b94b28886b82013c94383ef89af6061 zookeeper-3.4.14.jar
af758a06fbc7fc34e1927bff3eaaab99 zstd-jni-1.4.4-3.jar

vagetablechicken commented 2 years ago

openmldb-batch already contains all of the Curator classes, but the curator-xx jars contain them too. The class conflict happens by chance, not always. Either way, we should delete the curator-xx jars from the Spark release package. @tobegit3hub
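Editor's note: one way to confirm the overlap described above is to list the classes that more than one jar defines. This is a rough sketch, not an OpenMLDB tool: the function name `find_duplicate_classes` is hypothetical, and it assumes the Curator classes sit unshaded under `org/apache/curator/` in every jar, as the package names in the stack trace suggest.

```python
# Sketch: report classes that are defined by more than one jar in a directory.
# A jar is a zip archive, so the standard zipfile module can read its entries.
import zipfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_classes(jar_dir, package_prefix="org/apache/curator/"):
    """Map each class entry under package_prefix to the jars that contain it,
    keeping only the classes found in two or more jars."""
    owners = defaultdict(list)
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            # set() guards against an archive listing the same entry twice.
            for entry in set(archive.namelist()):
                if entry.startswith(package_prefix) and entry.endswith(".class"):
                    owners[entry].append(jar.name)
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}
```

Calling `find_duplicate_classes` on the Spark `jars` directory returns a dictionary keyed by class file; an empty result means no class under that prefix is defined twice.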

tobegit3hub commented 2 years ago

The root cause is that Spark uses Curator 2.7.1 by default for Hadoop 2.7, while OpenMLDB uses Curator 4.2.0, and the two versions are not compatible.

Here is the patch to fix that issue. Then we will package Spark with curator 4.2.0.

diff --git a/pom.xml b/pom.xml
index 183d258840..ebc481b1fb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3288,7 +3288,8 @@
       <id>hadoop-2.7</id>
       <properties>
         <hadoop.version>2.7.4</hadoop.version>
-        <curator.version>2.7.1</curator.version>
+        <!-- <curator.version>2.7.1</curator.version> -->
+        <curator.version>4.2.0</curator.version>
         <commons-io.version>2.4</commons-io.version>
         <!--
           the declaration site above of these variables explains why we need to re-assign them here

The issue will be resolved by https://github.com/4paradigm/spark/pull/27 .
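Editor's note: to check which Curator version a given Spark distribution ships, the jar file names are usually enough. The sketch below is illustrative only; the function names and the file-name pattern are assumptions, and 4.2.0 is used as the expected version because the thread states that is what OpenMLDB uses.

```python
# Sketch: read Curator artifact versions from jar file names and flag
# any that differ from the version the application expects.
import re

# Assumed naming scheme: curator-<artifact>-<dotted version>.jar
_CURATOR_JAR = re.compile(r"^(curator-[a-z0-9-]+?)-(\d+(?:\.\d+)*)\.jar$")


def curator_versions(jar_names):
    """Return {artifact: version} for every curator-* jar in jar_names."""
    found = {}
    for name in jar_names:
        match = _CURATOR_JAR.match(name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def mismatched(found, expected="4.2.0"):
    """Return the artifacts whose version is not the expected one."""
    return {artifact: version for artifact, version in found.items()
            if version != expected}
```

Feeding it the file names from the `jars` listing earlier in the thread would report all three curator jars at 2.7.1; after repackaging Spark with the patch, the result of `mismatched` should be empty.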

aceforeverd commented 2 years ago

@uttinie could you verify whether the problem is resolved with the latest OpenMLDB version, v0.6.4? :)

tobegit3hub commented 2 years ago

I have tested OpenMLDB 0.6.4, and the issue is not resolved there yet. It will be fixed by https://github.com/4paradigm/spark/pull/34 .

tobegit3hub commented 2 years ago

This is resolved by upgrading to OpenMLDB 0.6.5.

If you are using a lower version, you need to delete the Curator packages (curator-client-2.7.1.jar, curator-framework-2.7.1.jar, curator-recipes-2.7.1.jar) in $SPARK_HOME/jars/. In addition, as described in https://github.com/Netflix/curator/issues/297 , you need to delete zookeeper-3.4.14.jar and download zookeeper-3.6.2.jar from https://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.6.2/zookeeper-3.6.2.jar .
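Editor's note: the Curator part of this workaround can be scripted. The following is a cautious sketch rather than an official procedure: the function names are hypothetical, it covers only the three curator jars named above (not the ZooKeeper jar swap), and it defaults to a dry run so nothing is deleted until the output has been reviewed.

```python
# Sketch: find and optionally remove the Spark-bundled Curator 2.7.1 jars.
from pathlib import Path

# Jar names taken from the comment above.
CURATOR_JARS = (
    "curator-client-2.7.1.jar",
    "curator-framework-2.7.1.jar",
    "curator-recipes-2.7.1.jar",
)


def plan_curator_cleanup(jars_dir):
    """Return the paths of the listed curator jars that exist in jars_dir."""
    jars_dir = Path(jars_dir)
    return [jars_dir / name for name in CURATOR_JARS
            if (jars_dir / name).is_file()]


def apply_cleanup(paths, dry_run=True):
    """Print each path and delete it only when dry_run is False."""
    for path in paths:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
```

A typical use would be `apply_cleanup(plan_curator_cleanup(os.path.join(os.environ["SPARK_HOME"], "jars")))` after `import os`, followed by a second call with `dry_run=False` once the printed list looks right.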