dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

[jvm-packages] XGBoost corrupted the JVM memory #8290

Open jinmfeng opened 2 years ago

jinmfeng commented 2 years ago

We run 14 XGBoost training jobs in parallel on Spark, and the JVM crashes with the error below. It looks like the heap memory is corrupted. Do you know what the issue is? When we train a single model, the error does not occur.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f1b028dab21, pid=23988, tid=0x00007f1acf826700
#
# JRE version: OpenJDK Runtime Environment (Zulu 8.35.0.6-SA-linux64) (8.0_202-b04) (build 1.8.0_202-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.202-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 8901 C2 java.util.WeakHashMap.get(Ljava/lang/Object;)Ljava/lang/Object; (77 bytes) @ 0x00007f1b028dab21 [0x00007f1b028daa40+0xe1]
#
# Core dump written. Default location: /hadoop/1/yarn/local/usercache/jinmfeng/appcache/application_1663728370843_647527/container_e3798_1663728370843_647527_01_000053/core or core.23988
#
# An error report file with more information is saved as:
# /hadoop/1/yarn/local/usercache/jinmfeng/appcache/application_1663728370843_647527/container_e3798_1663728370843_647527_01_000053/hs_err_pid23988.log
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
#

LarrySunmk commented 3 days ago

Have you solved this issue?

wbo4958 commented 2 days ago

Hi @LarrySunmk, have you encountered the same issue? If so, please comment with details. The XGBoost JVM packages were recently refactored, so please try the latest XGBoost snapshot if possible: https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html?prefix=snapshot/ml/dmlc/xgboost4j-spark_2.12/2.2.0-SNAPSHOT/
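
For anyone who wants to try that snapshot from an sbt build, a minimal sketch might look like the following. The group, artifact, and version are read off the path in the link above. The resolver URL is an assumption: it is the snapshot root implied by that listing page, so check it against the XGBoost JVM installation docs before relying on it. The resolver name is arbitrary.

```scala
// build.sbt (sketch, not a verified configuration)

// Assumed snapshot repository root, inferred from the listing URL in this thread.
resolvers += "XGBoost4J Snapshot Repo" at "https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/"

// Coordinates taken from the path ml/dmlc/xgboost4j-spark_2.12/2.2.0-SNAPSHOT.
// The Scala suffix is written out explicitly, so single % is used instead of %%.
libraryDependencies += "ml.dmlc" % "xgboost4j-spark_2.12" % "2.2.0-SNAPSHOT"
```

Snapshot builds change over time, so it helps to note the date the snapshot was pulled when reporting whether the crash still reproduces.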

LarrySunmk commented 2 days ago

I have received your email! This is an automatic reply, confirming that your e-mail was received. Thank you.