xingnailu opened 9 months ago
@Yohahaha Do you know what happened?
I think this feature may be broken in 1.1.0. Please check our experiments mentioned in https://github.com/oap-project/gluten/pull/4400 and https://github.com/oap-project/gluten/pull/4407. BTW, I don't think this feature is production-ready yet. Hope this helps!
@xingnailu Thanks for reporting. Could you please try with the main branch? As @zhli1142015 mentioned, this feature is not fully tested and is likely broken in the 1.1 release.
thanks, -yuan
Backend
VL (Velox)
Bug description
I am using Gluten + Velox, with the data stored on S3. I found that a large portion of task running time is spent reading remote data, so after discovering the local cache feature, I enabled it to verify its effectiveness. However, I encountered the following crash:
```
C [libvelox.so+0x602f34e] facebook::velox::memory::MemoryAllocator::allocateNonContiguous(unsigned long, facebook::velox::memory::Allocation&, std::function<void (long, bool)>, unsigned long)+0x3e
```
I found the stack trace below, but it doesn't show a specific error; execution simply reaches this line of code.
Spark version
None
Spark configurations
spark-submit config:
```shell
$SPARK_HOME/bin/spark-submit \
  --master $K8S_MASTER \
  --deploy-mode cluster \
  --name spark-tpch-gluten-1g-$i \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=10G \
  --conf spark.executor.cores=6 \
  --conf spark.driver.cores=4 \
  --conf spark.driver.memory=4g \
  --conf spark.driver.maxResultSize=1g \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=$EVENTLOG_DIR \
  --conf spark.hadoop.fs.s3a.access.key= \
  --conf spark.hadoop.fs.s3a.secret.key= \
  --conf spark.hadoop.fs.s3a.endpoint= \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.use.instance.credentials=false \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.namespace=$NAMESPACE \
  --conf spark.kubernetes.container.image=$IMAGE \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.driver.podTemplateFile=$DRIVER_TMPL \
  --conf spark.kubernetes.executor.podTemplateFile=$EXEXUTOR_TMPL \
  --conf spark.kubernetes.driver.name=spark-tpch-gluten \
  --conf spark.sql.files.maxPartitionBytes=128m \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.plugins=io.glutenproject.GlutenPlugin \
  --conf spark.gluten.loadLibFromJar=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=6g \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.gluten.sql.debug=true \
  --conf spark.gluten.sql.injectNativePlanStringToExplain=true \
  --conf spark.gluten.ui.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  --conf spark.gluten.sql.columnar.backend.velox.glogSeverityLevel=0 \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.sql.adaptive.localShuffleReader.enabled=false \
  --conf spark.gluten.sql.columnar.backend.velox.cacheEnabled=true \
  --conf spark.gluten.sql.columnar.backend.velox.ssdCacheSize=0 \
  --conf spark.gluten.sql.columnar.backend.velox.memCacheSize=1073741824 \
```
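The `memCacheSize` value above is specified in bytes; 1073741824 is exactly 1 GiB. A minimal sketch (the helper name `gib` is my own, not from Gluten) for computing byte values for size-valued confs like this:

```python
def gib(n: int) -> int:
    """Convert GiB to bytes, for byte-valued Spark/Gluten confs."""
    return n * 1024 ** 3

# The command above sets memCacheSize to 1 GiB, expressed in bytes:
mem_cache_size = gib(1)
print(mem_cache_size)  # 1073741824
```

Sizing the cache this way makes it easy to check that `memCacheSize` plus the other off-heap consumers stays within `spark.memory.offHeap.size` (6g here).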
Spark UI info:
- Backend: Velox
- Backend Branch: HEAD
- Backend Revision: 8d12a9b36d4f754c3194aa8612f62c0d5395d541
- Backend Revision Time: 2024-01-11 16:26:31 +0800
- GCC Version: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
- Gluten Branch: fix-finalized-s3
- Gluten Build Time: 2024-01-11T08:45:18Z
- Gluten Repo URL: http://code.oppoer.me/bdc/spark/gluten.git
- Gluten Revision: c03dfeda16e3784e1afc87fab8965c5cf0d618a8
- Gluten Revision Time: 2024-01-11 16:41:35 +0800
- Gluten Version: 1.1.0-SNAPSHOT
- Hadoop Version: 2.7.4
- Java Version: 11.0.21
- Scala Version: 2.12.15
- Spark Version: 3.4.1
System information
```
Velox System Info v0.0.2
Commit: 58d80a1abcaeab5f174e40b4f74fe6b209700cad
CMake Version: 3.16.3
System:
Arch:
C++ Compiler:
C++ Compiler Version:
C Compiler:
C Compiler Version:
CMake Prefix Path:
```
Relevant logs