apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

HdfsIOException: RemoteBlockReader does not support CRC32 checksum #3470

Open hailiang9615 opened 10 months ago

hailiang9615 commented 10 months ago

Backend

VL (Velox)

Bug description

Using Spark 3.2.3 and Gluten 0.5.0 to query Parquet data on HDFS, the following HDFS error appears: HdfsIOException: RemoteBlockReader does not support CRC32 checksum

Spark version

None

Spark configurations

spark.plugins io.glutenproject.GlutenPlugin
spark.gluten.sql.columnar.backend.lib velox
spark.memory.offHeap.enabled true
spark.memory.offHeap.size 20g
spark.gluten.loadLibFromJar true
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
jars /mnt/disk1/glutenSpark/glutenJar/gluten-thirdparty-lib-centos-7.jar,/mnt/disk1/glutenSpark/glutenJar/gluten-velox-bundle-spark3.2_2.12-centos_7-0.5.0-SNAPSHOT.jar
spark.driver.extraClassPath /mnt/disk1/glutenSpark/glutenJar/gluten-thirdparty-lib-centos-7.jar:/mnt/disk1/glutenSpark/glutenJar/gluten-velox-bundle-spark3.2_2.12-centos_7-0.5.0-SNAPSHOT.jar
spark.executor.extraClassPath /mnt/disk1/glutenSpark/glutenJar/gluten-thirdparty-lib-centos-7.jar:/mnt/disk1/glutenSpark/glutenJar/gluten-velox-bundle-spark3.2_2.12-centos_7-0.5.0-SNAPSHOT.jar
spark.executorEnv.LIBHDFS3_CONF "/mnt/disk1/glutenSpark/spark-3.2.2-bin-hadoop3.2/conf/hdfs-site.xml"
files /mnt/disk1/glutenSpark/spark-3.2.2-bin-hadoop3.2/conf/hdfs-site.xml

System information

processor : 47
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz
stepping : 7
microcode : 0x5003006
cpu MHz : 3000.000
cache size : 16896 KB
physical id : 1
siblings : 24
core id : 13
cpu cores : 12
apicid : 59
initial apicid : 59
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear spec_ctrl intel_stibp flush_l1d arch_capabilities
bogomips : 4805.69
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Relevant logs

Caused by
RemoteBlockReader.cpp: 149: HdfsIOException: RemoteBlockReader does not support CRC32 checksum, Block: [block pool ID: BP-1348432751-10.184.50.185-1688547040473 block ID 1073801341_60525], from Datanode: hbpcluster003.bigdata.hikvision.com(10.184.50.197)
    @   Unknown
    @   Unknown
    @   Unknown
    @   Unknown
    @   Unknown
    @   Unknown
    @   Unknown
    @   gluten::DwrfDatasource::Init(std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)
    @   gluten::DwrfDatasource::Init(std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&)
    @   facebook::velox::aggregate::prestosql::(anonymous namespace)::MaxByAggregate<signed char, double>::addSingleGroupIntermediateResults(char*, facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > > const&, bool)
    @   _ZZN8facebook5velox9aggregate12PrestoHasher4hashILNS0_8TypeKindE7EEEvRKNS0_17SelectivityVectorERN5boost13intrusive_ptrINS0_6BufferEEEENKUlT_E_clIiEEDaSD_.isra.0
    @   facebook::velox::functions::sparksql::(anonymous namespace)::LeastGreatestFunction<facebook::velox::functions::sparksql::Greater<facebook::velox::Date>, (facebook::velox::TypeKind)10>::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const
    @   facebook::velox::functions::sparksql::(anonymous namespace)::LeastGreatestFunction<facebook::velox::functions::sparksql::Greater<facebook::velox::Timestamp>, (facebook::velox::TypeKind)9>::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const
    @   facebook::velox::functions::sparksql::(anonymous namespace)::LeastGreatestFunction<facebook::velox::functions::sparksql::Greater<facebook::velox::Timestamp>, (facebook::velox::TypeKind)9>::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const
    @   facebook::velox::functions::sparksql::(anonymous namespace)::LeastGreatestFunction<facebook::velox::functions::sparksql::Less<facebook::velox::Date>, (facebook::velox::TypeKind)10>::apply(facebook::velox::SelectivityVector const&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&, std::shared_ptr<facebook::velox::Type const> const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) const
    @   gluten::VeloxBackend::toVeloxPlan()

hailiang9615 commented 10 months ago

Sorry, I didn't compile gluten-main, and gluten-1.0.0 has a Netty flaw in the Executor.

Yohahaha commented 10 months ago

The call stack looks strange:

  1. max_by is not supported yet, so it should fall back.
  2. Why is DwrfDatasource being used when reading Parquet data?

hailiang9615 commented 10 months ago

The newly compiled gluten-1.1.0 has the same problem. Does Gluten not support this checksum type?

hailiang9615 commented 10 months ago

Caused by: HdfsIOException: RemoteBlockReader does not support CRC32 checksum
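
For anyone trying to narrow this down, one thing worth checking is which checksum type the HDFS configuration requests. The sketch below is only a diagnostic aid, not a fix: it reads the standard Hadoop property `dfs.checksum.type` (Hadoop defaults to CRC32C when it is unset) from the text of an hdfs-site.xml. The helper name `read_hadoop_property` and the sample XML are made up for illustration. It also rests on an assumption: the checksum type of an existing block is decided by the client that wrote the file, so the value found in the reader's hdfs-site.xml may differ from what the failing blocks actually use.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name, default=None):
    """Return the value of a Hadoop <property> entry, or `default` if absent.

    Hypothetical helper for illustration; it only parses the XML text and
    does not talk to HDFS.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Made-up sample standing in for the contents of hdfs-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>dfs.checksum.type</name>
    <value>CRC32</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # "CRC32C" is passed as the fallback because that is Hadoop's default.
    checksum_type = read_hadoop_property(SAMPLE, "dfs.checksum.type", "CRC32C")
    print("dfs.checksum.type =", checksum_type)
```

To run it against a real cluster configuration, replace `SAMPLE` with the text of the file that `spark.executorEnv.LIBHDFS3_CONF` points to, and check the configuration of whichever job wrote the Parquet files as well.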