zyclove opened this issue 7 months ago
@danny0405 Why does it fall back to GLOBAL_SIMPLE?
```
23/12/04 14:39:29 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
```
The config key should be `hoodie.metadata.enable`, not `hoodie.metadata.table`.
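For anyone hitting the same warning: the record index is only used for tagging when the metadata table itself is enabled, so the write-path settings would need to look roughly like this (keys taken from this thread; verify against your Hudi version):

```sql
-- Enable the metadata table (the record index lives inside it).
set hoodie.metadata.enable=true;
-- Build the record index partition in the metadata table.
set hoodie.metadata.record.index.enable=true;
-- Use the record index for tagging on the write path.
set hoodie.index.type=RECORD_INDEX;
```

With `hoodie.metadata.enable` missing (or misspelled as `hoodie.metadata.table`), the record index is never initialized and the writer falls back to GLOBAL_SIMPLE.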
@danny0405 After setting `hoodie.metadata.enable=true`, the index type is now RECORD_INDEX, but the following stage is still very slow.
In `SparkMetadataTableRecordIndex`:

```java
fileGroupSize = hoodieTable.getMetadataTable().getNumFileGroupsForPartition(MetadataPartitionType.RECORD_INDEX);
```

Why isn't `fileGroupSize` 512? Besides adjusting the number of buckets in the upstream source table, is there any other way to tune this?
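On the file group count: if I read `HoodieMetadataConfig` correctly, the number of record index file groups is chosen when the index partition is first initialized, bounded by a configurable minimum and maximum. One possible knob (key names assumed from `HoodieMetadataConfig`; please verify for your Hudi version) is:

```sql
-- Assumed keys; the record index partition picks its file group count
-- at initialization time, so these must be set BEFORE the index is
-- first built -- an existing index keeps its original file group count.
set hoodie.metadata.record.index.min.filegroup.count=512;
set hoodie.metadata.record.index.max.filegroup.count=10000;
```

If the index was already initialized with a small file group count, these settings will not resize it; the index partition would have to be rebuilt for them to take effect.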
Describe the problem you faced
The Spark job is too slow in the following stage. Adjusting CPU, memory, and concurrency has no effect. Which stage can be optimized or skipped?
Is this normal? Why is HoodieGlobalSimpleIndex still being used? ![image](https://github.com/apache/hudi/assets/15028279/89cb305f-bc23-40a7-ac00-0adab5933b53)
To Reproduce
Steps to reproduce the behavior:
```sql
set hoodie.write.lock.zookeeper.lock_key=bi_ods_real.smart_datapoint_report_rw_clear_rt;
set hoodie.storage.layout.type=DEFAULT;
set hoodie.metadata.record.index.enable=true;
set hoodie.metadata.table=true;
set hoodie.populate.meta.fields=false;
set hoodie.parquet.compression.codec=snappy;
set hoodie.memory.merge.max.size=2004857600000;
set hoodie.write.buffer.limit.bytes=419430400;
set hoodie.index.type=RECORD_INDEX;
```