Open usberkeley opened 1 week ago
Yes
Why is allowOperationMetadataField allowed only when populateMetaFields is enabled?
1) Disabling populateMetaFields reduces the performance overhead of decoding HoodieRecords. However, if allowOperationMetadataField is enabled, decoding performance is still affected even when populateMetaFields is disabled. The performance impact of the two settings is therefore coupled.
2) Both settings control metadata fields. populateMetaFields is the main switch, while allowOperationMetadataField only controls the activation of specific metadata fields. When the main switch is off, the sub-switches should have no effect.
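The main-switch and sub-switch relationship described above can be sketched as a small validation step. This is a minimal illustration, not Hudi's actual code: the function name `resolve_operation_field` is hypothetical, and rejecting the invalid combination (rather than silently ignoring the sub-switch) is an assumption.

```python
def resolve_operation_field(populate_meta_fields: bool,
                            allow_operation_metadata_field: bool) -> bool:
    """Return True if the operation metadata field should be written.

    Hypothetical helper that mirrors the rule in the description:
    the sub-switch is only valid while the main switch is on.
    """
    if allow_operation_metadata_field and not populate_meta_fields:
        # Assumption: the invalid combination is rejected outright.
        raise ValueError(
            "allowOperationMetadataField requires populateMetaFields to be enabled"
        )
    return allow_operation_metadata_field


if __name__ == "__main__":
    print(resolve_operation_field(True, True))    # main on, sub on
    print(resolve_operation_field(False, False))  # main off, sub off
```

Failing fast on the invalid combination keeps the two settings from silently interacting, which is the performance concern raised in point 1.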
With populateMetaFields disabled, why must the number of record key fields be equal to one?
The Log Scanner needs to regenerate the record key. Currently it only supports a simple key generator, which means there can be only one primary key column.
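To illustrate why a simple key generator implies a single key column, here is a rough sketch of regenerating a record key from the row data when no meta fields are stored. The function `regenerate_record_key` and its arguments are hypothetical and do not correspond to Hudi's Log Scanner or key generator classes.

```python
from typing import Mapping, Sequence


def regenerate_record_key(row: Mapping[str, object],
                          record_key_fields: Sequence[str]) -> str:
    """Rebuild the record key from the row itself.

    Hypothetical sketch of a "simple" key generator: it reads exactly
    one column, so more than one configured key field is rejected.
    """
    if len(record_key_fields) != 1:
        raise ValueError(
            "a simple key generator supports exactly one record key field, "
            f"got {len(record_key_fields)}"
        )
    field = record_key_fields[0]
    # The key is just the string form of the single key column's value.
    return str(row[field])


if __name__ == "__main__":
    row = {"id": 7, "name": "a"}
    print(regenerate_record_key(row, ["id"]))
```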
Change Logs
1. Fix the bug with hoodie.populate.meta.fields in Table Config (hoodie.properties)
2. Optimize write performance
Impact
Improves write performance. After the optimization, writing with hoodie.populate.meta.fields=false takes 42.9% less time than writing with hoodie.populate.meta.fields=true.
Testing method: consume from the earliest position in Kafka until all messages are consumed (Kafka Lag = 0), and compare the time taken in each case.
1) With meta fields populated: 21 hours 25 minutes
2) Without meta fields: 12 hours 14 minutes
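The 42.9% figure can be checked from the two timings above. It is the reduction in elapsed time. Assuming both runs consumed the same number of messages (an assumption, though the testing method suggests it), the same numbers correspond to roughly 75% higher throughput.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an hours/minutes duration to total minutes."""
    return hours * 60 + minutes


with_meta = to_minutes(21, 25)     # 1285 minutes, meta fields populated
without_meta = to_minutes(12, 14)  # 734 minutes, no meta fields

# Share of elapsed time saved by disabling meta fields.
time_reduction_pct = (with_meta - without_meta) / with_meta * 100

# Throughput gain, assuming the same message volume in both runs.
throughput_gain_pct = (with_meta / without_meta - 1) * 100

print(f"time reduction: {time_reduction_pct:.1f}%")    # 42.9%
print(f"throughput gain: {throughput_gain_pct:.1f}%")  # 75.1%
```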
Risk level (write none, low medium or high below)
medium
Documentation Update
none
Contributor's checklist