Open mapleFU opened 1 year ago
@pitrou mind take a look?
Eek. I opened https://issues.apache.org/jira/browse/PARQUET-2221 for the defective spec.
It's not obvious to me that this is used for dict fallback. Looking at the dict_fallback()
function I don't see any place where the encoder is switched to a non-dict encoder. @sunchao
Maybe the code is
fn try_new(descr: &ColumnDescPtr, props: &WriterProperties) -> Result<Self> {
let dict_supported = props.dictionary_enabled(descr.path())
&& has_dictionary_support(T::get_physical_type(), props);
let dict_encoder = dict_supported.then(|| DictEncoder::new(descr.clone()));
// Set either main encoder or fallback encoder.
let encoder = get_encoder(
props
.encoding(descr.path())
.unwrap_or_else(|| fallback_encoding(T::get_physical_type(), props)),
)?;
let statistics_enabled = props.statistics_enabled(descr.path());
let bloom_filter = props
.bloom_filter_properties(descr.path())
.map(|props| Sbbf::new_with_ndv_fpp(props.ndv, props.fpp))
.transpose()?;
Ok(Self {
encoder,
dict_encoder,
descr: descr.clone(),
num_values: 0,
statistics_enabled,
bloom_filter,
min_value: None,
max_value: None,
})
}
Seems that it creates a encoder
when build the writer
Describe the enhancement requested
In our standard ( https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8 ) , the dict fallback will goes to PLAIN encoding.
But in many implementions, it says that Parquet 2.0 should support fallback to other types, in our code, there is some todo:
And, in parquet-mr and arrow-rs, they both support fallback to other types:
So, should we support that?
Component(s)
C++, Parquet