apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++][Parquet] Proof-of-concept: Trying to use FlatBuffer as Parquet Footer #43695

Open mapleFU opened 2 months ago

mapleFU commented 2 months ago

Describe the enhancement requested

Background: Parquet Metadata evolution

Should we just do a POC for this one?

Component(s)

C++, Parquet

mapleFU commented 2 months ago

@alkis Do we have some POC or code with Scrub? I think we may first do a POC with some Metadata API here?

alkis commented 2 months ago

I have one but I need to port it to arrow style.

mapleFU commented 2 months ago

Great! Thanks for the effort!

I would also be glad to help with the metadata API in the current Parquet C++ code, since it's a bit weird too. If you hit any problems adapting it, just ping me.

alkis commented 2 months ago

My prototype is focused on converting thrift to flatbuf while preserving most of the semantics. I haven't looked at the metadata API in detail, I assume that will remain largely the same.

mapleFU commented 2 months ago

This is appreciated, but I think the problem is that the metadata currently relies heavily on the Thrift implementation 🤔. Glad to see the patch!

alkis commented 2 months ago

I am trying to add the benchmark under src/parquet/metadata3_benchmark.cc, but I am having trouble with CMake. How do I write the rule so that it links absl in?

I added this to src/parquet/CMakeLists.txt:

add_parquet_benchmark(metadata3_benchmark)

But the absl headers are not found.

mapleFU commented 2 months ago

It seems flight-rpc includes absl, so maybe target_link... for absl would work.

Also cc @kou for help

kou commented 2 months ago

How about add_parquet_benchmark(metadata3_benchmark EXTRA_LINK_LIBS absl::XXX)?

kou commented 2 months ago

BTW, can we link https://github.com/apache/arrow/pull/43793 to this?

alkis commented 2 months ago

> BTW, can we link https://github.com/apache/arrow/pull/43793 to this?

Done.

alkis commented 2 months ago

> How about add_parquet_benchmark(metadata3_benchmark EXTRA_LINK_LIBS absl::XXX)?

I think I tried that, and it says it can't find absl::strings. I implemented the few things I needed by hand, so it's OK now :-)
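
For anyone who hits the same "can't find absl::strings" error: one possible cause is that no imported absl targets are defined in the build, for example when the components that pull in absl (such as Flight) are not enabled. The following is only a sketch under that assumption, and it further assumes that Abseil is installed somewhere CMake can find its package config. Arrow normally resolves dependencies through its own third-party toolchain, so this is not necessarily how the project itself would wire it up.

```cmake
# Sketch only, in src/parquet/CMakeLists.txt.
# Assumption: Abseil is installed and exports a CMake package config,
# so that find_package() can define imported targets such as absl::strings.
find_package(absl CONFIG REQUIRED)

# EXTRA_LINK_LIBS is the keyword suggested earlier in this thread;
# absl::strings is the target named in the error message above.
add_parquet_benchmark(metadata3_benchmark EXTRA_LINK_LIBS absl::strings)
```

If the target is still not found after this, the Abseil installation prefix probably needs to be passed to CMake (for example through CMAKE_PREFIX_PATH).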