apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++] IPC: Do we have an IPC file/stream inspector like `example/parquet-read-write`? #44040

Open mapleFU opened 2 months ago

mapleFU commented 2 months ago

Describe the usage question you have. Please include as many useful details as possible.

I'm currently experimenting with Arrow IPC files. When I analyze a file, I would like to inspect its metadata and size-related information. Do we have a tool for this?

Component(s)

C++

mapleFU commented 2 months ago

@kou Just wondering whether we have a tool to analyze a file. If not, I can try to add one.

kou commented 2 months ago

We have only `arrow-file-to-stream` and `arrow-stream-to-file`: https://github.com/apache/arrow/blob/b6316c091f416967c5e7c9a9284601fa4507aa72/cpp/src/arrow/ipc/CMakeLists.txt#L56-L60

We don't have an inspector...

If one existed, it would be helpful.

(I'm using `ruby -r arrow -e 'pp Arrow::Table.load(ARGV[0])' xxx.arrow` or something.)
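
For the size-related part of the question, a rough stdlib-only sketch is possible without any Arrow bindings, because the IPC file format wraps the stream in a fixed framing: a leading `ARROW1` magic padded to 8 bytes, the stream data, a Flatbuffers-encoded footer, a little-endian int32 footer size, and a trailing `ARROW1` magic. The function names and the returned dictionary keys below are hypothetical, chosen only for illustration. The sketch does not decode the footer itself (schema and block offsets), so it is a starting point and not a real inspector.

```python
import struct

MAGIC = b"ARROW1"
LEADING_MAGIC_PADDED = 8   # "ARROW1" plus 2 padding bytes
TRAILER = 4 + len(MAGIC)   # int32 footer size + trailing magic


def inspect_ipc_file_framing(data: bytes) -> dict:
    """Report sizes derived from the framing of an Arrow IPC *file*.

    Hypothetical helper: it only reads the magic numbers and the footer
    size; the Flatbuffers footer is located but not decoded.
    """
    if len(data) < LEADING_MAGIC_PADDED + TRAILER:
        raise ValueError("too short to be an Arrow IPC file")
    if data[:len(MAGIC)] != MAGIC or data[-len(MAGIC):] != MAGIC:
        raise ValueError("ARROW1 magic not found (is this an IPC stream?)")

    # The footer size is stored as a little-endian int32 right before
    # the trailing magic.
    (footer_size,) = struct.unpack("<i", data[-TRAILER:-len(MAGIC)])
    footer_offset = len(data) - TRAILER - footer_size
    if footer_size < 0 or footer_offset < LEADING_MAGIC_PADDED:
        raise ValueError("footer size is inconsistent with the file size")

    return {
        "file_size": len(data),
        "footer_size": footer_size,
        "footer_offset": footer_offset,
        # Bytes between the padded leading magic and the footer.
        "stream_bytes": footer_offset - LEADING_MAGIC_PADDED,
    }


def inspect_ipc_file(path: str) -> dict:
    """Convenience wrapper; reads the whole file, fine for small files."""
    with open(path, "rb") as f:
        return inspect_ipc_file_framing(f.read())
```

Calling `inspect_ipc_file("xxx.arrow")` would return the total size, the footer size and offset, and the number of bytes taken by the stream section. Per-batch metadata would need the footer to be decoded with a Flatbuffers reader or one of the Arrow libraries.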

mapleFU commented 2 months ago

Sigh, I'd like to add one, but I found that the arrow-ipc code is quite large for a single file... Let me try it after my vacation.

zeroshade commented 2 months ago

@bkietz put together a utility when working on the IPC writing for nanoarrow. Do you think you could share it here, or post a link to it? Maybe even add it to the repo?

bkietz commented 2 months ago

For now, here's a gist: https://gist.github.com/bkietz/6678297afc8238826c8345ec723aade2

bkietz commented 2 months ago

It needs some cleanup, but I'll open a draft PR to add `archery annotate`

mapleFU commented 2 months ago

Very nice tool!