apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0

Parquet Modular Encryption support #3511

Open tibaes opened 1 year ago

tibaes commented 1 year ago

Which part is this question about: Documentation

Describe your question: Is Parquet Modular Encryption supported by this library?

Additional context: I have found some mentions of AES and encryption here and there in the documentation and code base, but there is no example of it. I am struggling to make it work, so I'm starting to think this is not fully supported yet.

tustvold commented 1 year ago

We do not currently support this, but would welcome contributions to add support for it.

bhoberman commented 10 months ago

@tustvold I'm interested in doing implementation work for this. I'd love to have a dedicated chat about it with a maintainer or community member who has context for this issue and could get me involved with contributor discussion spaces!

tustvold commented 10 months ago

I'm afraid I don't really have any context on this, as it isn't a part of the standard I am familiar with. Implementing this will likely involve interpreting the spec at https://github.com/apache/parquet-format and applying it to the Rust reader. If this is anything like other aspects of parquet, this will also involve a fair amount of spelunking in existing implementations to clarify ambiguity.

The actual encryption part can probably use something like https://docs.rs/ring/latest/ring/ as an optional dependency, but I'm just guessing here that the encryption is something standard.

I'm sorry I can't be of more help, I'd love to see this implemented and am happy to help review code contributions, but I don't really have the bandwidth at the moment to actively help with the actual implementation effort.

bhoberman commented 10 months ago

Thanks for the quick response! Parquet encryption uses two extremely standard primitives (which ring has perfectly fine implementations of). In principle, the encryption step is a very simple post-processing step, but I definitely anticipate the existing implementations having some weird quirks.

Given your resources, I'll just try to roll an implementation and submit it for review.

tustvold commented 10 months ago

Thank you. I'm happy to review code, especially if it is well tested.

ggershinsky commented 8 months ago

Hi @bhoberman, I've started looking into building a Rust implementation of PME, but fortunately found this thread quickly. How is this work going? I'd be glad to provide any help (review, advice on PME design, code contributions, etc.) as needed. Feel free to ping me on [sent before edit]

bhoberman commented 8 months ago

Hey @ggershinsky thanks for reaching out! I'll contact you privately with more details.

TL;DR for those using this thread as a status indicator: this was going to be a work project for me, and we decided after the research phase that it made the most sense to bind to Arrow C++ for our use-case and staffing. That said, I'd love to contribute some personal time to this project should @ggershinsky or someone else be willing to drive it.

bhoberman commented 7 months ago

Hey @tustvold, @ggershinsky and I have met and are starting on an implementation of this together. Would it be possible for us to get invites to the Apache slack (as mentioned in the README) for easier coordination than email/GitHub?

tustvold commented 7 months ago

Sure, if you join the Discord you can then DM me your email addresses.

matthewgapp commented 3 months ago

Hey @tustvold and @bhoberman, did you end up connecting or making any progress on the Rust implementation? I checked the Discord but didn't see any messages about encryption there. We're working on something that would depend on this and would love to help contribute if there's something already partially implemented.

adamreeve commented 3 months ago

We (@G-Research) would also like to see Parquet encryption support added and can contribute to this effort too. Maybe we can work together on this, @matthewgapp?

ggershinsky commented 3 months ago

Hi @matthewgapp, @adamreeve. Ben (@bhoberman) and I worked on this for a while, but had to switch to other projects. Feel free to use the early draft (a branch and an internal patch) any way you like. As always, I'll be glad to help with PME design questions and the like; you can reach me on ASF Slack and on GitHub.

adamreeve commented 3 months ago

@matthewgapp, are you on the Arrow Rust discord? I'm adamrnz there if you want to discuss this further. It looks like I could also invite you to the ASF Slack workspace if that's easier

matthewgapp commented 3 months ago

hey @adamreeve and @ggershinsky apologies for the delayed response here. Adam, I'll message you on discord. Happy to do slack if that's better

matthewgapp commented 3 months ago

Thanks @ggershinsky for the draft and patch! I will let you know if I have questions.

rok commented 1 month ago

Hi all. I'm @adamreeve's colleague and I happen to have time available to do some work on this. Could I help with any open tasks, @matthewgapp, or is it better that I pick up @ggershinsky's draft branch?

alamb commented 1 month ago

❤️

I believe the Apache DataFusion Comet project may be interested in this feature too. As far as I know, its absence is one reason the project has its own Parquet decoder:

https://github.com/apache/datafusion-comet/tree/3413397ce0de890b7d71b25b5a6790cc38cff21f/native/core/src/parquet

Perhaps @andygrove, @viirya, @sunchao, or @kazuyukitanimura have more details they can share.

cc @etseidl who may also be interested

andygrove commented 2 weeks ago

Related issue in Comet: https://github.com/apache/datafusion-comet/issues/1040