Open tibaes opened 1 year ago
We do not currently support this, but would welcome contributions to add support for it.
@tustvold I'm interested in doing implementation work for this. I'd love to have a dedicated chat about it with a maintainer or community member who has context for this issue and could get me involved with contributor discussion spaces!
I'm afraid I don't really have any context on this, as it isn't a part of the standard I am familiar with. Implementing this will likely involve interpreting the spec at https://github.com/apache/parquet-format and applying it to the Rust reader. If this is anything like other aspects of parquet, this will also involve a fair amount of spelunking in existing implementations to clarify ambiguity.
The actual encryption part can probably use something like https://docs.rs/ring/latest/ring/ as an optional dependency, but I'm just guessing here that the encryption is something standard.
I'm sorry I can't be of more help, I'd love to see this implemented and am happy to help review code contributions, but I don't really have the bandwidth at the moment to actively help with the actual implementation effort.
Thanks for the quick response! Parquet encryption uses two extremely standard primitives (which ring has perfectly fine implementations of). In principle, the encryption step is a very simple post-processing step, but I definitely anticipate the existing implementations having some weird quirks.
Given your resources, I'll just try to roll an implementation and submit it for review.
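For anyone following along, here is a rough sketch of what the "post-processing" framing could look like on the read side. It assumes my reading of the module layout described in the parquet-format encryption document for GCM-encrypted modules: a 4-byte little-endian length, then a 12-byte nonce, the ciphertext, and a 16-byte tag. Please check that against the spec before relying on it. `GcmModule` and `split_gcm_module` are hypothetical names, not part of the parquet crate, and the sketch only splits the bytes; the actual AES-GCM decryption (e.g. via ring) is left out.

```rust
// Sizes assumed from the spec's description of GCM-encrypted modules.
const LEN_PREFIX: usize = 4; // little-endian length of nonce + ciphertext + tag
const NONCE_LEN: usize = 12;
const TAG_LEN: usize = 16;

/// Borrowed view over the parts of one framed module (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct GcmModule<'a> {
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits `length | nonce | ciphertext | tag` off the front of `buf`.
/// Returns the module parts plus whatever bytes follow the module.
/// No decryption happens here; this only validates and slices the framing.
pub fn split_gcm_module(buf: &[u8]) -> Result<(GcmModule<'_>, &[u8]), String> {
    if buf.len() < LEN_PREFIX {
        return Err("buffer is shorter than the 4-byte length prefix".to_string());
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let rest = &buf[LEN_PREFIX..];

    // The declared length must at least cover the nonce and the tag.
    if len < NONCE_LEN + TAG_LEN {
        return Err(format!("declared length {} is too small", len));
    }
    if rest.len() < len {
        return Err(format!(
            "declared length {} exceeds the {} bytes available",
            len,
            rest.len()
        ));
    }

    let (body, remaining) = rest.split_at(len);
    let (nonce, ct_and_tag) = body.split_at(NONCE_LEN);
    let (ciphertext, tag) = ct_and_tag.split_at(ct_and_tag.len() - TAG_LEN);
    Ok((GcmModule { nonce, ciphertext, tag }, remaining))
}
```

The nonce, ciphertext and tag slices would then be handed to whichever AEAD implementation ends up being used, together with the key and the additional authenticated data the spec defines for that module.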
Thank you, I'm happy to review code, especially if it is well tested
Hi @bhoberman, I've started looking into building a Rust implementation of PME, and fortunately found this thread quickly. How is this work going? I'd be glad to provide any help (review, advice on PME design, code contributions, etc.) as needed, feel free to ping me on [sent before edit]
Hey @ggershinsky thanks for reaching out! I'll contact you privately with more details.
TL;DR for those using this thread as a status indicator: this was going to be a work project for me, and we decided after the research phase that it made the most sense to bind to Arrow C++ for our use-case and staffing. That said, I'd love to contribute some personal time to this project should @ggershinsky or someone else be willing to drive it.
Hey @tustvold, @ggershinsky and I have met and are starting on an implementation of this together. Would it be possible for us to get invites to the Apache slack (as mentioned in the README) for easier coordination than email/GitHub?
Sure, if you join the discord you can then DM me your email addresses
hey @tustvold and @bhoberman did you end up connecting or make any progress on the rust implementation? I checked the discord but didn't see any messages around encryption there. We're working on something that would depend on this and would love to help contribute if there's something already partially implemented.
We (@G-Research) would also like to see Parquet encryption support added and can contribute to this effort too, maybe we can work together on this @matthewgapp?
Hi @matthewgapp @adamreeve . Ben (@bhoberman) and I have worked on this for a while, but had to switch to other projects. Feel free to use the early draft (branch and an internal patch) any way you like. As always, I'll be glad to help with PME design questions, etc, you can reach me on asf slack and on github.
@matthewgapp, are you on the Arrow Rust discord? I'm adamrnz there if you want to discuss this further. It looks like I could also invite you to the ASF Slack workspace if that's easier
hey @adamreeve and @ggershinsky apologies for the delayed response here. Adam, I'll message you on discord. Happy to do slack if that's better
thanks @ggershinsky for the draft and patch! will let you know if questions
Hi all. I'm @adamreeve's colleague and I happen to have time available to do some work on this. Could I help with any open tasks, @matthewgapp, or is it better that I pick up @ggershinsky's draft branch?
I believe the Apache DataFusion Comet project may be interested in this feature too -- I believe its lack is one reason the project has its own parquet decoder
Perhaps @andygrove or @viirya @sunchao or @kazuyukitanimura have more details they can share
cc @etseidl who may also be interested
Related issue in Comet: https://github.com/apache/datafusion-comet/issues/1040
Which part is this question about Documentation
Describe your question Is Parquet Modular Encryption supported by this library?
Additional context I have found some mentions of AES and encryption here and there in the documentation and code base, but there is no example of it. I am struggling to make it work, so I'm starting to think this is not fully supported yet.