ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0

Support for encryption in serializer #1659

Open lvca opened 1 month ago

lvca commented 1 month ago

@dijef I can see how encryption (and compression) would be much better implemented in the serializer. I was thinking of providing a simple listener interface that acts right at the beginning of deserialization, converting the encrypted value into a normal buffer, and at the end of serialization: once the buffer is created, it is encrypted before being returned.

Providing a listener could be the quick solution. The best would be providing the listener and a pluggable implementation (an implementation of the listener interface) that does the job of accepting the algorithm to use, keys, etc. So everybody can just configure and use it.
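A minimal sketch of what such a listener and one pluggable implementation could look like. The names `SerializerListener`, `afterSerialize`, `beforeDeserialize` and `AesGcmSerializerListener` are hypothetical and are not part of the ArcadeDB API; AES-GCM from the JDK is assumed here only as an example algorithm, and the random IV is stored in front of the ciphertext.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical listener, NOT an existing ArcadeDB interface.
interface SerializerListener {
  // Called at the end of serialization, just before the buffer is returned.
  byte[] afterSerialize(byte[] plain);

  // Called at the beginning of deserialization, before the buffer is parsed.
  byte[] beforeDeserialize(byte[] stored);
}

// Hypothetical pluggable implementation: AES-GCM with a key supplied by configuration.
final class AesGcmSerializerListener implements SerializerListener {
  private static final int IV_BYTES = 12;   // recommended IV size for GCM
  private static final int TAG_BITS = 128;  // authentication tag length

  private final SecretKey key;
  private final SecureRandom random = new SecureRandom();

  AesGcmSerializerListener(byte[] rawKey) {
    // rawKey must be 16, 24 or 32 bytes long for AES.
    this.key = new SecretKeySpec(rawKey, "AES");
  }

  @Override
  public byte[] afterSerialize(byte[] plain) {
    try {
      byte[] iv = new byte[IV_BYTES];
      random.nextBytes(iv); // a fresh IV for every buffer
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      byte[] encrypted = cipher.doFinal(plain);
      // Stored layout: IV followed by ciphertext (which ends with the GCM tag).
      return ByteBuffer.allocate(IV_BYTES + encrypted.length).put(iv).put(encrypted).array();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Encryption failed", e);
    }
  }

  @Override
  public byte[] beforeDeserialize(byte[] stored) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      // Read the IV back from the first IV_BYTES of the stored buffer.
      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
      return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Decryption failed", e);
    }
  }
}
```

With this shape, the serializer would only need to call the two hooks; the choice of algorithm and key handling stays inside the implementation, and a compression listener could follow the same contract.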

The next step could be allowing encryption/decryption of only specific buckets (by configuration) or even only specific properties (probably overkill).

@dijef if you can find time to draft an implementation it would be awesome.

Originally posted by @lvca in https://github.com/ArcadeData/arcadedb/discussions/535#discussioncomment-9829455

lvca commented 1 month ago

@pawellhasa volunteered for this implementation.

finduspedersen commented 1 month ago

How do you plan to manage encryption keys? I would love to see a solution where the encryption key is only held in the RAM of the ArcadeDB server for the short time during which a command or query is executed. After that, the encryption key should be explicitly wiped by overwriting the memory with null bytes. This could be achieved via a new EncryptionKey header included in every REST request to the database, similar to how an Authorization header is included in every REST request. Following this approach would guarantee that the encryption key is only available in the ArcadeDB process for a very short time, which is a sound zero-trust security principle.
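As a rough illustration of the wiping step described above, here is a minimal Java sketch. The class `RequestScopedKey` and the method `withKey` are hypothetical, and the sketch assumes the header value carries the key Base64-encoded; nothing here is an existing ArcadeDB API.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Function;

// Hypothetical helper, NOT part of ArcadeDB: keeps the raw key alive only for one action.
final class RequestScopedKey {
  private RequestScopedKey() {
  }

  // Decodes the (assumed Base64) header value, runs the action with the raw key,
  // then overwrites the key array with zero bytes even if the action throws.
  static <T> T withKey(String base64HeaderValue, Function<byte[], T> action) {
    byte[] key = Base64.getDecoder().decode(base64HeaderValue);
    try {
      return action.apply(key);
    } finally {
      Arrays.fill(key, (byte) 0);
    }
  }
}
```

Note that on the JVM this only clears the one array: the header `String` is immutable and stays on the heap until it is garbage collected, and classes such as `SecretKeySpec` keep their own copy of the key bytes, so a full implementation would have to account for those copies as well.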

dijef commented 1 month ago

The implementation I have in place requires providing the encryption key and algorithm settings at server start-up; these are then stored in memory. Reading them requires physical access to the server and to the heap of the process. The primary goal I am trying to address is protecting database content against on-site installation / data copy leaks (encryption at rest). Our DB connection is not exposed and the only client is our back-end. Data that comes out of the serializer is decrypted at the time of processing. The way I understand your requirement, every client provides an encryption key with each request, so each client's data can have different encryption. I can see this being possible, but I also see it being challenging, e.g. when shared records are accessed (we have them). It also sounds like user management. We have that in place outside of the ArcadeDB lib, with read/write access and roles.

I'll share a draft this week.

btw. @pawellhasa is my company account (to avoid confusion with replies from different accounts)

pawellhasa commented 1 month ago

@lvca please assign to me, I'll push code today

pawellhasa commented 1 month ago

This is the draft with changes; please send me any feedback. I will be away until about the 22nd of August, so until then :)

lvca commented 3 weeks ago

I added a review of your PR about a week ago. No rush; when you're back, take a look at my comments. Thanks.