element-hq / matrix-content-scanner-python

A web service for scanning media hosted by a Matrix media repository
Apache License 2.0
12 stars 8 forks source link

Use of deprecated libolm for public-key encryption of POST bodies #64

Open reivilibre opened 1 month ago

reivilibre commented 1 month ago

libolm has been deprecated and the content scanner uses it through python-olm.

We're not aware of any active security problem with using it, but it seems wise to think about this.

The content scanner only uses it for decrypting the bodies of POST requests, using a public-key encryption scheme, in order to prevent snooping by the server performing TLS termination or any node between that one and the content scanner itself.

The code in question is here as below: https://github.com/element-hq/matrix-content-scanner-python/blob/da1783c6bce638c3f616fb061425520fc19863a6/src/matrix_content_scanner/crypto.py#L89-L117

Vodozemac does not implement the equivalent Olm PkEncryption functionality. (It also sounds like it is unlikely it ever will be implemented as-is, because something is off about the MAC calculation in Olm's current specification — N.B. this MAC issue is no problem for the content scanner) As far as I know, Element X does not implement support for the content scanner currently. Presumably the lack of implementations of Olm PkEncryption will pose an equivalent problem for them when it's time to implement support.

My suspicion is that this functionality only uses Olm because the clients already had libolm available as a prerequisite for being a E2EE-supporting Matrix client and so it was somewhat convenient to just use what was already available.

Potential solutions:

  1. reimplement this exact operation using the pyca/cryptography library
  2. change the protocol a bit to use some other (more common?) public key cryptography scheme that already has easy-to-use libraries available, e.g. maybe something based on libsodium/NaCl public-key crypto SealedBox. Bear in mind that clients need to use the same scheme here.
    • we could support both in parallel if needed for backwards compatibility
    • this option might make sense because Element X / Vodozemac-using clients will need to reimplement Olm's PkEncryption anyway, so there would be no benefit in sticking with the old over something more standard if it existed
reivilibre commented 3 weeks ago

Turns out the rust SDK has an implementation of the same functionality: https://github.com/matrix-org/matrix-rust-sdk/blob/ca09917d8452769a9b48db394c950ff3ee906ae7/crates/matrix-sdk-crypto/src/backups/keys/compat.rs

MatMaul commented 2 weeks ago

If we expose this in the FFI layer of matrix-sdk-crypto it should be fairly trivial to use generated Python bindings for the scanner itself. I tried to use generated Python bindings for the full SDK for a bot prototype and it seems to work fairly well, including async/await integration (not needed for this use case however).

MatMaul commented 2 weeks ago

I think that's the easiest way forward since it also open the gate to using the FFI layer with new AND old app, and also remove libolm altogether from there.

I have a WIP for adding it to the FFI layer, I am now trying to generate a wheel with maturin, but failing for now.

MatMaul commented 1 week ago

Some WIPs:

poljar commented 1 week ago

If we expose this in the FFI layer of matrix-sdk-crypto it should be fairly trivial to use generated Python bindings for the scanner itself.

But the matrix-sdk-crypto crate does not expose the PkEncryption stuff either.

Perhaps https://github.com/matrix-org/vodozemac/pull/171 and https://github.com/matrix-nio/vodozemac-python would be the better path forward.

MatMaul commented 1 week ago

Oh great I didn't know there were some work to bring PkEncryption to vodozemac, it's fine for me too. I went this way because an impl already exists in matrix-sdk-crypto.

MatMaul commented 1 week ago

@poljar are the Python bindings supposed to be working ?

After doing a maturin build, I get a wheel with the .so inside, but no generated Python code that would call into it. I am not sure if I am doing something wrong or if it is not in a working state yet.

poljar commented 1 week ago

Well there are Python tests that are run as part of the CI, so they should™ work.

MatMaul commented 1 week ago

So it turns out that a .so can be a self describing Python module, and doesn't need a matching .py file on the side or anything. Learning new stuff every day :)

Thanks, it indeed works.

MatMaul commented 1 week ago

I just pushed https://github.com/matrix-nio/vodozemac-python/pull/4.