eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0

Idea: Cryptographic claims #2271

Open madduck opened 11 months ago

madduck commented 11 months ago

It would be amazing if I could attach any number of "claims" to a document, using digital signatures.

Use cases include:

  1. Approving a document
  2. Confirming an invoice as paid
  3. Accepting a set of rules
  4. Confirming the metadata of a document could be done as a claim

Fundamentally, a digital signature is made by signing a payload (in practice, a hash of it) with the private key of a key pair, so that anyone holding the public key can verify it. The payload in this case would consist of a hash of the document data, as well as a statement, the "claim", along with the signer's identity and a timestamp. The result is a verifiable statement about who made the claim, and when. Should the subject of a claim (the document) change, then the claim is no longer valid for the changed version (but remains valid for the original).
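To make the payload idea concrete, here is a minimal sketch of how such a payload could be assembled and how it binds to the document bytes. The field names (`document_sha256`, `claim`, `signer`, `made_at`) and both function names are assumptions for illustration, not anything Docspell defines. The signing step itself is left out, since it would need an asymmetric crypto library.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_claim_payload(document: bytes, claim: str, signer: str, made_at: datetime) -> bytes:
    """Serialize the data that would be signed for one claim (hypothetical layout)."""
    payload = {
        "document_sha256": hashlib.sha256(document).hexdigest(),
        "claim": claim,
        "signer": signer,
        "made_at": made_at.astimezone(timezone.utc).isoformat(),
    }
    # Sorted keys and fixed separators keep the serialization byte-for-byte stable,
    # which matters because the signature covers these exact bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def claim_covers(payload: bytes, document: bytes) -> bool:
    """Check whether a payload still refers to the given document bytes."""
    expected = json.loads(payload)["document_sha256"]
    return hashlib.sha256(document).hexdigest() == expected
```

A payload built for one version of a file keeps matching that version, and stops matching as soon as a single byte changes.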

As documents do not change once they've been uploaded to Docspell, it might be sufficient to use the document ID instead of computing the hash of the data. However, having the document data hash available in the database might actually prove useful, for other cases too.

Considerations have to be made over the extent of a signature/claim:

  1. Is it file-based? By which I mean attachments. Or is it document-based? In this case, the payload would need to include all attachments of the document, and a claim would be invalidated if any of them were added or removed.
  2. What metadata should be included in the claim? Generally, I tend to assume that metadata should not be part of a claim (after all, the data are "meta"), but this needs to be looked into.
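For the document-based variant, one possible approach is a single digest over all attachments, so that adding or removing a file changes the value the claim was made over. This is only a sketch: `item_digest` is a hypothetical name, and hashing the sorted per-file digests is one design choice among several (it deliberately ignores file names and upload order).

```python
import hashlib


def item_digest(attachments: list[bytes]) -> str:
    """Digest over all attachments of one document, independent of their order."""
    # Hash each attachment first; the digests all have the same length, so
    # concatenating them in sorted order is unambiguous.
    digests = sorted(hashlib.sha256(data).digest() for data in attachments)
    outer = hashlib.sha256()
    for digest in digests:
        outer.update(digest)
    return outer.hexdigest()
```

With this scheme a claim survives re-ordering of attachments but not the addition, removal, or modification of any of them.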

It would be good if there were different types of claim, e.g. an approval claim, a confirmation claim, etc. That way, claims can be nicely categorised and presented, and the comment field can remain truly optional.

From a UI perspective, claims can be treated similarly to tags, except they might also have a comment that could be accessible e.g. on hover or click-to-expand. Verifying claims could also easily be done by the UI, either by requesting the signer's public key from the database, or by using an API call verify_claim. Obviously, the UI needs to make the screen bleed if a verification fails.

Making a claim could either be a frontend or a backend operation. I would stay away from requiring the users to maintain key pairs outside of the browser, and even using persistent storage in the browser may cause problems when browsers are switched, multiple machines are in use, etc. However, if it's a frontend operation, something like OpenPGP.js would be the way to go.

Instead, however, I think it would be worth considering making claims a backend operation, with the key material stored in the database, and encrypted using a key derived from the login passphrase or some OIDC datum.
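Deriving a key from the login passphrase could look roughly like the following sketch, which uses PBKDF2 from the Python standard library. The function name, the iteration count, and the choice of PBKDF2 are all assumptions. Actually encrypting the private key with the derived key (for example with an authenticated cipher) would need a third-party crypto library and is not shown.

```python
import hashlib
import os


def derive_wrapping_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a passphrase; the salt is stored next to the wrapped key."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


# A fresh random salt per user means equal passphrases do not yield equal keys.
salt = os.urandom(16)
wrapping_key = derive_wrapping_key("correct horse battery staple", salt)
```

One consequence of this design is that a passphrase change requires re-encrypting the stored key material, and an OIDC-only login would need some other stable secret to derive from.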

Storing claims would probably be best done in a separate table. For now, I think only documents need to be subjects, so the table would include the document ID and the digital signature (binary data). It should probably also include user and timestamp: these are already part of the signature, but as separate columns they let the database do the searching and sorting. Also, a "failed" boolean might be useful, which, if set, invalidates the row and spares us from having to verify again.

The API endpoint make_claim would take a subject and user authorization, and generate the claim by signing on behalf of the user, storing the digital signature in the database, and returning success, or a reference to it.

Finally, claims should also be searchable, i.e. "give me all documents which Martin approved" or "show me the invoices that have been paid".
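As a rough illustration of the separate claims table and of a search such as "all documents which Martin approved", here is a sketch using SQLite. The table name, column names, and the `documents_with_claim` helper are hypothetical and do not reflect Docspell's actual schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE claim (
    id          INTEGER PRIMARY KEY,
    document_id TEXT    NOT NULL,
    claim_type  TEXT    NOT NULL,            -- e.g. 'approved', 'paid'
    user_name   TEXT    NOT NULL,            -- duplicated from the signature for searching
    created     TEXT    NOT NULL,            -- ISO 8601 timestamp, also duplicated
    signature   BLOB    NOT NULL,
    failed      INTEGER NOT NULL DEFAULT 0   -- set once a verification has failed
);
CREATE INDEX claim_lookup ON claim (user_name, claim_type);
"""


def documents_with_claim(db: sqlite3.Connection, user_name: str, claim_type: str) -> list[str]:
    """Return the IDs of documents carrying a non-failed claim of this type by this user."""
    rows = db.execute(
        "SELECT DISTINCT document_id FROM claim "
        "WHERE user_name = ? AND claim_type = ? AND failed = 0 "
        "ORDER BY document_id",
        (user_name, claim_type),
    )
    return [row[0] for row in rows]
```

Because user and claim type live in plain columns, this query never has to open a signature; verification only happens when a claim is displayed or audited.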

Obviously, this will not be audit-proof because a malicious admin could intercept the passphrase and decrypt the key, thus making claims in the name of others, but I reckon that's a bridge to cross then. The API could be extended to return the payload and a transaction ID, and asynchronously expect a signature from the user for the given transaction, before storing the claim. This way, the key material wouldn't leave the user's machine.
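The asynchronous variant could be sketched as a two-step exchange: the server hands out a payload with a transaction ID, and only stores the claim once the client returns a signature that verifies. Everything here is hypothetical, including the class and method names. Signature verification is injected as a callable, because real verification needs an asymmetric crypto library.

```python
import secrets
from typing import Callable

# (user, payload, signature) -> True if the signature is valid for that user
Verifier = Callable[[str, bytes, bytes], bool]


class ClaimTransactions:
    """In-memory stand-in for the server side of the two-step claim flow."""

    def __init__(self, verify: Verifier) -> None:
        self._verify = verify
        self._pending: dict[str, tuple[str, bytes]] = {}
        self.stored: list[tuple[str, bytes, bytes]] = []

    def begin(self, user: str, payload: bytes) -> str:
        """Register a payload awaiting signature and return its transaction ID."""
        tx_id = secrets.token_urlsafe(16)
        self._pending[tx_id] = (user, payload)
        return tx_id

    def complete(self, tx_id: str, signature: bytes) -> bool:
        """Store the claim if the transaction exists and the signature verifies."""
        # pop() makes every transaction single-use, even when verification fails.
        pending = self._pending.pop(tx_id, None)
        if pending is None:
            return False
        user, payload = pending
        if not self._verify(user, payload, signature):
            return False
        self.stored.append((user, payload, signature))
        return True
```

In this flow the private key is only ever used on the user's machine; the server sees the payload and the finished signature, nothing else.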

eikek commented 9 months ago

Thanks for the proposal! In my opinion, the cryptographic part is only interesting if there is some external standard to follow for this exact purpose (and if things can be verified outside of docspell). Otherwise it would mean that users must trust docspell in the first place and then they would need to trust its crypto code as well. I also currently feel that this perhaps goes a bit too far for a docspell feature.

madduck commented 9 months ago

@eikek I totally agree that we should not roll our own or just cook something up. But when you say "external standard", I think we need to differentiate on two levels: technical, and semantic.

For the former, a number of standards already exist, but probably the most prevalent these days are JSON Web Tokens (JWT). I would recommend going with those: while they are the new kid on the block, they build on the same proven tech as everything else, and being the new kid, they are likely to have the best feature spread and client support. Client support matters because the question of who is responsible for the key material also needs answering. For certain use-cases, server-managed key material will be fine, but if there is no trust in the server, then client-side key management will be required, which is a whole can of worms of its own. Yet, JWT likely already has a bridge to FIDO2/WebAuthn or similar, or will at least gain browser support soon, if such doesn't already exist.

With regards to semantics: I don't think a standard exists, nor should, nor can. In fact, I don't even think that Docspell should impose the semantics. Sure, when used in a workflow such as "metadata approval", the claim is approved-metadata, and that'll be standardised within Docspell, but apart from that, it's really up to the user to use paid or Bezahlt or whatever other claim is being signed. One benefit of JWT is that the payload is JSON, so anything goes, but the data therein are also well-structured, and can thus be interpreted not only by the user, but also outside the organisation. For instance, a JWT saying "paid" (which will be dated and have my signature) should hold just fine as a detached data packet and could be sent to a business partner. Their business logic probably cannot process it as such, but can probably display the claim on screen. That's about as far as I'd go, because semantics differ.
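To show what a "paid" claim might look like as a detached JWT, here is a minimal sketch using only the Python standard library. It uses HS256, which is a symmetric MAC and therefore only a stand-in: a claim that a business partner can verify would need an asymmetric algorithm such as ES256 or EdDSA from a JWT library. The `iss`, `sub` and `iat` fields are registered JWT claim names; the `claim` field and the function names are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used by JWT."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_claim(claims: dict, key: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 tag (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    tag = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(tag)


def decode_claim(token: str, key: bytes) -> dict:
    """Verify the tag and return the claims; raise ValueError on any mismatch."""
    header_b64, payload_b64, tag_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}"
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(tag_b64)):
        raise ValueError("signature mismatch")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    return json.loads(b64url_decode(payload_b64))


token = encode_claim(
    {"iss": "madduck", "sub": "doc-123", "iat": 1700000000, "claim": "paid"},
    key=b"demo-key-not-for-production",
)
```

The token is a self-contained string, so it can be stored in a database column or sent to a third party, who can at least decode and display the `claim` value even without understanding its semantics.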