FlorianRhiem commented 7 months ago

Currently, there is no mechanism to ensure that an .eln file was actually created by a specific ELN, which raises questions about data provenance and about how far the information inside the .eln can be trusted.

Motivation

While ELNs generally assume that the information from users can be trusted, there are cases where it may be necessary to know that the information was entered at a specific time by a specific person, and where proof is needed for this instead of relying on the users to be trustworthy. To achieve this, many ELNs include mechanisms such as timestamping, versioning or signing of entered information.

For those ELNs, importing an .eln file must not circumvent these mechanisms, as that would render them useless. As such, all information has to be marked as coming from an import by user X at date/time Y. A chain of trust for information from .eln files could potentially improve the situation, as another ELN might be more trustworthy than users in these cases. It would still be necessary to mark the information as coming from an import, but it could be marked as coming from a specific ELN instead of coming from a user.

Ideas / Suggestions

The .eln file consists of an ro-crate-metadata.json and various other files, which should be listed in the ro-crate-metadata.json file. If possible, the ro-crate-metadata.json should include SHA256 hash values for those files, which allows us to trust that those files have not been tampered with (or suffered from data rot) as long as the ro-crate-metadata.json itself is trustworthy. As such, implementing a system of trust for the ro-crate-metadata.json should be satisfactory to ensure that the whole .eln file can be trusted.
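A minimal Python sketch of the hash check described above. It assumes each file entity in the `@graph` of ro-crate-metadata.json carries a `sha256` property with the hex digest and that its `@id` is the path inside the archive; the property name, the function `find_tampered_files` and the `read_file` callable are assumptions for illustration.

```python
import hashlib
from typing import Callable, List


def find_tampered_files(metadata: dict, read_file: Callable[[str], bytes]) -> List[str]:
    """Return the @id of every file whose content does not match its recorded hash.

    `metadata` is the parsed ro-crate-metadata.json; `read_file` returns the
    bytes of a file given its path inside the archive (hypothetical helper).
    """
    mismatches = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        expected = entity.get("sha256")  # assumed property name
        if "File" not in types or expected is None:
            continue
        actual = hashlib.sha256(read_file(entity["@id"])).hexdigest()
        if actual != expected.lower():
            mismatches.append(entity["@id"])
    return mismatches
```

With such a check in place, only the ro-crate-metadata.json itself needs a signature; the other files are covered indirectly through their hashes.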

A typical approach for this would be to provide a signature for the ro-crate-metadata.json alongside the file itself. To do this, we need to figure out what digital signature scheme we would want to use, how keys are to be discovered and how the signature should be stored inside the .eln file.

Digital signature scheme

There are various schemes for how to generate keys, how to sign a series of bytes and how to verify the signature, and I'm not knowledgeable enough in this area to suggest a specific scheme. I would suggest using an already widely used and supported scheme though.

Key distribution/discovery

As digital signature schemes use a key to create the signature (or rather, a key pair), we will need a method of distributing or discovering the key (or rather, the public key) used for an .eln file.

One approach would be to have a chain of trust for those keys, similar (or ideally identical) to the one used for X.509 certificates in TLS / HTTPS. This would have the advantage that it's a well-known scheme that is already widely implemented, and most web-based ELNs would already have a certificate (and associated key). The certificate chain could be provided alongside the signature, so that the signature can be checked as long as the root certificate authority is known (and trusted) and the .eln file certificates have not expired or been revoked.

Another approach that would work for web-based ELNs would be to either query the origin ELN for its public key via HTTPS, or to submit the signature and ro-crate-metadata.json to the origin ELN for validation. The latter would have the advantage that we do not need to agree on a digital signature scheme at all, and that there's no need for key pairs, etc. as the signature could be generated and validated in various ways depending on the preferences of the ELN developer. The big disadvantage for both of these, of course, is that the origin ELN would need to be accessible, which is not always the case due to network issues or simply ELNs running behind a firewall.

Signature storage

The signature could either be stored in an additional file inside the .eln archive at a well-known location or it could be included in the extra field of the ZIP file. I personally think the simplicity of storing it in a file might be preferable, but for programs both methods should be equally easy to implement.

Personally, I think piggybacking off the infrastructure and expertise behind TLS / HTTPS would be easiest. Doing it this way, importing an .eln file should have the same security as directly importing the information from the origin ELN via HTTPS, and the origin ELN's domain could be shown as the source of the information without an additional caveat.

What are your thoughts on this?

NicolasCARPi commented 7 months ago

Very good summary, thank you!

Just writing some thoughts... Let's assume we export from instance A and import in instance B.

the metadata.json file contains the path to the public key, e.g. https://elnA.example.org/.well-known/pub.sig. This also allows defining an arbitrary keyserver. We rely on HTTPS to verify that the server is who it claims to be and assume the key has been installed there by the sysadmins of instance A.
instance B fetches that public key and verifies the signature.
if the signature is valid, then the content is imported
content is marked as being "Trusted from instance A at URL http.... signed with key ID ..." in instance B

Notes:

Instance B could have an allowlist of URLs to fetch keys from, or a predefined list of known trusted public keys
We could imagine the metadata.json file also contains the correct algorithm to use, because why not (there would be a default one).
I can think of many aspects that need to be thought of, such as what happens when a trusted entry has been created but is then modified? Should it be marked read-only? Should the trusted version be stored alongside? I guess this also depends on the ELN's revision/changelog system.
we should also think in terms of user defined signature, not instance level signature.

nicobrandt commented 7 months ago

As was already mentioned in our last meeting, I think it might be worth raising this issue with the RO-Crate people directly as well. I couldn't find anything similar when I searched on https://github.com/ResearchObject/ro-crate, but it could still be interesting for either them or other users. It also seems like a larger undertaking to me, so it would be beneficial to get some more input from the outside anyway.

NicolasCARPi commented 7 months ago

I agree with @nicobrandt and I'm surprised this topic seems never to have come up on RO-Crate! @FlorianRhiem can we let you open an issue on RO-Crate and link it here?

FlorianRhiem commented 4 months ago

I have tried to use CMS directly with a server certificate; however, OpenSSL reports that the purpose of the server certificate is invalid for signing messages and refuses to verify any generated signatures. An alternative that isn't bound as tightly to S/MIME and seems to work perfectly fine with TLS certificates is to use the openssl dgst functions.

This archive contains the ro-crate-metadata.json from the current SampleDB example, a signature.sha256 that contains the signature, and the TLS certificate including the complete chain as certificate-with-chain.pem. The signature was created using the private key for that certificate by running the following command:

openssl dgst -sha256 -sign private.pem -out signature.sha256 ro-crate-metadata.json

To verify a signature, you can first verify the certificate:

openssl verify -untrusted certificate-with-chain.pem certificate-with-chain.pem

Next, you can query the certificate subject to find out what the source of the .eln file is:

openssl x509 -noout -subject -in certificate-with-chain.pem

Then you need to extract the public key for the certificate, to use it for verifying the actual signature:

openssl x509 -pubkey -out public.pem -in certificate-with-chain.pem

And lastly, you need to verify the signature:

openssl dgst -sha256 -verify public.pem -signature signature.sha256 ro-crate-metadata.json

This signature file and the certificate chain could be placed in a well known directory within the zip file, so that implementations know where to look for them.

A potential difficulty is how to deal with expired certificates. They should not be trusted anymore, but that would mean that an .eln file created with an old certificate would not be trusted either. This should not be an issue for the typical use case of exchanging information using .eln files, but it would pose an issue for importing old .eln files, e.g. when revisiting old data.

NicolasCARPi commented 4 months ago

For expired certs, I'm guessing graceful degradation with a warning that can be bypassed could work, the same thing browsers do when you visit a site with an expired cert, unless a valid signature is required.
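The degradation described here could look roughly like the following Python sketch. The function name, the three outcomes and the `require_valid` switch are assumptions made for illustration, not an agreed policy.

```python
from datetime import datetime, timezone
from enum import Enum


class ImportDecision(Enum):
    TRUSTED = "trusted"
    WARN = "warn"      # importable, but the user has to confirm the warning
    REJECT = "reject"


def decide_import(signature_ok: bool, cert_not_after: datetime,
                  now: datetime, require_valid: bool = False) -> ImportDecision:
    """Decide how to treat an .eln file given its signature and certificate expiry."""
    if not signature_ok:
        return ImportDecision.REJECT
    if now <= cert_not_after:
        return ImportDecision.TRUSTED
    # The certificate has expired: degrade gracefully unless strict mode is on.
    return ImportDecision.REJECT if require_valid else ImportDecision.WARN


# Example: a file signed with a certificate that expired on 2024-01-01.
expiry = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(decide_import(True, expiry, datetime(2024, 6, 1, tzinfo=timezone.utc)))
```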

NicolasCARPi commented 4 months ago

What does everyone think about using minisign? I first tried signify, but minisign seems better because it supports trusted comments and allows creating them while signing.

It's the right tool for the job I believe. It does one thing, and does it well.

Here is how to test, after installing minisign with your favorite package manager:

# generate an asymmetric key pair (-W is no passphrase)
minisign -GW
# now sign the ro-crate-metadata.json file in the current dir
# we add a trusted comment with the URL to the instance (or pub key?)
minisign -S -t 'created by https://eln.example.org' -m ro-crate-metadata.json
# verify
minisign -V -m ro-crate-metadata.json
# or more realistically on an instance
minisign -V -p /path/to/trusted/keystore/eln.pub -m ro-crate-metadata.json

So in order to verify that the ro-crate-metadata.json file is authentic, one needs to know which pub key to use (the signature file gives a hint), have it, and trust it.

So there is in any case a step of adding a list of trusted pub keys (as with all signature schemes, at some point you need to trust something) and their corresponding instances. This could easily be done with a public repository, not an issue. Target instances could then choose to trust the whole set of keys in this repo, or just pick a few. On our side, we would merge PRs coming from verified instance owners, basically asking for the key to be at https://eln.example.org/.well-known/minisign.pub. And if the instance is not reachable, find another way to assert trust. But let's not focus on this for now.

Once we have verified the signature, we know:

the archive hasn't been modified (especially true if sha256sum is used for all files!)
we can trust the comment that tells us the origin of this file

We can then make an informed choice about what trust level to apply when importing. And can even add some info such as: Trusted import from https://eln.example.org

To summarize:

using a dedicated tool with ed25519 and sha512 is way simpler than using openssl x509
we could easily support signify, gpg or other similar signature schemes/apps, we don't really need to limit ourselves. But let's focus on one first.

What do you think?

NicolasCARPi commented 4 months ago

Hello everyone,

So I spent all night re-implementing minisign in PHP. And I'm at a point where eLab can generate a keypair and sign a pre-hashed message with it. I still have a lot of work to do, but the hardest part is done and the signature produced can be verified with minisign, which is great IMHO.

I started this work after my message above, that made me realize that this would be a very good approach for Experiments signature. It probably means that exported .eln might contain signature file, too. I'm not there yet though.

But what this made me realize is that the whole thing is very sound and does exactly what we want/need.

There is a Python implementation: https://github.com/x13a/py-minisign but you'll probably need to figure out the quirks like I did (such as the SK now being 64 bytes instead of the 32 bytes in the PHP implementation, which has not been updated in 4 years). But the whole thing is pretty straightforward to implement, once you know what tools to use and how to use them. It does require a bit of familiarity with crypto primitives, though. Or you can just shell out to the binary and be done with it ;)

Make sure to use Ed25519ph (https://datatracker.ietf.org/doc/html/rfc8032): the ph stands for pre-hashed, and that's what we want to use, allowing us to sign big files without loading them in memory (because we sign the hash). There are other advantages, see https://github.com/jedisct1/minisign/issues/104.

Anyway, just wanted to let you know that now that I'm using this for user signing experiments, it would make a lot of sense to also use it to sign an exported .eln.

Here is what a signature could look like:

untrusted comment: elabftw/50100: signature from key ab1a06fcc8c84722
RUSrGgb8yMhHInmFxm1QHsZj40mkW4YdXiawxnmMsTwlE8uXqv4MR/EOLDhMPjAnZtn6YG8x7PlxAw8NCZLplXwnV3gzTXR/kAs=
trusted comment: signed by Toto Le sysadmin (toto@yopmail.com) at 2024-03-16T05:27:52+01:00 by eLabFTW/50100 hosted at https://elab.local:3148
k+oO3XwzaCs02K1j2Oyw0H4lpxeYOpEQ/TcxsUiiGN54yaML9RpT5mnHK3sVUOhyt+LMoAU4Z6agvG6dvzaAAw==

The first line is a hint as to which key has been used (8 random bytes are used as KeyId), but it's untrusted. The second line is the signature of the message. The third line is a trusted comment with metadata about the signature, and we can find the instance URL in there (we could discuss standardization of this comment!). The fourth line is the signature of the signature + trusted comment.
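To make the four-line layout concrete, here is a small Python sketch that splits such a signature file into its parts. It assumes the binary layout minisign uses for the second line (2 bytes algorithm identifier, 8 bytes key ID, 64 bytes signature); `parse_signature` and `ParsedSignature` are names made up for this example, and no cryptographic verification is performed.

```python
import base64
from dataclasses import dataclass

UNTRUSTED_PREFIX = "untrusted comment: "
TRUSTED_PREFIX = "trusted comment: "


@dataclass
class ParsedSignature:
    untrusted_comment: str
    algorithm: bytes         # b"Ed" (legacy) or b"ED" (pre-hashed)
    key_id: bytes            # 8 bytes, only a hint
    signature: bytes         # 64 bytes, signature of the message (or its hash)
    trusted_comment: str
    global_signature: bytes  # 64 bytes, covers signature + trusted comment


def parse_signature(text: str) -> ParsedSignature:
    lines = text.strip().splitlines()
    if len(lines) != 4:
        raise ValueError("expected exactly 4 lines")
    if not lines[0].startswith(UNTRUSTED_PREFIX):
        raise ValueError("missing untrusted comment")
    if not lines[2].startswith(TRUSTED_PREFIX):
        raise ValueError("missing trusted comment")
    blob = base64.b64decode(lines[1], validate=True)
    global_sig = base64.b64decode(lines[3], validate=True)
    if len(blob) != 74 or len(global_sig) != 64:
        raise ValueError("unexpected signature length")
    return ParsedSignature(
        untrusted_comment=lines[0][len(UNTRUSTED_PREFIX):],
        algorithm=blob[:2],
        key_id=blob[2:10],
        signature=blob[10:],
        trusted_comment=lines[2][len(TRUSTED_PREFIX):],
        global_signature=global_sig,
    )
```

Because the key ID is embedded in the signed blob as well as in the untrusted comment, an importer can use it to pick a candidate public key, but the actual trust decision still has to come from verifying the signature against a key it already trusts.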

The public key looks like:

untrusted comment: elabftw/50000: public key ab1a06fcc8c84722
RWSrGgb8yMhHIqIMOLDIbJH864ndKh+Q4Xa/m5dQ9PxJn1SLYw0oNTi/

And the encrypted private key looks like:

untrusted comment: elabftw/50000: encrypted secret key ab1a06fcc8c84722
RWRTY0IywHn83VU9pGYmfo8+BIw8purvaQFWK/zkzjda4wygLgsAAAgAAAAAAAAAAAEAAAAAGEFSnPo+IJ7/lupMoSDrWeCK0wz7Ml1R055ld8+fQCBcn9Z1oa1jrfyuGypaEW9+P41b3mtSt3sFV/...truncated

The private key is encrypted and we unlock it with a passphrase.

The elabftw/50000 is the "user agent", with 50000 corresponding to 5.0.0 (and 50100 corresponding to 5.1.0). It's an integer representation of semver (easier to compare).
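The integer form could be derived as in this Python sketch. It assumes two decimal digits each for the minor and patch components, which is consistent with 50000 for 5.0.0 and 50100 for 5.1.0; the function names are illustrative.

```python
def semver_to_int(version: str) -> int:
    """Encode 'major.minor.patch' as a single comparable integer."""
    major, minor, patch = (int(part) for part in version.split("."))
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must fit in two digits")
    return major * 10000 + minor * 100 + patch


def int_to_semver(value: int) -> str:
    """Decode the integer form back into 'major.minor.patch'."""
    major, rest = divmod(value, 10000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"
```

Encoded this way, comparing two versions is a plain integer comparison.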

I'm thinking I'll save the signature in a zip file with:

the json data that is signed
the signature file
the public key
a sh script to verify the signature

The only thing left to the verifier is to trust the public key, i.e. to trust that it belongs to a particular human.

In conclusion, I'd say using minisign is very good and I strongly vote for using this for the .eln. As you can see the files are small, and it's much better than the whole x509 shenanigans.

I'll also look at a FIDO2 implementation at some point, see https://github.com/jedisct1/minisign/issues/100#issuecomment-1614366867. Being able to sign a notebook entry with a hardware key and produce a verifiable signature is the end goal.

I'll go back to hacking now :D

SteffenBrinckmann commented 3 months ago

Hey, does anybody know how APK, AAP are signed? Do I understand it correctly that the author, not Google, signed them and they are just zip?

SteffenBrinckmann commented 3 months ago

My bad, Google has to know the key to verify it.

NicolasCARPi commented 3 months ago

It's described here: https://source.android.com/docs/security/features/apksigning/v3

It uses X509.

You can find a description of the process in eLab here: https://github.com/elabftw/elabdoc/blob/next/doc/user-guide.rst#advanced-cryptographic-signatures

I really like the fact that the signature can easily be verified by an external tool such as minisign.

FlorianRhiem commented 3 months ago

After looking into minisign and playing around with an implementation, I'm a bit torn on my opinion. It is fairly easy to implement and embed in applications (as long as an implementation of Ed25519 is available, which does all the actual signing and verification work), but it doesn't help with the part that makes X.509 so useful: reliably knowing what public keys belong to whom. X.509 certificates allow verification to happen in complete isolation as CA certificates are usually already present on systems.

From the two methods you propose for solving this, using a well-known URL, i.e. using X.509 certificates as part of HTTPS, feels much 'cleaner' than relying on a central repository of trusted public keys and their corresponding ELNs, however it requires that the exporting ELN has to be reachable to check its public key, which isn't great either.

Something we could do is store the Ed25519 public key in an X.509 certificate. That way we could use minisign or anything else based on Ed25519 for signing and verification, and the certificate chain based on root CAs for knowing we can trust the public key, and avoid relying on the OpenSSL dgst functions.

NicolasCARPi commented 3 months ago

> X.509 certificates allow verification to happen in complete isolation as CA certificates are usually already present on systems.

Not necessarily. For instance, I know an instance that has a custom cert, signed by a local authority, and all the browsers have the CA added to their trust store via GPO. So in that case, the CA system fails in an external context.

Whereas as long as you can tie a pubkey to an instance, you can then verify that whatever is signed is correct, and coming from that instance.

Verification can be done by GET .well-known/minisign.pub. And we end up at what I suggested earlier: a curated list of instances and their public keys for the ones behind a firewall.

The trusted comment contains json such as:

{
  "firstname": "Toto",
  "lastname": "Le sysadmin",
  "email": "toto@yopmail.com",
  "created_at": "2024-03-18T00:48:39+01:00",
  "site_url": "https://elab.local:3148",
  "created_by": "eLabFTW 50100",
  "meaning": "Approval"
}

So we grab the site_url, try a GET to the .well-known/pubkey, if it fails, get the pub key from the curated public list, and do the verification with that pubkey. If it's ok, then you are certain that this data has been signed by that instance, and the whole x509 verification happens at the TLS/cert level! (or indirectly via our curated list).

We want to keep things simple, and adding x509 inside minisign would be the worst option IMHO. So fetching the pubkey from the instance directly seems to be a very good option:

we verify that the instance is who it claims it is thanks to TLS
we verify that this instance owns this public key
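The lookup order described above (the instance first, the curated list as fallback) could be sketched like this in Python. The `.well-known/minisign.pub` path, the `fetch` callable and the shape of the curated list are assumptions; a real importer would pass in an HTTPS client that validates certificates.

```python
import json
from typing import Callable, Mapping, Optional


def resolve_public_key(trusted_comment: str,
                       fetch: Callable[[str], Optional[str]],
                       curated_keys: Mapping[str, str]) -> Optional[str]:
    """Return the public key for the instance named in the trusted comment.

    `fetch(url)` returns the response body, or None if the instance cannot be
    reached. `curated_keys` maps site URLs to public keys (hypothetical list).
    """
    site_url = json.loads(trusted_comment)["site_url"].rstrip("/")
    if not site_url.startswith("https://"):
        return None  # TLS is what ties the key to the instance
    key = fetch(site_url + "/.well-known/minisign.pub")
    if key is not None:
        return key.strip()
    # Instance unreachable (e.g. behind a firewall): fall back to the curated list.
    return curated_keys.get(site_url)
```

Passing the HTTP client in as a parameter keeps the sketch testable and leaves timeouts, redirects and certificate handling to the caller.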

SteffenBrinckmann commented 2 months ago

@NicolasCARPi Can you link to an example .eln-style file that follows your suggestion? Using that example, one can understand these items better

NicolasCARPi commented 2 months ago

@SteffenBrinckmann , see this file:

example-signature.zip

Extract it, go into the folder 2024-04-17-174310-export and run:

minisign -H -V -p ro-crate.pubkey -m ro-crate-metadata.json

Now you've verified that the signature is correct, and has been created with the secret key that corresponds to the public key present in the archive.

In order to increase our trust about the fact that this public key is indeed coming from the instance that this archive is saying it's coming from, we fetch it at https://eln.university.org/.well-known/signature-key.pub.

Simply checking that both public keys are the same is enough. Or you can verify the crate with that key instead. Anyway, now we've validated that this archive comes from that instance, and has NOT been tampered with. Which is exactly what we wanted to do in the first place :tada:

So we can give the content of that archive the same level of trust that we would give to that instance's operators.

If the (source) instance cannot be reached, we should have a way to ask the sysadmin about this key. This can be documented in our respective ELNs. Let's first think about the happy path, and then we can think about edge cases.

As you can see, the process is pretty straightforward:

Verify that the public key is good by fetching it from the instance itself
Verify that the signature (and hence the data) is good
Profit!
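Comparing the key shipped in the archive with the one fetched from the instance could look like this Python sketch. It assumes both are minisign-style public key files (an untrusted comment line followed by one base64 line) and compares only the decoded key material, since the comment line is not authenticated. The function names are illustrative.

```python
import base64
import hmac


def key_material(pubkey_file: str) -> bytes:
    """Decode the base64 line of a minisign-style public key file."""
    lines = [ln.strip() for ln in pubkey_file.strip().splitlines() if ln.strip()]
    payload = [ln for ln in lines if not ln.startswith("untrusted comment:")]
    if len(payload) != 1:
        raise ValueError("expected exactly one key line")
    return base64.b64decode(payload[0], validate=True)


def same_public_key(archive_key: str, fetched_key: str) -> bool:
    """True if both files carry the same key, regardless of their comments."""
    return hmac.compare_digest(key_material(archive_key), key_material(fetched_key))
```

If the keys match, the signature check already done against the archive's key also holds for the key published by the instance.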

SteffenBrinckmann commented 2 months ago

Great, I got it to work and I understand the file structure. Could you include the location of the public key inside minisign's trusted comment? Then the process is easy:

check file with given pub-key
get pubkey-copy from the location given in the 'trusted' comment
compare copies
show user that the key came from this url

Do we trust / not-trust public key-stores?

NicolasCARPi commented 2 months ago

> Could you include the location of the public key inside minisign's trusted comment

I don't think it's a good idea to use something like pubkey_url, because this is controlled by the attacker, so you could simply fetch it from: attacker.com/pubkey instead of the instance url. And verification would work fine.

This is why we only mention the site_url and we build the pubkey url by appending .well-known/minisig.pub or something similar. The .well-known folder is guaranteed to be managed by the system operator, unlike an arbitrary url. When you go to https://www.deltablot.com/.well-known/security.txt you know it's not user content but sysadmin content.
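A sketch of that rule in Python: the key location is derived from `site_url` alone, and any `pubkey_url` present in the comment is ignored. The path `.well-known/minisign.pub` is one of the candidates mentioned in this thread, not an agreed standard, and `pubkey_url_for` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/minisign.pub"  # assumed, not yet standardized


def pubkey_url_for(trusted_comment: dict) -> str:
    """Build the key URL from the instance origin only; never follow pubkey_url."""
    parts = urlsplit(trusted_comment["site_url"])
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("site_url must be an https URL")
    # Keep scheme and host:port, drop any path, query or fragment.
    return urlunsplit(("https", parts.netloc, WELL_KNOWN_PATH, "", ""))
```

Dropping everything after the host means a crafted `site_url` with a user-writable path cannot redirect the lookup away from the operator-controlled `.well-known` folder.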

As a historical side note, this is why web servers traditionally listen on port 80: binding to a port below 1024 requires root access, so you can trust that the web server is run by the root operator, not some user on the system!

edit:

> Do we trust / not-trust public key-stores?

What do you mean by that? The only thing we can trust is the .well-known folder of the instance URL.

SteffenBrinckmann commented 2 months ago

We could also store pubkeys on http://keyserver.pgp.com ? Not sure how helpful this is: complicated to set up, deletes content after 6 months

NicolasCARPi commented 2 months ago

No, keyservers are a failure, also they are only for GPG/PGP keys AFAIK. See: https://gist.github.com/rjhansen/67ab921ffb4084c865b3618d6955275f

SteffenBrinckmann commented 2 months ago

How could a desktop ELN application provide a server that it does not otherwise rely on? Pasta could only store the pub-keys in a dedicated server location.

NicolasCARPi commented 2 months ago

I see.

If it's not on a server, I'd say your "instance level" key is the same as the "user level" key, no? In the desktop world, a user is the same as an instance. So we end up with the need to verify that this .eln archive was generated by this particular user. AFAIK there are no real standards about this of course, because not everyone has a personal website where they can publish their keys.

We could think of:

try and find it in $SITE_URL/.well-known/pub.key
if site_url doesn't exist (desktop app) or server is unreachable, use the pubkey_url, that could be anything. So the trust is shifted to whoever owns that server.

Or simply display the key, ask the importer what level of trust could be attached to that key. Similar to GPG, there is a trust level for keys. But here 0 or 1 is enough.

GPG simply displays a warning:

Checking integrity of /var/opt/csw/pkgutil/catalog.mirror.opencsw.org_opencsw_testing_i386_5.11 with gpg.
gpg: Signature made Sat Apr 20 06:10:03 2019 EDT using DSA key ID 9306CC77
gpg: Good signature from "OpenCSW catalog signing board@opencsw.org"
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 4DCE 3C80 AAB2 CAB1 E60C 9A3C 05F4 2D66 9306 CC77

So if the pubkey of the source user is not present in the target instance/app, it needs to be imported with low trust level.
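The binary trust level suggested here might be assigned as in this Python sketch. The local key store (a mapping from public key to its owner) and the label wording are assumptions for illustration.

```python
from typing import Mapping, Tuple


def trust_for_import(public_key: str, signature_ok: bool,
                     known_keys: Mapping[str, str]) -> Tuple[int, str]:
    """Return (trust level, label) for an import: 1 = trusted, 0 = low trust."""
    if not signature_ok:
        return 0, "Invalid signature"
    owner = known_keys.get(public_key)
    if owner is None:
        # Good signature, but nothing ties the key to a known user or instance.
        return 0, "Valid signature from a key that is not in the local key store"
    return 1, f"Trusted import from {owner}"
```

The label can then be shown next to the imported entry, much like the GPG warning above.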

SteffenBrinckmann commented 2 months ago

That would work for me

TheELNConsortium / TheELNFileFormat

Signatures for .eln files #56
