MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files
MIT License
483 stars 71 forks source link

Allow different digest_algorithm and signature_algorithm when validating CMS #242

Closed peteris-zealid closed 1 year ago

peteris-zealid commented 1 year ago

Is your feature request related to a problem? Please describe. Yes, id documents with nfc chips have something called the security object (sod). This sod contains some hashes of personal information and a CMS structure signing those hashes. The signature is produced by a document issuing authority of a given country so there is not much that can be done to influence the signing process. For Lietuvas documents these digests are in fact different. (I will check other countries in the following days)

Describe the solution you'd like If I understand rfc6211 correctly the requirement for the two algorithms to be identical is not too strict.

   digestAlgorithm  has been implicitly protected by the fact that CMS
      has only defined one digest algorithm for each hash value length.
      (The algorithm RIPEMD-160 was never standardized.)  There is also
      an unwritten convention that the same digest algorithm should be
      used both here and for the signature algorithm.  If newer digest
      algorithms are defined so that there are multiple algorithms for a
      given hash length (it is expected that the SHA-3 project will do
      so), or that parameters are defined for a specific algorithm, much
      of the implicit protection will be lost.

I would suggest adding a flag (on validation_context) that would allow skipping this particular check. These are the lines in question:

        if (
            sig_hash_algo is not None
            and sig_hash_algo.dump() != hash_algo_obj.dump()
        ):
            raise SignatureValidationError(

Describe alternatives you've considered Not much right now.

Additional context I am describing this as a feature not a bug because the assumption that both digest algorithms will match is very reasonable.

Here is base64 of an LT document sod

peteris-zealid commented 1 year ago

Also from rfc6211

There is a convention, undocumented as far as I
can tell, that the same hash algorithm should be used for both the
content digest and the signature digest.  There are cases, such as
third-party signers that are only given a content digest, where such
a convention cannot be enforced.
MatthiasValvekens commented 1 year ago

Hi @peteris-zealid,

pyHanko interprets CMS (RFC 5652) as updated in RFC 8933, which makes that convention mandatory. But I can try to add in an override. I'm not yet completely sure where, though, since ValidationContext is about x509 validation more than about signature checks, but that's a relatively minor detail.

peteris-zealid commented 1 year ago

Yes, you are correct. I now looked at RFC8933.

An override would be much appreciated, because I do not see how I could hack around this if clause in a way that is reasonably elegant.

With regards to implementation putting the override in AlgorithmUsagePolicy would probably make more sense, but it is still part of validation context. Not sure about the semantics of all these things.

In general the internal api of certvalidator is better after the refactor. I did not inspect too much, but at first glance it looks really good. Well done!

peteris-zealid commented 1 year ago

I can also do the implementation if you are comfortable with that.

peteris-zealid commented 1 year ago

how_i_would_do_it.patch

The changes are minimal and I am able to easily bypass the check I want by subclassing, for example, DisallowWeakAlgorithmsPolicy.

I understand that using getattr is not a very elegant solution, but it is probably better than splitting the changes between two repos.

Another option I am considering is to maintain a small fork just for ZealiD related purposes. This is really not a problem (we have done it before) and if you would prefer this just give the nod.

MatthiasValvekens commented 1 year ago

Hi @peteris-zealid,

Thanks for the patch suggestion! I considered doing it that way, but that'd essentially forever doom this feature to the status of "undocumented hack". I'd rather (abstractly) subclass AlgorithmPolicy in pyHanko with an extra method and change pyHanko's API to take that subclass instead.

It's technically a breaking change, but with some sensible defaults, the "breakage" is only at the nominal level (i.e. typechecks). At runtime it shouldn't have much of an effect.

MatthiasValvekens commented 1 year ago

I have a PR open (#245); feel free to take a look if you want :).