gabriel-vasile closed this issue 3 years ago.
Greetings, I'm interested in working on this issue.
I'm currently looking into available tooling and test files to base our own implementation on.
Base64-encoded .p7s example from RFC 8551:
MIIBJgYJKoZIhvcNAQcCoIIBFzCCARMCAQExADALBgkqhkiG9w0BBwExgf4w
gfsCAQIwJjASMRAwDgYDVQQDEwdDYXJsUlNBAhBGNGvHgABWvBHTbi7EELOw
MAsGCWCGSAFlAwQCAaAxMC8GCSqGSIb3DQEJBDEiBCCxwpZGNZzTSsugsn+f
lEidzQK4mf/ozKqfmbxhcIkKqjALBgkqhkiG9w0BAQsEgYB0XJV7fjPa5Nuh
oth5msDfP8A5urYUMjhNpWgXG8ae3XpppqVrPi2nVO41onHnkByjkeD/wc31
A9WH8MzFQgSTsrJ65JvffTTXkOpRPxsSHn3wJFwP/atWHkh8YK/jR9bULhUl
Mv5jQEDiwVX5DRasxu6Ld8zv9u5/TsdBNiufGw==
OpenSSL can already parse these files:
$ openssl asn1parse -inform PEM -in p7s.p7s -dump -i
0:d=0 hl=4 l= 294 cons: SEQUENCE
4:d=1 hl=2 l= 9 prim: OBJECT :pkcs7-signedData
15:d=1 hl=4 l= 279 cons: cont [ 0 ]
19:d=2 hl=4 l= 275 cons: SEQUENCE
23:d=3 hl=2 l= 1 prim: INTEGER :01
26:d=3 hl=2 l= 0 cons: SET
28:d=3 hl=2 l= 11 cons: SEQUENCE
30:d=4 hl=2 l= 9 prim: OBJECT :pkcs7-data
41:d=3 hl=3 l= 254 cons: SET
44:d=4 hl=3 l= 251 cons: SEQUENCE
47:d=5 hl=2 l= 1 prim: INTEGER :02
50:d=5 hl=2 l= 38 cons: SEQUENCE
52:d=6 hl=2 l= 18 cons: SEQUENCE
54:d=7 hl=2 l= 16 cons: SET
56:d=8 hl=2 l= 14 cons: SEQUENCE
58:d=9 hl=2 l= 3 prim: OBJECT :commonName
63:d=9 hl=2 l= 7 prim: PRINTABLESTRING :CarlRSA
72:d=6 hl=2 l= 16 prim: INTEGER :46346BC7800056BC11D36E2EC410B3B0
90:d=5 hl=2 l= 11 cons: SEQUENCE
92:d=6 hl=2 l= 9 prim: OBJECT :sha256
103:d=5 hl=2 l= 49 cons: cont [ 0 ]
105:d=6 hl=2 l= 47 cons: SEQUENCE
107:d=7 hl=2 l= 9 prim: OBJECT :messageDigest
118:d=7 hl=2 l= 34 cons: SET
120:d=8 hl=2 l= 32 prim: OCTET STRING
0000 - b1 c2 96 46 35 9c d3 4a-cb a0 b2 7f 9f 94 48 9d ...F5..J......H.
0010 - cd 02 b8 99 ff e8 cc aa-9f 99 bc 61 70 89 0a aa ...........ap...
154:d=5 hl=2 l= 11 cons: SEQUENCE
156:d=6 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption
167:d=5 hl=3 l= 128 prim: OCTET STRING
0000 - 74 5c 95 7b 7e 33 da e4-db a1 a2 d8 79 9a c0 df t\.{~3......y...
0010 - 3f c0 39 ba b6 14 32 38-4d a5 68 17 1b c6 9e dd ?.9...28M.h.....
0020 - 7a 69 a6 a5 6b 3e 2d a7-54 ee 35 a2 71 e7 90 1c zi..k>-.T.5.q...
0030 - a3 91 e0 ff c1 cd f5 03-d5 87 f0 cc c5 42 04 93 .............B..
0040 - b2 b2 7a e4 9b df 7d 34-d7 90 ea 51 3f 1b 12 1e ..z...}4...Q?...
0050 - 7d f0 24 5c 0f fd ab 56-1e 48 7c 60 af e3 47 d6 }.$\...V.H|`..G.
0060 - d4 2e 15 25 32 fe 63 40-40 e2 c1 55 f9 0d 16 ac ...%2.c@@..U....
0070 - c6 ee 8b 77 cc ef f6 ee-7f 4e c7 41 36 2b 9f 1b ...w.....N.A6+..
To verify that a .p7s file is a PKCS #7 signature, we can parse the file with an ASN.1 parser and then assert that the object identifier at the start of the outer ASN.1 SEQUENCE is pkcs7-signedData (see section 14 of RFC 2315 for the definition of this object identifier). This detection method does not prove that the file actually is a PKCS #7 signature, but the chances are extremely high.
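A minimal sketch of the parsing approach, assuming Go's standard `encoding/asn1` package and DER input. The `contentInfo` type and `isSignedData` function are hypothetical names for illustration, not part of any existing API. It decodes the outer ContentInfo SEQUENCE and compares its first OID with pkcs7-signedData. Note that `encoding/asn1` rejects BER indefinite lengths and returns an error when the declared length runs past the end of the buffer, so a detector that only sees the first part of a large signature file would get `false` here.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// oidSignedData is pkcs7-signedData, 1.2.840.113549.1.7.2 (RFC 2315, section 14).
var oidSignedData = asn1.ObjectIdentifier{1, 2, 840, 113549, 1, 7, 2}

// contentInfo mirrors the outer ContentInfo SEQUENCE of RFC 2315:
// a content-type OID followed by an optional [0] EXPLICIT content.
type contentInfo struct {
	ContentType asn1.ObjectIdentifier
	Content     asn1.RawValue `asn1:"explicit,optional,tag:0"`
}

// isSignedData (hypothetical helper) decodes the outer DER SEQUENCE and
// checks whether its content type is pkcs7-signedData. The [0] content is
// kept as raw bytes and not parsed further.
func isSignedData(der []byte) bool {
	var ci contentInfo
	if _, err := asn1.Unmarshal(der, &ci); err != nil {
		return false // not DER, truncated, or not a SEQUENCE starting with an OID
	}
	return ci.ContentType.Equal(oidSignedData)
}

func main() {
	// A bare ContentInfo holding only the signedData OID (content omitted).
	sample := []byte{0x30, 0x0b, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02}
	fmt.Println(isSignedData(sample)) // true
}
```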
We must consider the time and memory cost of parsing. If you are concerned that parsing is needed to prove a file is a .p7s: most of the existing functions in the library don't prove anything, they rather say "this file looks like a PDF, PNG, etc."
Is there a reason why you think using the magic numbers from Tika would not be a good fit for this case?
Thanks for your input. You are probably right about the computational cost of parsing.
Most modern files I've seen use Base64 encoding, and Tika doesn't mention Base64 encodings. There's no "-----BEGIN PKCS7" line or similar in the RFC examples, because the media type is always declared in the headers of the MIME message.
We could probably just look for the object identifier of signedData (aka pkcs7-signedData) in the decoded data. Its value is 1 2 840 113549 1 7 2 (1.2.840.113549.1.7.2):
0 294: SEQUENCE {
4 9: OBJECT IDENTIFIER signedData (1 2 840 113549 1 7 2)
This should indicate that the file is a .p7s signature.
I will try to make the magic numbers from Tika work.
So we have three encodings to handle: PEM, DER and Base64.
PEM is fairly straightforward: we just check strings.Contains(instring, "-----BEGIN PKCS7").
See http://justsolve.archiveteam.org/wiki/PKCS7_certificate
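A sketch of the PEM check on `[]byte`, as a detector would see the input; `isPEMPKCS7` and its `strict` flag are hypothetical. The `bytes.Contains` test is the cheap heuristic described above. The optional `pem.Decode` step from the standard library additionally requires a complete block with a matching END line and valid Base64, so it fails on truncated input or when another PEM block (e.g. a CERTIFICATE) comes first.

```go
package main

import (
	"bytes"
	"encoding/pem"
	"fmt"
)

// pemMarker is the beginning of the PKCS7 PEM header line.
var pemMarker = []byte("-----BEGIN PKCS7")

// isPEMPKCS7 (hypothetical) reports whether raw looks like a PEM PKCS7 file.
// With strict set, the first PEM block must also decode and be of type PKCS7.
func isPEMPKCS7(raw []byte, strict bool) bool {
	if !bytes.Contains(raw, pemMarker) {
		return false
	}
	if !strict {
		return true
	}
	block, _ := pem.Decode(raw)
	return block != nil && block.Type == "PKCS7"
}

func main() {
	// Body is the Base64 of a bare signedData ContentInfo (13 bytes).
	sample := []byte("-----BEGIN PKCS7-----\nMAsGCSqGSIb3DQEHAg==\n-----END PKCS7-----\n")
	fmt.Println(isPEMPKCS7(sample, false), isPEMPKCS7(sample, true)) // true true
}
```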
DER: still working on it.
Base64: same as PEM but without the header line, so it can't be detected without decoding the Base64.
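A sketch of the "decode first" approach for header-less Base64, assuming the file starts directly with the Base64 text (possibly wrapped across lines). `isBase64SignedData` is a hypothetical helper: it decodes only the first 24 non-whitespace characters, which cover the longest header it accepts (1 tag byte, 0x84 plus four length octets, and the 11-byte OID, 17 bytes in total), then looks for the signedData OID right after the outer SEQUENCE header, as for DER.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// DER bytes of OBJECT IDENTIFIER 1.2.840.113549.1.7.2 (pkcs7-signedData).
var signedDataOID = []byte{0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02}

// 24 Base64 characters decode to 18 bytes, enough for a 17-byte worst-case header.
const maxHeaderChars = 24

// isBase64SignedData (hypothetical) decodes a bounded prefix of the input
// and checks for a SEQUENCE header followed by the signedData OID.
func isBase64SignedData(raw []byte) bool {
	chars := make([]byte, 0, maxHeaderChars)
	for _, c := range raw {
		if c == '\r' || c == '\n' || c == ' ' || c == '\t' {
			continue // skip line wrapping
		}
		chars = append(chars, c)
		if len(chars) == maxHeaderChars {
			break
		}
	}
	chars = chars[:len(chars)-len(chars)%4] // decode whole 4-char groups only
	der := make([]byte, base64.StdEncoding.DecodedLen(len(chars)))
	n, err := base64.StdEncoding.Decode(der, chars)
	if err != nil {
		return false // not Base64 (e.g. raw DER or text)
	}
	der = der[:n]

	if len(der) < 2 || der[0] != 0x30 { // outer SEQUENCE
		return false
	}
	off := 2
	switch l := der[1]; {
	case l == 0x80: // BER indefinite length
	case l < 0x80: // short form
	case l <= 0x84: // long form with 1..4 length octets
		off += int(l & 0x7f)
	default:
		return false
	}
	end := off + len(signedDataOID)
	return len(der) >= end && bytes.Equal(der[off:end], signedDataOID)
}

func main() {
	// The first line of the RFC 8551 sample is enough for the check.
	fmt.Println(isBase64SignedData([]byte("MIIBJgYJKoZIhvcNAQcCoIIBFzCCARMCAQExADALBgkqhkiG9w0BBwExgf4w\n"))) // true
	fmt.Println(isBase64SignedData([]byte("hello, world")))                                                   // false
}
```

Decoding only a bounded prefix keeps the cost constant regardless of file size, which addresses the parsing-cost concern raised earlier in the thread.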
The PR implements detection of PEM and DER .p7s signatures using magic bytes, as suggested.
1) Specify the MIME type and extension for which to add support: application/pkcs7-signature, .p7s
2) Share an example file ...
3) Optionally, add a reference to the specification of the file format: https://tools.ietf.org/html/rfc8551 https://github.com/digipres/digipres.github.io/blob/master/_sources/registries/tika/tika-mimetypes.xml#L524