gabriel-vasile / mimetype

A fast Golang library for media type and file extension detection, based on magic numbers
https://pkg.go.dev/github.com/gabriel-vasile/mimetype#pkg-overview
MIT License
1.5k stars 159 forks

Add support for p7s file format #123

Closed: gabriel-vasile closed this issue 3 years ago

gabriel-vasile commented 3 years ago

1) Specify the MIME type and extension for which to add support
application/pkcs7-signature, .p7s

2) Share an example file
...

3) Optionally, add a reference to the specification of the file format.
https://tools.ietf.org/html/rfc8551
https://github.com/digipres/digipres.github.io/blob/master/_sources/registries/tika/tika-mimetypes.xml#L524

localleon commented 3 years ago

Greetings, I'm interested in working on this issue.

I'm currently looking into available tooling and test files on which to base our implementation.

Base64-encoded p7s sample from RFC 8551:

MIIBJgYJKoZIhvcNAQcCoIIBFzCCARMCAQExADALBgkqhkiG9w0BBwExgf4w
gfsCAQIwJjASMRAwDgYDVQQDEwdDYXJsUlNBAhBGNGvHgABWvBHTbi7EELOw
MAsGCWCGSAFlAwQCAaAxMC8GCSqGSIb3DQEJBDEiBCCxwpZGNZzTSsugsn+f
lEidzQK4mf/ozKqfmbxhcIkKqjALBgkqhkiG9w0BAQsEgYB0XJV7fjPa5Nuh
oth5msDfP8A5urYUMjhNpWgXG8ae3XpppqVrPi2nVO41onHnkByjkeD/wc31
A9WH8MzFQgSTsrJ65JvffTTXkOpRPxsSHn3wJFwP/atWHkh8YK/jR9bULhUl
Mv5jQEDiwVX5DRasxu6Ld8zv9u5/TsdBNiufGw==

OpenSSL can already parse these files:

    $ openssl asn1parse -inform PEM -in p7s.p7s -dump -i
    0:d=0  hl=4 l= 294 cons: SEQUENCE          
    4:d=1  hl=2 l=   9 prim:  OBJECT            :pkcs7-signedData
   15:d=1  hl=4 l= 279 cons:  cont [ 0 ]        
   19:d=2  hl=4 l= 275 cons:   SEQUENCE          
   23:d=3  hl=2 l=   1 prim:    INTEGER           :01
   26:d=3  hl=2 l=   0 cons:    SET               
   28:d=3  hl=2 l=  11 cons:    SEQUENCE          
   30:d=4  hl=2 l=   9 prim:     OBJECT            :pkcs7-data
   41:d=3  hl=3 l= 254 cons:    SET               
   44:d=4  hl=3 l= 251 cons:     SEQUENCE          
   47:d=5  hl=2 l=   1 prim:      INTEGER           :02
   50:d=5  hl=2 l=  38 cons:      SEQUENCE          
   52:d=6  hl=2 l=  18 cons:       SEQUENCE          
   54:d=7  hl=2 l=  16 cons:        SET               
   56:d=8  hl=2 l=  14 cons:         SEQUENCE          
   58:d=9  hl=2 l=   3 prim:          OBJECT            :commonName
   63:d=9  hl=2 l=   7 prim:          PRINTABLESTRING   :CarlRSA
   72:d=6  hl=2 l=  16 prim:       INTEGER           :46346BC7800056BC11D36E2EC410B3B0
   90:d=5  hl=2 l=  11 cons:      SEQUENCE          
   92:d=6  hl=2 l=   9 prim:       OBJECT            :sha256
  103:d=5  hl=2 l=  49 cons:      cont [ 0 ]        
  105:d=6  hl=2 l=  47 cons:       SEQUENCE          
  107:d=7  hl=2 l=   9 prim:        OBJECT            :messageDigest
  118:d=7  hl=2 l=  34 cons:        SET               
  120:d=8  hl=2 l=  32 prim:         OCTET STRING      
      0000 - b1 c2 96 46 35 9c d3 4a-cb a0 b2 7f 9f 94 48 9d   ...F5..J......H.
      0010 - cd 02 b8 99 ff e8 cc aa-9f 99 bc 61 70 89 0a aa   ...........ap...
  154:d=5  hl=2 l=  11 cons:      SEQUENCE          
  156:d=6  hl=2 l=   9 prim:       OBJECT            :sha256WithRSAEncryption
  167:d=5  hl=3 l= 128 prim:      OCTET STRING      
      0000 - 74 5c 95 7b 7e 33 da e4-db a1 a2 d8 79 9a c0 df   t\.{~3......y...
      0010 - 3f c0 39 ba b6 14 32 38-4d a5 68 17 1b c6 9e dd   ?.9...28M.h.....
      0020 - 7a 69 a6 a5 6b 3e 2d a7-54 ee 35 a2 71 e7 90 1c   zi..k>-.T.5.q...
      0030 - a3 91 e0 ff c1 cd f5 03-d5 87 f0 cc c5 42 04 93   .............B..
      0040 - b2 b2 7a e4 9b df 7d 34-d7 90 ea 51 3f 1b 12 1e   ..z...}4...Q?...
      0050 - 7d f0 24 5c 0f fd ab 56-1e 48 7c 60 af e3 47 d6   }.$\...V.H|`..G.
      0060 - d4 2e 15 25 32 fe 63 40-40 e2 c1 55 f9 0d 16 ac   ...%2.c@@..U....
      0070 - c6 ee 8b 77 cc ef f6 ee-7f 4e c7 41 36 2b 9f 1b   ...w.....N.A6+..

localleon commented 3 years ago

To verify that a .p7s file is a PKCS #7 signature, we can parse the file with an ASN.1 parser and assert that the object identifier at the start of the outer ASN.1 sequence is pkcs7-signedData (see section 14 of RFC 2315 for the definition of this object identifier). This detection method does not prove that the .p7s file actually is a PKCS #7 signature, but the chances are extremely high.
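A minimal sketch of that check in Go, assuming the input is already raw DER (not Base64 or PEM). The helper name `isSignedData` and the hand-built sample bytes are illustrative only and are not taken from the library or from any PR; the code relies solely on the standard `encoding/asn1` package.

```go
package main

import (
	"encoding/asn1"
	"fmt"
)

// oidSignedData is the pkcs7-signedData object identifier (1.2.840.113549.1.7.2).
var oidSignedData = asn1.ObjectIdentifier{1, 2, 840, 113549, 1, 7, 2}

// isSignedData is a hypothetical helper. It parses the outer element,
// requires it to be a SEQUENCE, and then compares the first element
// inside that SEQUENCE with the signedData OID.
func isSignedData(der []byte) bool {
	var outer asn1.RawValue
	if _, err := asn1.Unmarshal(der, &outer); err != nil {
		return false
	}
	if outer.Class != asn1.ClassUniversal || outer.Tag != asn1.TagSequence || !outer.IsCompound {
		return false
	}
	// Only the first element is decoded; whatever follows it is ignored.
	var oid asn1.ObjectIdentifier
	if _, err := asn1.Unmarshal(outer.Bytes, &oid); err != nil {
		return false
	}
	return oid.Equal(oidSignedData)
}

func main() {
	// Hand-built minimal DER for illustration:
	// SEQUENCE { OID signedData, [0] { NULL } }
	sample := []byte{
		0x30, 0x0F,
		0x06, 0x09, 0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x01, 0x07, 0x02,
		0xA0, 0x02, 0x05, 0x00,
	}
	fmt.Println(isSignedData(sample))
}
```

Note that the whole input has to be in memory and the outer length has to be consistent with it, so a truncated header would be rejected by this approach.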

gabriel-vasile commented 3 years ago

We must consider the time and memory cost of parsing. If you are concerned that parsing is needed in order to prove a file is .p7s, well... most of the existing functions in the library don't prove anything, but rather say "this file looks like a pdf, png, etc".

Is there a reason why you think using magic numbers from tika would not be a good fit for this case?

localleon commented 3 years ago

Thanks for your input. You are probably right about the computational cost of parsing.

Most modern files I've seen use Base64 encoding, which tika doesn't mention. There's no "-----BEGIN PKCS7" line or anything similar in the examples from the RFC, because the application type is always defined in the headers of the MIME message.

We could probably look only for the object identifier of signedData (aka pkcs7-signedData) in the decoded data. It looks something like 1 2 840 113549 1 7 2:

    0 294: SEQUENCE {
    4   9:   OBJECT IDENTIFIER signedData (1 2 840 113549 1 7 2)

This should indicate that our file is a p7s signature file.
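As a rough sketch of that idea without a full parser: in DER, the OID 1 2 840 113549 1 7 2 is encoded as the bytes `06 09 2A 86 48 86 F7 0D 01 07 02`, and it sits right after the outer SEQUENCE tag and its length field. The helper name `looksLikeDERSignedData` below is hypothetical and is not part of the library.

```go
package main

import (
	"bytes"
	"fmt"
)

// signedDataOID is the DER encoding (tag 0x06, length 0x09, value) of
// the object identifier 1.2.840.113549.1.7.2 (pkcs7-signedData).
var signedDataOID = []byte{0x06, 0x09, 0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x01, 0x07, 0x02}

// looksLikeDERSignedData is a hypothetical helper. It checks for an outer
// SEQUENCE tag (0x30), skips the length field, and compares the bytes
// that follow with the signedData OID. Nothing else is validated.
func looksLikeDERSignedData(raw []byte) bool {
	if len(raw) < 2 || raw[0] != 0x30 {
		return false
	}
	offset := 2 // tag byte plus one length byte (short or indefinite form)
	if l := raw[1]; l > 0x80 {
		// Long form: the low 7 bits give the number of length bytes that follow.
		offset += int(l & 0x7F)
	}
	if offset > len(raw) {
		return false
	}
	return bytes.HasPrefix(raw[offset:], signedDataOID)
}

func main() {
	// First 15 bytes of the RFC 8551 sample after Base64 decoding
	// ("MIIBJgYJKoZIhvcNAQcC").
	head := []byte{0x30, 0x82, 0x01, 0x26, 0x06, 0x09, 0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x01, 0x07, 0x02}
	fmt.Println(looksLikeDERSignedData(head))
}
```

Because only the first few bytes are inspected, this works on a truncated header and costs a single prefix comparison.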

I will try and make the magic numbers from tika work.

So we have three encodings to handle: PEM, DER and Base64.

PEM - fairly straightforward. We just check strings.Contains(instring, "-----BEGIN PKCS7"); see http://justsolve.archiveteam.org/wiki/PKCS7_certificate

DER - working on it.

Base64 - same as PEM without the header line; can't be detected without decoding the Base64.
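The three cases could be told apart roughly as follows. This is only a sketch under stated assumptions: the function names `classifyP7S` and `hasSignedDataHead`, the 18-byte search window, and the 24-character Base64 prefix are all choices made for illustration, not the library's behaviour.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

var (
	pemHeader     = []byte("-----BEGIN PKCS7")
	signedDataOID = []byte{0x06, 0x09, 0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x01, 0x07, 0x02}
)

// hasSignedDataHead reports whether raw starts with a SEQUENCE tag and
// carries the signedData OID within its first 18 bytes (assumed window).
func hasSignedDataHead(raw []byte) bool {
	if len(raw) == 0 || raw[0] != 0x30 {
		return false
	}
	if len(raw) > 18 {
		raw = raw[:18]
	}
	return bytes.Contains(raw, signedDataOID)
}

// classifyP7S is a hypothetical helper returning "pem", "der", "base64"
// or "unknown" for the given input.
func classifyP7S(raw []byte) string {
	trimmed := bytes.TrimLeft(raw, " \t\r\n")
	switch {
	case bytes.HasPrefix(trimmed, pemHeader):
		return "pem"
	case hasSignedDataHead(raw):
		return "der"
	}
	// Headerless Base64: decode only the first 24 characters (18 bytes)
	// and run the same check on the decoded bytes.
	if len(trimmed) >= 24 {
		head, err := base64.StdEncoding.DecodeString(string(trimmed[:24]))
		if err == nil && hasSignedDataHead(head) {
			return "base64"
		}
	}
	return "unknown"
}

func main() {
	// First line of the Base64 sample from RFC 8551.
	sample := []byte("MIIBJgYJKoZIhvcNAQcCoIIBFzCCARMCAQExADALBgkqhkiG9w0BBwExgf4w")
	fmt.Println(classifyP7S(sample))
}
```

Decoding just a fixed-size prefix keeps the Base64 case cheap, at the price of missing inputs whose first 24 characters contain line breaks or other whitespace.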

localleon commented 3 years ago

#125 closes this issue.

The PR implements detection of PEM and DER P7S signatures using magic bytes, as suggested.