[Bug] Plugin chooses wrong decryption key

Leseratte10 commented 3 years ago

CheckList

[x] The Title and The Log Title are setted correctly.
[x] Clarified about my environment.
[x] Code block is used for the log.

[Bug] Plugin chooses wrong decryption key for PDF

I'm adding a PDF to Calibre. The modifications in #1689 are included so PDFs even work. I have multiple different ADE keys imported into the plugin, with the "correct" one for this PDF being the last one in that list.

However, the plugin seems to think that the very first key it tries to use is correct. Probably, just by chance, decryption with that wrong key leads to a file that at least looks like it could be a valid PDF. Though, actually, it results in a PDF with all pages empty.

When I delete all the other keys from the plugin settings so only the correct key is left, the plugin correctly decrypts the book.

My Environment

Calibre: 5.28

Kindle: N/A

DeDRM: 7.2.1 plus PR #1689

Log

Log Title

```log DeDRM v7.0.3: Trying to decrypt mzgibmyn.pdf DeDRM v7.0.3: mzgibmyn.pdf is a PDF ebook DeDRM v7.0.3: Trying Encryption key key1 DeDRM v7.0.3: Finished after 0.8 seconds Traceback (most recent call last): File "calibre/customize/ui.py", line 433, in get_file_type_metadata File "calibre/customize/builtins.py", line 318, in get_metadata File "calibre/ebooks/metadata/pdf.py", line 126, in get_metadata ValueError: Could not read info dict from PDF pdfinfo returned no UTF-8 data Added Book to db in: 0.8 Added 1 books in 22.0 seconds ``` (That "key1" that it tried to use is definitely, 100%, the wrong key for that book).

Leseratte10 commented 3 years ago

Addition: Just had the same bug happen with an EPUB file, so the bug is not limited to just PDFs.

Leseratte10 commented 3 years ago

Okay, due to the questionable legality of tools like this I do not want to make a PR, but here's what I think is going wrong: Right now, both ineptepub and ineptpdf (PDF inside the "initialize_ebx" function, EPUB inside the "decryptBook" function) assume that if the 17th-to-last byte of the decrypted book key is a NULL byte, then the following 16 bytes (the rest of the data) must be the decrypted book key.

(Btw., does anyone know what the heck all that data before that NULL byte is?)

However, if you take that encrypted book key, and decrypt it with a random, non-matching key for a different adobe account, it will lead to 128 random bytes instead due to the wrong key. If the 17th-to-last byte of these 128 random bytes happens to be a NULL byte, too - which probably has a chance of 1/256 - the plugin assumes it found the correct book key and then continues to decrypt the book with the wrong key. Of course, the more "wrong" user keys you have imported into the plugin settings, the higher is the chance of running into this issue.

Without knowing what all the data before the book key is, there's not really a way to completely prevent that. However, I noticed that that data seems to always start with "00 02" for EPUBs and with "02" for PDFs. This means that by modifying the code to not only check for that NULL byte in the 17th-to-last place but also checking for the one (or two) bytes in the beginning, you decrease the chance for issues from 1/256 to 1/(256^2) = 1/65535 for PDF files, or down to 1/(256^3) = 1/16777216 for EPUB files, which should be way more reliable.

EDIT: An ideal solution would probably be to not only store the raw encryption keys in the dedrm.json but also the Adobe user UUID for that key. That way the plugin could check the rights.xml for the UUID the book is licensed for, and then directly pick the correct key from the list instead of having to try them all. Though that would not work when the user manually imports an existing DER file as that doesn't contain the UUID ... Maybe one would need statistics / heuristics like "Okay, for different books all with UUID X we've successfully used key A 100 times and key B one single time, so key A is probably the correct one (for the future) and B was such an 1/256 error".

j-howell commented 3 years ago

(Btw., does anyone know what the heck all that data before that NULL byte is?)

Those extra bytes are the result of RSAES-PKCS1-v1_5 encryption.

The decrypted string should be start with 00 02, followed by at least eight randomly chosen non-zero bytes, followed by a 00 byte, and then the 16 byte key. A more rigorous verification could look at all of those bytes prior to the key.

Leseratte10 commented 3 years ago

Yeah, the decrypted string starts with 00 02 for EPUBs (or just 02 for PDFs apparently), and then 109 bytes of data that doesn't seem to have a pattern. According to your link all these 109 bytes would then be junk data / padding data without a purpose. After that there's the NULL byte and then the key. Thanks for that link, I did google for PKCS1 1.5 (as mentioned in the comment in the source code) but failed to figure out that all the other data is just padding.

Verifying these 00 02 bytes (EPUB) or the 02 byte (PDF) would bring the chance of errors from 1/256 to 1/(256*256) for PDFs and down to 1/(256*256*256) for EPUBs, which is what I talked about in the previous comment. Not 100% foolproof, but way better than the current implementation.

With "more rigorous" you mean verifying that none of the junk bytes are zero? ~~I believe that was included in the past but has been removed because there were some buggy books that did have a zero in that padding data.~~ EDIT: Apparently I was wrong about that.

j-howell commented 3 years ago

With "more rigorous" you mean verifying that none of the junk bytes are zero? I believe that was included in the past but has been removed because there were some buggy books that did have a zero in that padding data.

That is what I meant. Perhaps that check could be reinstated and only be relaxed if none of the possible results pass inspection.

apprenticeharper / DeDRM_tools