Cisco-Talos / clamav

ClamAV - Documentation is here: https://docs.clamav.net
https://www.clamav.net/
GNU General Public License v2.0

pdf with "pseudo" encryption #1196

Open JAF84 opened 8 months ago

JAF84 commented 8 months ago

hello,

Also tested with the latest release (1.3.0).

See the attached PDF as a sample; I have a lot of samples like this.

52n31op9ob2on.pdf

In this case the PDF is encrypted, but it does not ask for a password and all images are visible in any PDF viewer. So some objects are encrypted, but no special password is needed to decrypt them.

ClamAV extracts every object of the PDF, but the objects are still encrypted, so they are useless for finding anything inside. You can see the objects with "clamscan --debug --leave-temps=yes --tempdir=1.tmp ..."

So of course ClamAV should also decrypt these files in order to scan their parts...

br johannes

JAF84 commented 8 months ago

I believe this object means that the key is stored in the PDF file...

36 0 obj
<<
  /CF << /StdCF << /CFM /AESV2 /Length 16 /Type /CryptFilter >> >>
  /EncryptMetadata false
  /Filter /Standard
  /Length 128
  /O <12C8E19723067F3F573A569162793847A399164D0ABD07C378E264D04385DE6C>
  /P -3904
  /R 4
  /StmF /StdCF
  /StrF /StdCF
  /U <8DB1952FBC37B941D71F1E81F508A629A50EDB71F2423300B31F50D70AF2A721>
  /V 4
>>
endobj
xref
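To illustrate what the /P entry in this dictionary encodes, here is a minimal Python sketch that decodes it into user access permission flags, assuming the bit layout from the user access permissions table in the PDF Reference 1.7 (bits numbered from 1, low-order first). `PERMISSION_BITS` and `decode_permissions` are hypothetical helpers, not part of ClamAV. For /P -3904 every listed operation comes out denied, which fits a file protected with only an owner password; these flags are advisory for viewers and do not prevent deriving the decryption key when the user password is empty.

```python
# User access permission bits of the PDF standard security handler.
# Bit positions are 1-based (bit 1 is the low-order bit), as in the spec.
# Bits 9-12 are only meaningful for revision 3 or greater.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations, fill in forms",
    9: "fill in existing form fields",
    10: "extract text and graphics for accessibility",
    11: "assemble the document",
    12: "high-quality print",
}


def decode_permissions(p: int) -> dict:
    """Map each operation to True (allowed) or False (denied) for a /P value."""
    mask = p & 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return {name: bool(mask & (1 << (bit - 1))) for bit, name in PERMISSION_BITS.items()}


if __name__ == "__main__":
    for name, allowed in decode_permissions(-3904).items():
        print(f"{name:45} {'allowed' if allowed else 'denied'}")
```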

ragusaa commented 8 months ago

Hi,

Thank you for notifying us about this, and I am sorry for the delay in responding to you.

Looking at our metadata, ClamAV recognizes that this file contains an encrypted image that is decryptable, but the image appears to be extracted without being decrypted.

According to pdfimages, this image is of type portable pixmap (ppm).

I am opening a ticket internally to track this issue and get it scheduled. We'll update this issue when it is scheduled.

If you could provide some of your other samples, we would appreciate it.

Thanks, Andy

JAF84 commented 8 months ago

hello Andy,

Attached are some more samples:

ce2kg7bptpo7e.pdf v554fqz6krwme.pdf 7a2ljrwiskbk.pdf eezo7xs89c.pdf zikjqtdw51x5uuk.pdf

These are, of course, unwanted spam PDFs.

But there are also legitimate PDFs that use this "pseudo-encryption", so the use of this PDF feature does not by itself mean that the PDF is a bad one...

br johannes

JAF84 commented 8 months ago

hello,

Now I can tell you more about this: the encryption is applied when you protect the PDF, e.g. to make it non-printable.

See the attached samples:

1.pdf => without encryption
2.pdf => encryption, but no password necessary => so could/should be checked...
3.pdf => encryption, password necessary

They can easily be created with pdftk on Linux:

pdftk 1.pdf output 2.pdf owner_pw 1234
pdftk 1.pdf output 3.pdf owner_pw 1234 user_pw 4321

br johannes

ragusaa commented 8 months ago

That's great, thank you for the samples and for explaining where this comes from. We have some other PDF tasks planned, so hopefully we can address this as part of that work.

Thanks, Andy

JAF84 commented 8 months ago

BTW, this is also very interesting:

clamscan.exe --alert-encrypted=yes *pdf

1.pdf: OK
2.pdf: OK
3.pdf: Heuristics.Encrypted.PDF FOUND

So ClamAV already detects a difference between 2.pdf and 3.pdf...

ragusaa commented 8 months ago

I haven't had a chance to play with the new files yet, but I would imagine 3.pdf would not have 'decryptable' in the JSON output.

ragusaa commented 8 months ago

Just checked. 2.pdf is decryptable, 3.pdf is not.

JAF84 commented 8 months ago

hello Andy,

I have now also checked with ClamAV 1.3.0, and you are right: for 2.pdf it shows "pdf_find_and_extract_objs: encrypted pdf found, decryptable!"

LibClamAV debug: cli_pdf: U: : a95f5a7083f9fb99bb158fcd70e503db00000000000000000000000000000000
LibClamAV debug: cli_pdf: O: : dd027d75bab3642ffd6d1b4a2020e2df0022ff603ae18bfb6769f36dd5800bfa
LibClamAV debug: check_owner_password: Unknown or unsupported encryption version. R: 3
LibClamAV debug: check_owner_password: encrypted PDF found but cannot decrypt with empty owner password
LibClamAV debug: cli_pdf: U: : a95f5a7083f9fb99bb158fcd70e503db00000000000000000000000000000000
LibClamAV debug: cli_pdf: O: : dd027d75bab3642ffd6d1b4a2020e2df0022ff603ae18bfb6769f36dd5800bfa
LibClamAV debug: cli_pdf: md5: f57ac02ebae3c6f4fd80ca480c0db974
LibClamAV debug: cli_pdf: Candidate encryption key: f57ac02ebae3c6f4fd80ca480c0db974
LibClamAV debug: cli_pdf: fileID: 2bc8cb8f258e5c34c306e9bdf5ac31e7
LibClamAV debug: cli_pdf: computed U (R>=3): a95f5a7083f9fb99bb158fcd70e503db
LibClamAV debug: check_user_password: user password is empty
LibClamAV debug: pdf_find_and_extract_objs: encrypted pdf found, decryptable!
LibClamAV debug: Bytecode executing hook id 258 (0 hooks)
LibClamAV debug: Bytecode: no logical signature matched, no bytecode executed
LibClamAV debug: pdf_find_and_extract_objs: (parsed hooks) returned 0
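The log shows the empty-user-password check for R 3: a candidate key is derived, a 16-byte U value is computed, and it equals the first 16 bytes of the stored U (the remaining 16 bytes are padding). Below is a minimal Python sketch of that check, assuming the "compute U for revision 3 or greater" algorithm from the PDF Reference 1.7 and a file key that has already been derived elsewhere. `rc4`, `compute_u_r3` and `user_password_matches` are hypothetical names; this is the textbook algorithm, not ClamAV's implementation.

```python
import hashlib

# 32-byte padding string defined by the PDF standard security handler.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed with the input
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def compute_u_r3(file_key: bytes, first_id: bytes) -> bytes:
    """The 16 significant bytes of /U for revision 3 or greater."""
    value = hashlib.md5(PAD + first_id).digest()  # MD5(padding + first /ID entry)
    value = rc4(file_key, value)                  # first RC4 pass with the file key
    for i in range(1, 20):                        # 19 more passes, key bytes XOR i
        value = rc4(bytes(b ^ i for b in file_key), value)
    return value


def user_password_matches(file_key: bytes, first_id: bytes, stored_u: bytes) -> bool:
    """For R >= 3 only the first 16 bytes of /U are compared."""
    return compute_u_r3(file_key, first_id) == stored_u[:16]
```

To try it against the log, pass `bytes.fromhex(...)` of the "Candidate encryption key" and "fileID" values and compare `compute_u_r3(...).hex()` with the "computed U (R>=3)" line.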

When I run with --leave-temps=yes on 1.pdf, I see the "hello world" object in the temp files.

But with 2.pdf, the extracted temp files are all still encrypted... and therefore useless, so it's not possible to create signatures for the PDF parts...

I've now also tested a PDF with an image. I used --leave-temps to get the image file and created a hash-based signature of it.

After that, the unencrypted file was marked as infected. Then I used "pdftk 1.pdf output 2.pdf owner_pw 1234" to encrypt it.

ClamAV reported "decryptable", but did not mark the file as infected.

So maybe ClamAV is able to decrypt it, but does not use the decrypted parts for some reason?
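A hash-based signature only matches the exact bytes it was created from, which would explain this result if the objects ClamAV scans are still the encrypted ones: the plaintext image and its encrypted form have different hashes (and, with AES, different sizes as well). A minimal Python sketch of an .hdb line in ClamAV's `MD5:FileSize:MalwareName` hash signature format; `hdb_signature` and the signature name are hypothetical, and the XOR below is only a stand-in for "bytes changed by encryption", not real PDF encryption.

```python
import hashlib


def hdb_signature(data: bytes, name: str) -> str:
    """ClamAV .hdb line: MD5 of the exact bytes, their size, and a signature name."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"


if __name__ == "__main__":
    plain = b"decoded image bytes"
    # Stand-in for "still encrypted": a simple XOR, NOT real PDF encryption.
    # Any change to the bytes changes the MD5, so the signature no longer matches.
    scrambled = bytes(b ^ 0x5A for b in plain)
    print(hdb_signature(plain, "Example.PDF.Image"))      # hypothetical name
    print(hdb_signature(scrambled, "Example.PDF.Image"))  # different hash
```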

br johannes

ragusaa commented 8 months ago

I think the 'LibClamAV debug: check_owner_password: Unknown or unsupported encryption version. R: 3' message points to the problem. When our PDF parser prints that message, it does not attempt to decrypt that block, but the decryptable flag is still printed because we should be able to decrypt it.

We have some other planned work on the PDF parser, so hopefully we can get this implemented as part of that.

Thank you for digging into this!

JAF84 commented 8 months ago

I have a lot of different samples here, but many of them contain business data, which I cannot post here.

But if there is a beta version to test, please let me know...

Anyway, if a file is encrypted like 2.pdf, even if only to deny printing, and ClamAV fails to decrypt it, then it should also be flagged as "Heuristics.Encrypted.PDF" or something similar...

Because there may be other encryption methods that ClamAV fails to decrypt, or in the future there may be a new way to encrypt PDF files...

Can you consider this as well?

JAF84 commented 8 months ago

BTW, if you think this is the problem: "Unknown or unsupported encryption version. R: 3"

Shouldn't this be easy to fix if revision 4 already works? See the PDF Reference 1.7, pages 125-126: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf

There are some additional steps to do if the revision is 3 or higher, or if it is 4 or higher...
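The revision-dependent extras are small. Here is a minimal Python sketch of the file-key derivation for the standard security handler (Algorithm 3.2, "computing an encryption key", in the PDF Reference 1.7), assuming revisions 2 to 4; `compute_file_key` is a hypothetical name and this is not ClamAV's code. Revision 3 and higher add 50 extra MD5 rounds, and revision 4 and higher add four 0xFF bytes when /EncryptMetadata is false, as in the dictionary quoted earlier in this thread.

```python
import hashlib

# 32-byte padding string defined by the PDF standard security handler.
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def compute_file_key(
    user_password: bytes,
    o_entry: bytes,
    p: int,
    first_id: bytes,
    revision: int,
    length_bits: int = 128,
    encrypt_metadata: bool = True,
) -> bytes:
    """File encryption key for the standard security handler, revisions 2-4."""
    n = 5 if revision == 2 else length_bits // 8       # key length in bytes
    h = hashlib.md5()
    h.update((user_password + PAD)[:32])              # pad or truncate to 32 bytes
    h.update(o_entry)                                 # the /O string
    h.update((p & 0xFFFFFFFF).to_bytes(4, "little"))  # /P, low-order byte first
    h.update(first_id)                                # first element of /ID
    if revision >= 4 and not encrypt_metadata:
        h.update(b"\xff\xff\xff\xff")                 # extra step for R >= 4
    digest = h.digest()
    if revision >= 3:
        for _ in range(50):                           # extra step for R >= 3
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]
```

With an empty user password (`b""`), the /O and /P values from the encryption dictionary, and the first /ID entry from the trailer, this yields the file key; each object still needs its own per-object key (Algorithm 3.1) before it can be decrypted with RC4 or AES.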

br johannes

ragusaa commented 8 months ago

Unfortunately, we have a few other high-priority tasks that we need to address before we can get started on this. There is some other PDF work we need to do, so we plan on fixing this as part of that work.

I'll definitely let you know when there is something to test on your other samples.

Andy

JAF84 commented 3 months ago

hello ragusaa,

any news here?

br johannes

ragusaa commented 3 months ago

Hi Br Johannes,

Unfortunately not.

Sorry, Andy

JAF84 commented 1 month ago

hello,

https://github.com/Cisco-Talos/clamav/issues/770

This is the same issue.

br johannes