Open SecT0uch opened 1 year ago
Not sure if it's the same sort of issue, but here's part of a PDF section kept with --leave-temps:
[<0044> <0065> <0061> <0072> <0009> <0063> <006C> <0069> <006E> <0074> <002C> <0056> <0066> <0079> <0064> <0073> <0075> <006F> <0070> <006D> <007A> <002E> <0054> <0068> <006A> <006B> <0067> <0076>
The text in the pdf is:
"Dear client, We are sorry to inform you that you account is currently frozen you can't (deposit, withdrawal, convert, transfer...) any of your funds until you confirm your account details."
but no text is made available to match on.
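For anyone wanting to check what the hex tokens in the fragment above stand for, here is a minimal Python sketch. It assumes each `<XXXX>` token is a UTF-16BE code unit, as it would be in the target side of a ToUnicode mapping; the thread does not confirm where in the PDF the array comes from, and `decode_hex_tokens` is just an illustrative helper name.

```python
import re


def decode_hex_tokens(fragment):
    """Decode every <XXXX> token in the fragment, assuming UTF-16BE.

    Tokens must have an even number of hex digits (4 in the sample above).
    """
    tokens = re.findall(r"<([0-9A-Fa-f]+)>", fragment)
    return "".join(bytes.fromhex(tok).decode("utf-16-be") for tok in tokens)


# First eleven tokens of the array quoted in the comment above.
fragment = "[<0044> <0065> <0061> <0072> <0009> <0063> <006C> <0069> <006E> <0074> <002C>"
print(repr(decode_hex_tokens(fragment)))  # 'Dear\tclint,'
```

Under that assumption the tokens decode to individual characters with no repeats (note the single "e" in "clint"), which suggests the array is a character table rather than the sentence itself. That would be consistent with the text only being recoverable by tools that apply the font's mapping.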
Example:
Unfortunately I don't have VT premium to test your file, but this sounds like the right direction. I don't have the same pattern, but I will try to look for the string as hex in other formats.
Describe the bug
I've got a PDF sample where ClamAV is not able to extract the text, while pdftotext (https://poppler.freedesktop.org) and pdf2txt.py (https://pdfminersix.readthedocs.io/en/latest/) can.
How to reproduce the problem
Run clamscan CG.pdf --leave-temps to keep the normalized files.
The document contains the following text: "This document contains protected files"
grep -ri protected /tmp/20230601_094730-CG.pdf.7c648d30d0 returns nothing.
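A plain grep only finds the phrase if it is stored as contiguous ASCII/UTF-8 bytes. As a rough cross-check, the sketch below walks the kept temp directory and looks for the phrase in a few other encodings (UTF-16 and ASCII hex digits). The helper names are hypothetical and the list of encodings is an assumption about how the text might be stored, not something ClamAV is known to emit.

```python
import os


def candidate_patterns(phrase):
    """Map a label to the byte pattern for the phrase in each encoding tried."""
    patterns = {}
    for name in ("utf-8", "utf-16-le", "utf-16-be"):
        raw = phrase.encode(name)
        patterns[name] = raw
        # Text written as hex digits, e.g. inside a PDF hex string <0044...>.
        patterns[name + " as hex"] = raw.hex().encode("ascii")
    return patterns


def find_phrase(root, phrase):
    """Return (path, label) pairs for files under root that contain the phrase.

    Matching ignores ASCII case for the raw encodings. The hex forms only
    match the lowercased phrase, since letter case changes the hex digits.
    """
    patterns = candidate_patterns(phrase.lower())
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read().lower()
            except OSError:
                continue  # unreadable file, skip it
            for label, pattern in patterns.items():
                if pattern in data:
                    hits.append((path, label))
    return hits
```

Calling `find_phrase("/tmp/20230601_094730-CG.pdf.7c648d30d0", "protected")` on the directory from the steps above would show whether the phrase is present in any of these forms. Because UTF-16 patterns can also match at a one-byte offset, a file may be reported under both the LE and BE labels. An empty result would support the report that the text is not extracted at all.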
Attachments
It is a phishing PDF, containing a link to a malicious website.
File can be found here: https://www.virustotal.com/gui/file/5faeb2ce23c9f86b085ec11733bc711ba1ca410d506e9df0be5f19c2db1730cc