NCSC-NL / taranis3

Taranis
Other
59 stars 17 forks source link

PDF-Attachments in E-Mail-Items broken #39

Closed bsi-lz closed 4 years ago

bsi-lz commented 4 years ago

We got that problem, that we cannot read PDFs that are attached to Emails read in by Email-Sources. The PDFs downloaded in Firefox by 'loadfile/assess/download_attachment/downloadAttachment' contains finally the content: '%PDF-1.5%', got size of 1kb and is considered as broken by okular.

However, we can open and read the parsed and generated PDF-file under

/tmp/systemd-private-[...]-httpd.service-[...]/tmp/taranis/taranis-3.5.1/assess-attachments

and

/tmp/systemd-private-[...]-httpd.service-[...]/tmp/taranis/taranis-3.5.1/display-mail-attachments.

Btw, is that normal behavior to parse and generate the files twice?

When debugging download_attachment.pl after line 44 (sub downloadAttachment) my $attachmentDecoded = decodeMimeEntity( $attachmentEntity, 1, 0 ); $attachmentDecoded prints only '%PDF-1.5%' or '%PDF-1.4%'.

Other file-formats as observed (f.e. .json) parse and download correctly. I could not find a working PDF-file so far.

I attached two files, a working PDF copied from tmp-dir and the same PDF as downloaded by firefox through taranis. If you need all debug-prints / item-contents as db-dumps let me know.

IT-ISAC_Open_Source_News_June_26_2018-DlByFirefoxTaranis.pdf IT-ISAC_Open_Source_News_June_26_2018.pdf

markov2 commented 4 years ago

We got that problem, that we cannot read PDFs that are attached to Emails read in by Email-Sources. The PDFs downloaded in Firefox by 'loadfile/assess/download_attachment/downloadAttachment' contains finally the content: '%PDF-1.5%', got size of 1kb and is considered as broken by okular.

Do you refer to items which were pulled-in via IMAP from remote folders? Probably.

The two files you see being created originate in the insert process. Perl module MIME::Entity is used, which has an ancient origin, before people really used attachments a lot. It uses files to temporarily store attachment data... when you then collect the attachment yourself, you have to copy it again... Nothing to worry about.

Can you please email me the inserted message including all headers? (as tar or zip) I suspect it is an transfer-encoding issue.

bsi-lz commented 4 years ago

Do you refer to items which were pulled-in via IMAP from remote folders? Probably.

exactly

Can you please email me the inserted message including all headers? (as tar or zip) I suspect it is an transfer-encoding issue.

you've got mail

markov2 commented 4 years ago

Actually, the imported mails only supported text/* type attachments, which is not filtered correctly: when I saw it first, I flagged is as a probable bug but was unsure whether it would kill any procedure by fixiing the bug. See:

return " " if $noAttachment && $contentType !~ m!^text/!;    #XXX wrong

The probably intention was:

return " " if $noAttachment || $contentType !~ m!^text/!i;

You do not want to restrict yourself to this stupid restriction, but bump into a lack of support for non-text types in the few lines below: application/pdf has to be treated as binary. Therefore, this fails:

my $string = eval { decode($charset, $bodyContent, Encode::FB_WARN) };

FB_WARN does not only warn while decoding the second line in the PDF, but also stops processing the bodyContent. Only the first "line" is returned. FB_DEFAULT would have simply continued... probably causing more damage to the file.

When the attachment is not text, decoding should not happen. So, what about:

diff --git a/pm/Taranis.pm b/pm/Taranis.pm
index 061a2e5f..e223f09d 100644
--- a/pm/Taranis.pm
+++ b/pm/Taranis.pm
@@ -239,25 +239,18 @@ sub decodeMimeEntity($$$) {
        my $contentTransferEncoding = lc($head->mime_encoding() || '');

        return " " if $noHtmlPart   && $contentType eq 'text/html';
-       return " " if $noAttachment && $contentType !~ m!^text/!;    #XXX wrong
+       return " " if $noAttachment;

-       my $bodyContent = $entity->stringify_body();
-
-       # remove transfer encodings
-       if ( $contentTransferEncoding eq 'base64') {
-               $bodyContent = decode_base64( $bodyContent );
-       } elsif ($contentTransferEncoding eq 'quoted-printable') {
-               $bodyContent = decode_qp( $bodyContent );
-       }
+       my $bodyContent = $entity->bodyhandle->as_string;
+       return $bodyContent if $contentType !~ m!^text/!;

        # Convert to Perl string, can die on unsupported charset
-       my $string = eval { decode($charset, $bodyContent, Encode::FB_WARN) };
+       my $string = eval { decode($charset, $bodyContent, Encode::FB_QUIET) };
        if($@) {
                print "decodeMimeEntity error: $@\n";
                return $bodyContent;   # work with bytes
        }
-
-       return $string;            # clean Perl string
+       $string;
 }
bsi-lz commented 4 years ago

I tested your offered solution:

my solution:

diff --git pm/Taranis.pm pm/Taranis.pm
index a9c63f1..7f66a8b 100644
--- pm/Taranis.pm
+++ pm/Taranis.pm
@@ -244,40 +244,42 @@ sub decodeMimeEntity($$$) {

        my $head        = $entity->head();
        my $contentType = lc($head->mime_type() || 'text/plain');
        my $charset     = $head->mime_attr("content-type.charset")  || 'us-ascii';
        my $contentTransferEncoding = lc($head->mime_encoding() || '');

        return " " if $noHtmlPart   && $contentType eq 'text/html';
        return " " if $noAttachment && $contentType !~ m!^text/!;    #XXX wrong

        my $bodyContent = $entity->stringify_body();

        # remove transfer encodings
        if ( $contentTransferEncoding eq 'base64') {
                $bodyContent = decode_base64( $bodyContent );
        } elsif ($contentTransferEncoding eq 'quoted-printable') {
                $bodyContent = decode_qp( $bodyContent );
        }

+       return $bodyContent if $contentType !~ m!^text/!; #f.e. PDFs
+
        # Convert to Perl string, can die on unsupported charset
        my $string = eval { decode($charset, $bodyContent, Encode::FB_WARN) };
        if($@) {
                print "decodeMimeEntity error: $@\n";
                return $bodyContent;   # work with bytes
        }

        return $string;            # clean Perl string
 }
markov2 commented 4 years ago

My lack of knowledge about MIME::Entity. I do maintain the modules which are used in its core (Mail::Internet), but have my own (much stronger) library Mail::Box for the real work.

MIME::Entity apparently does not see the preamble as part of the message body object. That was the original idea of MIME. I have restored the use of stringify_body: back to a minor change. -- Regards, MarkOv


   Mark Overmeer MSc                                MARKOV Solutions
   Mark@Overmeer.net                          solutions@overmeer.net

http://Mark.Overmeer.net http://solutions.overmeer.net

markov2 commented 4 years ago

Included in 3.7.4, just released