Open deiferni opened 5 years ago
here a branch where I added a test mail with CRLF line endings: https://github.com/4teamwork/opengever.core/tree/nj_zip_crlf
I can confirm that the line endings get modified when writing the zip, but not when downloading the file from GEVER. Not sure yet why that happens.
Ha, it actually is the other way around it seems. When downloading a file, the line endings get rewritten, but not when exporting as ZIP. In that case, line endings are preserved.
Is it the browser being helpful versus its idea of your filesystem?
For the download probably. this is directly a stream response, with filename and such in the header. For the zipexport probably not, there we explicitely write the files
Verify the browser making the difference against output from curl
into a file.
Original ticket with attached E-mails and such: https://extranet.4teamwork.ch/support/gever-ai/tracker/318
Ok so the issue comes from mail-in. Mail-in transforms CRLF to LF. Then zip writes the file as is, whereas download transforms back to CRLF (also via curl). So basically they have a problem with the zip export because the zip export does it right :-).
So the magic happens here:
https://github.com/4teamwork/ftw.mail/blob/master/ftw/mail/inbound.py#L57
This then calls python's email.message.Message.as_string
, then email.generator.Generator
which uses \n
as line separators.
Also not that on my machine the E-mail still works fine, even after the line endings have been changed. I'm not too sure whether we want to fix this and how.
Proposed solution: We could transform line endings during zip export.
We should check if there is an RFC that takes a stance on what line separators shall be used. Or if both are valid. Also rewriting the line separator during download may mess with maximum line length limitations as they add an additional character. That may cause trouble further on.
It seems that CRLF is the standard for E-mail bodies (https://tools.ietf.org/html/rfc2822). Actually python's mail does not touch those, so they come out right, but it does change the line endings between header lines and parts, using \n
there. It seems that this used to be acceptable but is not anymore.
Here some relevant parts from the standard:
Messages are divided into lines of characters. A line is a series of
characters that is delimited with the two characters carriage-return
and line-feed; that is, the carriage return (CR) character (ASCII
value 13) followed immediately by the line feed (LF) character (ASCII
value 10). (The carriage-return/line-feed pair is usually written in
this document as "CRLF".)
A message consists of header fields (collectively called "the header
of the message") followed, optionally, by a body. The header is a
sequence of lines of characters with special syntax as defined in
this standard. The body is simply a sequence of characters that
follows the header and is separated from the header by an empty line
(i.e., a line with nothing preceding the CRLF).
2.3. Body
The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:
- CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.
- Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.
Note: As was stated earlier, there are other standards documents,
specifically the MIME documents [RFC2045, RFC2046, RFC2048, RFC2049]
that extend this standard to allow for different sorts of message
bodies. Again, these mechanisms are beyond the scope of this
document.
4. Obsolete Syntax
Finally, certain characters that were formerly allowed in messages
appear in this section. The NUL character (ASCII value 0) was once
allowed, but is no longer for compatibility reasons. CR and LF were
allowed to appear in messages other than as CRLF; this use is also
shown here.
In the mail download we manually convert the line endings, see https://github.com/4teamwork/opengever.core/blob/7390b2e4afd830f31cec55285d91bdf490b041fa/opengever/mail/browser/download.py#L17 or https://github.com/4teamwork/opengever.core/pull/638. So my suggestion would be to do the same for zip-export.
Harvest:
OneGov GEVER
-Softwarepflege (Refactoring/Bug)
When exporting eml files in a Zip the line endings seem to be rewritten. I suspect we don't always write in binary mode 🤔.
See also https://extranet.4teamwork.ch/support/gever-ai/tracker/330