Closed Maria-12648430 closed 3 years ago
The current
quoted_printable
has some kind of bug where it strips some trailing(?) whitespaces and tabs. Do you think the new implementation fixes that (soTrim
calls could be removed)?
Yes, indeed ๐ The problem with the current encoder is that it does not fulfill the requirements of (3) of RFC 2045 Section 6.7 fully: while it does encode WSPs that are followed by CRLF, it does not encode them when they are the last character in the body. The new implementation handles that properly.
Nice work!
@seriyps do you want to address the Trim in that one test or shall we accept this as-is?
@mworrell working on an update for the PR, almost done. I'll remove the Trim along with that, it's not needed any more, encoding and decoding will match up ๐
Last commit fixes the line length bug I had in my first implementation, adds some tests for a few edge cases, augments another test to ensure that whitespaces are encoded when last character, and removes the trimming from the proptest.
One last micro-optimization before I leave this in your hands, in choose_transformation
: The current implementation determines the number of allowed ASCII characters by calling length/1
on the result of a filtering list comprehension. I changed this to call byte_size/1
on the result of an equivalent binary comprehension. byte_size
is a constant-time operation, while length
is proportional to the number of list elements.
I am very happy with this. Tests are also looking good.
@seriyps shall we merge?
Yes, looks good!
Merged, thank you @Maria-12648430 !
Impressive ๐ฎ
The current implementation of
encode_quoted_printable/3
is a bit crude (no offense, really ๐), especially when it comes to lines reaching the limit of 76 characters, resorting tostring
functions to retrieve the last line and finding whitespaces in there.The improved version
encode_quoted_printable/6
which I'm proposing here works by remembering the occurences of linear whitespaces (including HTABs), and otherwise only look-aheads in the original data to see if a CRLF follows the current character. I also refactored the encoding of characters itself to work via integer operations only, thereby getting rid of all theio:format
andlists:flatten
calls.I admit that my implementation isn't exactly easy on the eye, but that is because QP-encoding itself is tricky. And IMO, the current implementation isn't that easy to figure out, either ๐
I left the current tests untouched in what they test, only replaced the calls to the helper function
encode_quoted_printable/3
(of which I don't see the point) with calls to the API functionencode_quoted_printable/1
. I also removed one test,"newline craziness"
, of which I also fail to see the point ๐คFinally, I put in a micro-optimization in
choose_transformation/1
which avoids division and rounding and only uses multiplication to figure out if the percentage of printable characters in the sample is above 80.[EDIT]: Just noticed an edge case in which a line could exceed the length limit :laughing: Will fix and update the PR tomorrow.