fmbiete / Z-Push-contrib

Z-Push fork with changes that I will try to contrib
GNU Affero General Public License v3.0
134 stars 62 forks source link

Fix charset conversion of binary data for PHP < 5.4 #202

Closed RobinMcCorkell closed 9 years ago

RobinMcCorkell commented 9 years ago

There is an issue with the handling of charsets in include/mimeDecode.php, specifically _autoconvert_encoding(). The function uses iconv() to convert data (in this case, a binary PDF) to the required charset, but binary characters often cannot be converted. However, the behaviour of iconv() changed in PHP 5.4.0, and this code only works correctly in newer versions.

From the man page:

PHP 5.4.0: Since this version, the function returns FALSE on illegal characters, unless //IGNORE is specified in output charset. Before, it returned partial output string.

This is leveraged in _autoconvert_encoding() by a check, which returns the original string in case iconv() returns a non-string/empty string. However, pre-5.4, iconv() will just return as much as it could decode, which passed the checks and resulted in truncated attachments coming back.

This patch checks the multi-byte string length of the converted string, and if it does not match, resets the string to the original. It also handles false being returned for newer versions.

fmbiete commented 9 years ago

Thanks @Xenopathic !!!

I have merged it, but moved the check outside the try/catch, because iconv can throw an exception if iconv fails.

RobinMcCorkell commented 9 years ago

@fmbiete Good point - glad I could help!

fortiko commented 9 years ago

This creates several error messages of the following kind for me.

Z-Push-contrib/include/mimeDecode.php:894 mb_strlen(): Unknown encoding "" (2)
RobinMcCorkell commented 9 years ago

@fortiko That is a more fundamental problem with $supposed_charset being set to the empty string (which should break other functions - you are lucky it hasn't blown up before now). I think it'd be best to open it as a new issue.

Relevant lines:

fmbiete commented 9 years ago

@fortiko Can you add extre debug under this line in include/mimeDecode.php?

ZLog::Write(LOGLEVEL_DEBUG, "Mail_mimeDecode()::_autoconvert_encoding(): Text cannot be correctly decoded, using original text. This will be ok if the part is not text, otherwise expect encoding errors");
ZLog::Write(LOGLEVEL_DEBUG, sprintf("$this->_charset: '%s'", $this->_charset);
ZLog::Write(LOGLEVEL_DEBUG, sprintf("$detected_encoding: '%s'", implode(",", $detected_encoding);

Event though $supposed_charset could be empty (if not charset is set in a part), $detected_encoding and $this->_charset should always have value.

fortiko commented 9 years ago

I added the debug lines and will report back once I have collected some errors!

norm2k commented 9 years ago

Here are my findings: 1) the additional debug code should read (lines 896 and 897):

ZLog::Write(LOGLEVEL_DEBUG, sprintf("$this->_charset: '%s'", $this->_charset));
ZLog::Write(LOGLEVEL_DEBUG, sprintf("$detected_encoding: '%s'", implode(",", $detected_encoding)));

2) My log output looks like:

04/07/2015 01:08:53 [26982] [DEBUG] [xxx@xxx] Mail_mimeDecode()::_autoconvert_encoding(): Text cannot be correctly decoded, using original text. This will be ok if the part is not text, otherwise expect encoding errors
04/07/2015 01:08:53 [26982] [DEBUG] [xxx@xxx] utf-8: 'utf-8'
04/07/2015 01:08:53 [26982] [ WARN] [xxx@xxx] /usr/share/z-push-master/include/mimeDecode.php:897 implode(): Invalid arguments passed (2)
04/07/2015 01:08:53 [26982] [DEBUG] [xxx@xxx] : ''

Looks like $detected_encoding could in fact be empty although I am not sure why.

RobinMcCorkell commented 9 years ago

@norm2k Could you try putting the following in a PHP file then running it through the command line?

<?php
try {
    $foo = 'bar';
} catch (\Exception $e) {
}
echo $foo . "\n";

Variable scope is per-function in PHP, not per-block like C or related languages, so the above example (and by extension the _autoconvert_encoding() code) should work. Some references I've found on the net seem to suggest that certain versions of PHP will fail on that and hide the variable...

norm2k commented 9 years ago

I get bar as the output. FYI I am on PHP 5.6.10 FPM

RobinMcCorkell commented 9 years ago

That's the expected output. I'm at a loss - it looks as though the $detected_charset variable is either set to the empty string/null (we need to change the if check) or it is being garbage collected before being used outside the try catch (eek!)

fmbiete commented 9 years ago

@norm2k Could you test with the referenced commit, please? I believe that mb_detect_encoding can return an empty string in some cases (undocumented?). But in any case we need to set $detected_charset variable to same sane value.

norm2k commented 9 years ago

I have pulled the latest and verified changes were present added in the debug code but I still see in my logs:

05/07/2015 00:40:29 [25553] [DEBUG] [xxxxxx@xxxxxx] Mail_mimeDecode()::_autoconvert_encoding(): Text cannot be correctly decoded, using original text. This will be ok if the part is not text, otherwise expect encoding errors
05/07/2015 00:40:29 [25553] [DEBUG] [xxxxxx@xxxxxx] utf-8: 'utf-8'
05/07/2015 00:40:29 [25553] [ WARN] [xxxxxx@xxxxxx] /usr/share/z-push-master/include/mimeDecode.php:905 implode(): Invalid arguments passed (2)
05/07/2015 00:40:29 [25553] [DEBUG] [xxxxxx@xxxxxx] : ''
RobinMcCorkell commented 9 years ago

@norm2k Could you try putting the following at the same place as the other debugging lines?

ZLog::Write(LOGLEVEL_DEBUG, sprintf('$detected_encoding: "%s"', var_export($detected_encoding, true)));

That will print more extensive debug output for the $detected_encoding variable, including its type. Feel free to remove the other lines, we have verified they are what they are supposed to be.

norm2k commented 9 years ago

I added the extra debug and here is my output:

06/07/2015 11:56:18 [10189] [DEBUG] [xxxxxx@xxxxxx] Mail_mimeDecode()::_autoconvert_encoding(): Text cannot be correctly decoded, using original text. This will be ok if the part is not text, otherwise expect encoding errors
06/07/2015 11:56:18 [10189] [DEBUG] [xxxxxx@xxxxxx] $detected_encoding: "''"
RobinMcCorkell commented 9 years ago

Hmm, so it is genuinely an empty string. Did you have any luck with the patch @fmbiete linked to?

norm2k commented 9 years ago

I should have mentioned that the tests I performed were with the linked changes. Still comes up empty.

fmbiete commented 9 years ago

@norm2k Can you try again with that commit? Thanks!!

If you can reproduce the error, can you send me a message that it's giving the error and your locales config (the output of "locale" in your server)?