YahnisElsts / wp-update-server

A custom update API for WordPress plugins and themes. Intended to be used in conjunction with my plugin-update-checker library.
MIT License
824 stars 176 forks source link

Escape log data before writing to file #66

Closed dangoodman closed 6 years ago

dangoodman commented 6 years ago

This escapes special chars in logged data to avoid breaking the log file with something like ?installed_version=%0D%0A%09+ in url. The purpose of the added method escapeLogInfo() is similar to mysql[_real]_escape_string(), htmlspecialchars(), escapeshellarg(), etc.

YahnisElsts commented 6 years ago

This seems like a good idea, but I think it doesn't go far enough. May I suggest something like this for escaping column values:

function escapeLogValue($input) {
    $regex = '/[[:^graph:]]/';

    //preg_replace_callback will return NULL if the input contains invalid Unicode sequences,
    //so only enable the Unicode flag if the input encoding looks valid.
    if ( mb_check_encoding($input, 'UTF-8') ) {
        $regex = $regex . 'u';
    }

    $result = str_replace('\\', '\\\\', $input);
    $result = preg_replace_callback(
        $regex, 
        function($matches) {
            $length = strlen($matches[0]);
            $escaped = '';
            for($i = 0; $i < $length; $i++) {
                //Convert the character to a hexadecimal escape sequence.
                $hexCode = dechex(ord($matches[0][$i]));
                $escaped .= '\x' . strtoupper(str_pad($hexCode, 2, '0', STR_PAD_LEFT));
            }
            return $escaped;
        },
        $result
    );
    return $result;
}

This should escape not only spaces and line breaks but also all non-printable characters, NULL-bytes, and invalid Unicode sequences.

As for padding, I think it would be better to just move the code that adds padding so that it runs after filterLogInfo/escapeLogInfo. Adding, removing and then re-adding padding seems inelegant.

dangoodman commented 6 years ago

I was looking to the [:print:] character class but then skipped it since it works differently on windows and linux. Namely, on windows the Tab (\t) character is considered as printable, while on linux it's non-printable.

The [:graph:] class doesn't seem to have that issue as per my tests. Looks good to me.

As of the unicode-related stuff, to be honest my brain isn't big enough to keep all the details about it required to handle it correctly and safely. All those invalid sequences, 5 and 6 byte length chars, normalization forms, etc. So it's ok to me if it's ok to you ;)

Agree about re-adding padding. My intention was to keep escaping the last step in the chain to make sure no breaking chars pass by. If we add a protected method, say, formatLogIngo(), supposed to pad fields or decorate them in other way, somebody will forget about the requirement to produce output-safe values.

YahnisElsts commented 6 years ago

I don't know much about Unicode either. That's why I'm relying on PCRE's Unicode support and mbstring instead of trying to implement more advanced validation. Presumably/hopefully, the people who wrote those extensions know this stuff better than we do.

(That reminds me: we should probably check if mb_check_encoding exists before calling it because mbstring is apparently a non-default extension.)

If we add a protected method, say, formatLogIngo(), supposed to pad fields or decorate them in other way, somebody will forget about the requirement to produce output-safe values.

Hmm, I suppose that's possible. I would keep the padding code in logRequest for now and just move it down. For example (untested code):

$columns = $this->filterLogInfo($columns, $request);
$columns = $this->escapeLogInfo($columns);

if ( isset($columns['ip'] ) {
    $columns['ip'] = str_pad($columns['ip'], 15, ' ');
}
if ( isset($columns['http_method'] ) {
    $columns['http_method'] = str_pad($columns['http_method'], 4, ' ');
}
dangoodman commented 6 years ago

Here it is.