DEVSENSE / Phalanger

PHP 5.4 compiler for the .NET/Mono frameworks. Predecessor to the open-source PeachPie project (www.peachpie.io).
http://v4.php-compiler.net/
Apache License 2.0

Implementation of 'htmlspecialchars' is not complete #69

Open linsmod opened 7 years ago

linsmod commented 7 years ago

htmlspecialchars will not translate any of the entities in this list: "&amp;amp;", "&amp;quot;", "&amp;#039;", "&amp;lt;", "&amp;gt;"

But HtmlSpecialCharsEncode does not implement this logic, so if the input contains any of the entities in the list above, the output is unexpected, e.g. &amp; is translated to &amp;amp;

Found the issue when working with WordPress 4.6.1.

linsmod commented 7 years ago

Also, the function should not translate the & symbol to &amp; when it is not an attribute. This is my personal fix:

    // Needs System.Linq (Skip/Take) and System.Collections.Generic (List<T>).
    internal static string HtmlSpecialCharsEncode(string str, int index, int length, QuoteStyle quoteStyle, string charSet)
    {
        if (str == null) return String.Empty;

        Debug.Assert(index + length <= str.Length);

        StringBuilder result = new StringBuilder(length);

        // quote style is anded to emulate PHP behavior (any value is allowed):
        string single_quote = (quoteStyle & QuoteStyle.SingleQuotes) != 0 ? "&#039;" : "'";
        string double_quote = (quoteStyle & QuoteStyle.DoubleQuotes) != 0 ? "&quot;" : "\"";

        // Split the requested range on '&'; the final Join puts every '&' back unencoded.
        var strArray = new string(str.Skip(index).Take(length).ToArray()).Split('&');
        var strList = new List<string>();
        foreach (var item in strArray)
        {
            for (int i = 0; i < item.Length; i++)
            {
                char c = item[i];
                switch (c)
                {
                    case '&': // never reached: Split('&') already removed every '&'
                        result.Append("&amp;"); break;
                    case '"':
                        result.Append(double_quote); break;
                    case '\'':
                        result.Append(single_quote); break;
                    case '<':
                        result.Append("&lt;"); break;
                    case '>':
                        result.Append("&gt;"); break;
                    default:
                        result.Append(c); break;
                }
            }
            // Store the encoded segment and reuse the builder for the next one.
            strList.Add(result.ToString());
            result.Clear();
        }
        return string.Join("&", strList);
    }

lucyllewy commented 7 years ago

Does the behaviour of Phalanger differ from PHP?

It appears you are stating that Phalanger escapes &amp; to &amp;amp; when run through htmlspecialchars(). If that is your intended message, then this behaviour is consistent with the PHP implementation and thus is not a bug and will not be changed.

jakubmisek commented 7 years ago

Right. If you have a small test case in PHP, please run it with both Phalanger and legacy PHP first to see whether the output differs.

linsmod commented 7 years ago

Code:

    echo 'quote_style:'.$quote_style;
    echo 'charset:'.$charset;
    echo 'double_encode:'.$double_encode;
    echo $string;
    echo htmlspecialchars($string);
    $string = @htmlspecialchars( $string, $quote_style, $charset, $double_encode );
    die($string);

Official PHP output: quote_style:3charset:UTF-8double_encode:http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1

Phalanger output: quote_style:3charset:UTF-8double_encode:http://localhost:8000/wp-admin/load-styles.php?c=0&amp;dir=ltr&amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1http://localhost:8000/wp-admin/load-styles.php?c=0&amp;amp;dir=ltr&amp;amp;load%5B%5D=dashicons,buttons,forms,l10n,login&amp;amp;ver=4.6.1

linsmod commented 7 years ago

When htmlspecialchars( $string ) is invoked twice, official PHP and Phalanger produce the same result. However, if the second invocation passes the extra parameters, i.e. htmlspecialchars( $string ) followed by htmlspecialchars( $string, $quote_style, $charset, $double_encode ), the test results differ.

Official PHP appears to avoid producing &amp;amp;, but Phalanger does not.
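
The difference described above can be isolated in a small script. This is only a minimal sketch: it assumes $double_encode is false (the test output prints nothing after "double_encode:", which is consistent with false, although null or an empty string would print the same way), and it uses a shortened, hypothetical URL rather than the real WordPress one. ENT_QUOTES is the named constant for quote_style 3.

```php
<?php
// Hypothetical input: a URL fragment whose ampersand is already encoded as &amp;.
$string = 'load-styles.php?c=0&amp;dir=ltr';

// Default call: double_encode defaults to true, so the existing entity is encoded again.
$encodedTwice = htmlspecialchars($string);

// Explicit call with double_encode = false (assumed value): existing entities are left alone.
$encodedOnce = htmlspecialchars($string, ENT_QUOTES, 'UTF-8', false);

// Print each result on its own labeled line so the two cases can be told apart.
echo "default:             ", $encodedTwice, "\n";
echo "double_encode=false: ", $encodedOnce, "\n";
```

In PHP the first line ends in c=0&amp;amp;dir=ltr and the second in c=0&amp;dir=ltr. Running the same script under Phalanger and comparing the second line would show whether the double_encode argument is being honored.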

lucyllewy commented 7 years ago

Please can you tidy your test case to separate the outputs so that we can see what is output by which part of the test. It is still not clear what you are actually stating is the problem, i.e. which circumstance causes the issue you perceive. As an example, it is not clear what value $double_encode has in your example: is it true, false, empty string, null, ....?

linsmod commented 7 years ago

I know little about PHP. How can I tell whether $double_encode is true/false, an empty string, null, or something else?
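
One possible way to check, sketched here: echo prints false, null, and an empty string all as nothing, whereas var_dump and var_export show the type as well. The assignment below is a hypothetical stand-in for whatever value the calling code actually passes in.

```php
<?php
// Hypothetical value; in the real test, $double_encode comes from the caller.
$double_encode = false;

// echo renders false, null and '' identically (as an empty string).
echo 'double_encode:' . $double_encode . "\n";

// var_dump prints the type together with the value.
var_dump($double_encode);

// var_export with the second argument set to true returns the representation as a string.
$described = var_export($double_encode, true);
echo 'double_encode:' . $described . "\n";
```

Replacing the echo of $double_encode in the test case with one of the last two forms would make the value visible in the output.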

hlizard commented 7 years ago

I also ran into this problem when trying WordPress 4.6.1 with Phalanger. https://github.com/hlizard/WpDotNet/commit/72a0595beac77b84d5493c9a37796b674e07bca4

So you are a fellow countryman! I don't know PHP either, what a coincidence.