Kunena / Kunena-Forum

Kunena Forum - Forum / Bulletin Board / Discussions component for Joomla - This is the 6.x/5.x main development branch. Please do not open issues regarding earlier versions of Kunena
https://www.kunena.org
GNU General Public License v3.0

Single quote in content breaks message display #9673

Closed aidanwhiteley closed 2 months ago

aidanwhiteley commented 2 months ago

Describe the bug When a single quote is inserted in the content of a message it breaks the display of the message. The rendered HTML is "broken" in that the single quote in the content of the meta name="description" field is replaced by a unicode character (see screen grabs) and then Kunena CSS.

If the single quote is removed from the xxx_kunena_topics.first_post_message field, the HTML is rendered / displayed correctly. For info, the single quote can be left in place in the corresponding xxx_kunena_messages_text.message field and the HTML still displays OK.

To Reproduce Steps to reproduce the behavior:

  1. Go to New Topic
  2. Fill in the Category and Subject
  3. Then in the Message editor enter any text and include a single quote (apostrophe) in the text somewhere
  4. Click Submit and then view the posted message. It will be badly broken in terms of display (looking as though it can't find some CSS files, but this isn't the actual problem).

Expected behavior For messages to display OK

Actual result The page display is broken (see screen grab)

Screenshots html-src-without-quote Screen grab of HTML source for a correctly displaying post

html-src-with-quote Screen grab for HTML source for identical post except that it includes a single quote in the content

table-data Screen grab of data for the above two posts in the xxx_kunena_topics table

table-definition Table definition for the xxx_kunena_topics table showing the fields character set and collation

disply-broken-html Screen grab of a bit of the broken rendered HTML

System information (please complete the following information)

Joomla version: 4.4.3 Kunena version: 6.2.5 Php version: 8.0.29 Database version: MySQL 5.7.42

Desktop (please complete the following information):

Smartphone (please complete the following information): Not tested on phones - expect the same behaviour

Additional context The screenshots show the rendered HTML, both for a post with a single quote and for an identical post without one. For the post with a single quote, some Unicode character(s) are displayed where the meta name=description field is broken.

Given that this looks like a character encoding problem, I have included a screenshot of the database character set and collation for the xxx_kunena_topics table in case that is a problem (albeit the Report Configuration Settings tool reports "Database collation check:The collation of your table fields are correct").

sozzled commented 2 months ago

I don't see that issue on a test site using J! 4.4.4 and K 6.2.6.

aidanwhiteley commented 2 months ago

Yeah - I did re-test on 4.4.4 and 6.2.6. Given that it's such a likely scenario (i.e. the use of ' in messages) and there are no previously reported issues that I can see, I'd assumed it must be something local to my setup, but I'm struggling to see what it could be...

sozzled commented 2 months ago

You mentioned something about "unicode character" being involved.

What character encoding and database collation methods are you using with your database? I use utf8mb4 character encoding with utf8mb4_unicode_ci for my database collation.

aidanwhiteley commented 2 months ago

The schema default character set is utf8mb4 and the default collation is utf8mb4_0900_ai_ci.

As per the above screen grab, the character set for the first_post_message column in the xxx_kunena_topics table is utf8mb4 and the collation is utf8mb4_unicode_ci running the MySQL InnoDB engine.

Given that this doesn't seem to be happening to lots of other people, would you be able to point me to the file with the code that populates the description meta tag and I'll try to patch my local system so it can't output anything other than basic ASCII.

Ruud68 commented 2 months ago

Hi @aidanwhiteley , could you try the following and see if that fixes the issue: in file ./components/com_kunena/src/Controller/Topic/Item/TopicItemDisplay.php

replace line (594):

        $this->setMetaData('og:description', $multispaces_replaced, 'property');

with:

        $this->setMetaData('og:description', htmlspecialchars_decode($multispaces_replaced, ENT_COMPAT), 'property');

same for line (601)

        $this->setMetaData('twitter:description', $multispaces_replaced);

with:

        $this->setMetaData('twitter:description', htmlspecialchars_decode($multispaces_replaced, ENT_COMPAT));

aidanwhiteley commented 2 months ago

Hi,

many thanks for the suggestion. I did try it but the problem occurs when rendering the description meta tag rather than the Twitter or Facebook variants of description meta tags.

I also tried your suggestion on the setDescription call at line 674 of ./components/com_kunena/src/Controller/Topic/Item/TopicItemDisplay.php. In fact, I tried several variations, as per the code snippet below

    echo "Input string for setDescription is: " . $small;                                                        
    //$str = htmlspecialchars_decode($small, ENT_COMPAT);                                                        
    //$str = htmlspecialchars($small, ENT_COMPAT);                                                               
    //$str = htmlentities($small, ENT_COMPAT, "UTF-8");                                                          
    //$str = preg_replace('/[[:^ascii:]]/', '', $small);                                                         
    //$str = preg_replace('/[[:cntrl:]]/', '', $small);                                                          
    $str = str_replace("'", "", $small);                                                                         
    $this->setDescription($str);  
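
For reference, here is a small standalone sketch (not Kunena or Joomla code) of what each variant in the snippet above does to a string containing a plain ASCII apostrophe. The sample text and the `$variants` array are made up for illustration; only the PHP built-ins are real.

```php
<?php
// Hypothetical sample input; it stands in for $small in the snippet above.
$small = "It's a test";

// Each entry applies one of the variants that were tried.
$variants = [
    'htmlspecialchars_decode' => htmlspecialchars_decode($small, ENT_COMPAT),
    'htmlspecialchars'        => htmlspecialchars($small, ENT_COMPAT),
    'htmlentities'            => htmlentities($small, ENT_COMPAT, 'UTF-8'),
    'strip non-ASCII'         => preg_replace('/[[:^ascii:]]/', '', $small),
    'strip control chars'     => preg_replace('/[[:cntrl:]]/', '', $small),
    'str_replace'             => str_replace("'", '', $small),
];

foreach ($variants as $name => $result) {
    // Report whether the apostrophe survived the transformation.
    $kept = strpos($result, "'") !== false ? 'kept' : 'removed';
    echo $name . ': ' . $result . ' (apostrophe ' . $kept . ")\n";
}
```

With ENT_COMPAT the escaping functions only touch double quotes, and an apostrophe is neither non-ASCII nor a control character, so every variant except the str_replace hands the apostrophe on unchanged.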

The only one that worked was the uncommented str_replace of the single quote.

I traced the call from $this->setDescription($str) to public function setDescription($description) at ./components/com_kunena/src/View/Topic/HtmlView.php:510, and from there into the Joomla Document class, ./libraries/src/Document/Document.php

    public function setMetaData($name, $content, $attribute = 'name')            
    {                                                                            
        <<snip>>

        if ($name === 'generator') {
            $this->setGenerator($content);
        } elseif ($name === 'description') {
            echo 'In joomlas setMetaData ' . $content;    
            $this->setDescription($content);
        } else {
            $this->_metaTags[$attribute][$name] = $content;
        }

        return $this;
    }

In all that code, the passed parameter is "uncorrupted", i.e. it still contains the single quote.

As per the earlier screen grab of the rendered HTML, the single quote is being replaced by SOH and STX control characters around a backslash-escaped single quote. This must be happening during the rendering of the Joomla document to HTML.
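
In case it helps anyone checking for the same thing, this is a minimal sketch of how the control characters could be made visible in a captured string. The function name `revealControlChars` and the sample string are my own inventions for illustration; nothing here comes from Kunena or Joomla.

```php
<?php
/**
 * Replace ASCII control characters with a visible \xNN escape.
 * Tab, line feed and carriage return are left alone.
 */
function revealControlChars(string $text): string
{
    return preg_replace_callback(
        '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/',
        static function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $text
    ) ?? $text; // fall back to the input if the regex engine fails
}

// Made-up sample mimicking the damage described above:
// SOH (0x01), a backslash-escaped quote, then STX (0x02).
$broken = "it\x01\\'\x02s broken";
echo revealControlChars($broken), "\n"; // prints: it\x01\'\x02s broken
```

Running the rendered meta description through something like this should show whether the extra bytes really are 0x01 and 0x02, as opposed to a multi-byte Unicode character.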

I'm afraid tracing this further is beyond me (I last wrote any PHP about 15 years ago). So, in the meantime, I'll "patch" TopicItemDisplay.php to have the $str = str_replace("'", "", $small); hack as above.

Many thanks

Ruud68 commented 2 months ago

Ok, sorry my bad. Can't reproduce so that makes this difficult to fix. Just make sure that you inserted a 'real' single quote and not something copied from mobile / text document that looks like a quote but is some kind of character encoded thing.
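
One rough way to check this, as a sketch only: scan the text for characters that look like an apostrophe but are not the ASCII one (U+0027). The function name `findApostropheLookalikes` and the list of code points are assumptions chosen for illustration, and the code relies on the mbstring extension for `mb_ord`.

```php
<?php
/**
 * Return the code points (as "U+XXXX") of characters that resemble
 * an apostrophe but are not the ASCII apostrophe (U+0027).
 */
function findApostropheLookalikes(string $text): array
{
    // Left/right single quotation marks, reversed quote, modifier
    // apostrophe, grave accent, acute accent, prime.
    $pattern = '/[\x{2018}\x{2019}\x{201B}\x{02BC}\x{0060}\x{00B4}\x{2032}]/u';
    preg_match_all($pattern, $text, $matches);

    return array_map(
        static function (string $ch): string {
            return sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
        },
        $matches[0] ?? [] // empty if the input is not valid UTF-8
    );
}

var_dump(findApostropheLookalikes("It's fine"));         // empty array
var_dump(findApostropheLookalikes("It\u{2019}s curly")); // one entry: U+2019
```

An empty result for the stored message text would suggest that the quote really is the plain ASCII character.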

aidanwhiteley commented 2 months ago

Hi,

a final update / close of this issue.

Yes - I can confirm it was a standard single quote character and not some form of MS "curly quote".

The problem also occurred when the forum category description contained a single quote, as this content seems to end up in the og:description meta tag. That again calls the Joomla setMetaData and then fails with the same problem when the HTML for the document is rendered.

However, I've found that the problem goes away (i.e. doesn't happen) when I uplift my site from Joomla 4.4.x to 5.x, so I don't think it is worth keeping this issue open.

Thanks for all the help and suggestions though!