Nheko-Reborn / nheko

Desktop client for Matrix using Qt and C++20.
https://nheko-reborn.github.io/
GNU General Public License v3.0

URL parsing failing to identify `&gt` as closing `>` #1482

Open charles2910 opened 1 year ago

charles2910 commented 1 year ago

Describe the bug

Hi,

First of all, I started using nheko a couple of weeks ago and it's awesome. Thanks for developing it!

Last week I noticed a problem opening URLs in a room I joined. We use a bot to sync IRC, Matrix and Telegram. Some messages contain a URL linking to the original message in Matrix, but nheko treats the `&gt` as part of the URL. I'll post two screenshots showing the problem in nheko and the same message rendered in Element.

To Reproduce

I'm not exactly sure.

What happened?

No response

Expected behavior

`&gt` rendered as `>` and not treated as part of the URL.

Screenshots

Message in nheko:

image

Message in element:

image

Version

0.11.3

Operating system

Linux

Installation method

Some repository (AUR, homebrew, distribution repository, PPA, etc)

Qt version

No response

C++ compiler

No response

Desktop Environment

No response

Did you use profiles?

Relevant log output

No response

Backtrace

No response

LeoniePhiline commented 1 year ago

The same happens when URLs are enclosed in quotation marks. For example, `"http://foo.bar/baz"` renders as `"http://foo.bar/baz&quot;`, where everything except the first and last character is auto-linked.

(Using `<` and `>` to mark the start and end of the auto-link, Nheko renders the quote-enclosed URL above as `"<http://foo.bar/baz&quot>;`. See also the screenshot below.)

Nheko appears to encode `"` as `&quot;` before auto-linking. The auto-linker treats the `;` of `&quot;` as the token that marks the end of the auto-linked URL. The HTML entity is therefore split in two: `&quot` becomes part of the link (wrongly assumed to belong to the URL), and `;` follows after the link.
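To make the failure mode concrete, here is a minimal sketch of an "escape first, then auto-link" pipeline. It is hypothetical and not nheko's actual code: `htmlEscape` and `naiveLinkify` are illustrative names, and the assumed URL rule is "`http(s)://` followed by anything up to whitespace, `<` or `;`". Because the regex runs on already-escaped text, it happily swallows `&quot` or `&gt` and stops at the entity's `;`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: HTML-escape &, <, > and " (the characters that
// matter for this bug).
std::string htmlEscape(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Hypothetical naive auto-linker that runs on ALREADY-ESCAPED text.
// A URL is assumed to end at whitespace, '<' or ';', so the trailing
// entity name ("&quot", "&gt") is captured and its ';' is left outside.
std::string naiveLinkify(const std::string &escaped) {
    static const std::regex url("https?://[^ \t\r\n<;]+");
    // "$&" is the whole match in ECMAScript-style format strings.
    return std::regex_replace(escaped, url, "<a href=\"$&\">$&</a>");
}
```

Feeding it `"http://foo.bar/baz"` yields `&quot;<a href="http://foo.bar/baz&quot">http://foo.bar/baz&quot</a>;`, which is exactly the split described above; a URL wrapped in angle brackets, such as `<https://example.org/x>`, ends up with `&gt` inside the link in the same way.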

One way to fix this would be to fix the auto-linker (its regex?). However, I believe the main issue is that the auto-linker operates on HTML-entity-encoded content rather than on the decoded text. (The auto-linker sees `&quot;` without understanding its meaning, when it should be seeing `"` in the first place.)
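A minimal sketch of the "link on decoded text" approach suggested above, again hypothetical and not nheko's actual code (`linkifyThenEscape` is an illustrative name). URLs are found in the plain, unescaped text, and each piece (surrounding text and URL) is HTML-escaped separately afterwards. The assumed URL rule stops at whitespace, `<`, `>` and `"`. RFC 3986 does not allow these characters unencoded in a URI and suggests angle brackets or double quotes as delimiters in running text.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: HTML-escape &, <, > and ".
std::string htmlEscape(const std::string &in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
        }
    }
    return out;
}

// Hypothetical fix: detect URLs in the UNESCAPED text, then escape the
// non-URL text and the URL text separately while building the output.
std::string linkifyThenEscape(const std::string &plain) {
    // A URL ends at whitespace or at one of the delimiters < > ".
    static const std::regex url("https?://[^ \t\r\n<>\"]+");
    std::string out;
    std::size_t last = 0; // end of the previously emitted segment
    for (auto it = std::sregex_iterator(plain.begin(), plain.end(), url);
         it != std::sregex_iterator(); ++it) {
        const std::smatch &m = *it;
        const auto start = static_cast<std::size_t>(m.position(0));
        out += htmlEscape(plain.substr(last, start - last)); // text before URL
        const std::string link = htmlEscape(m.str(0));       // e.g. & -> &amp;
        out += "<a href=\"" + link + "\">" + link + "</a>";
        last = start + static_cast<std::size_t>(m.length(0));
    }
    out += htmlEscape(plain.substr(last)); // trailing text after last URL
    return out;
}
```

With this ordering, `"http://foo.bar/baz"` becomes `&quot;<a href="http://foo.bar/baz">http://foo.bar/baz</a>&quot;`, and `<https://example.org/x>` keeps its closing `&gt;` outside the link. A `&` inside a query string is still escaped to `&amp;` in the `href`, which is what valid HTML expects. In a Qt codebase the same shape would more likely use `QRegularExpression` and `QString::toHtmlEscaped()`; the sketch uses only the standard library so it compiles on its own.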

Screenshot:

image