Builditluc / wiki-tui

A simple and easy to use Wikipedia Text User Interface
https://wiki-tui.net/
MIT License
395 stars 14 forks source link

[BUG] Some templates inside articles are cleared out #210

Closed non-descriptive closed 11 months ago

non-descriptive commented 11 months ago

General Information Version: 0.8.1 Installation Method: cargo

Describe the bug Looks like some articles has issues with oversanitization of templates. At least translation template doesn't displayed properly.

To Reproduce Default language: English

  1. Search Tolyatti
  2. Look in the search result side info as article summary rendered. Windows Terminal: image Command Line: image Don't mind utf content.
  3. Open article itself - the same translations aren't rendered properly WT: image CMD: image

Translation is absent though it was rendered in search results, Population count also is absent, only semicolons are left.

Expected behavior Translations at least should have the same view as in search bar. Other templates should have at least some messy text content if avaliable. Ideally - has reference links as well.

Builditluc commented 11 months ago

Hey @non-descriptive and thank you for your bug report!

The problem your seeing here is because we the text of the search preview and the article is a different format. When searching for a query, the preview (or so called "snippet") we get is plaintext and looks like this

{
  "ns": 0,
  "title": "Tolyatti",
  "pageid": 528872,
  "size": 31441,
  "wordcount": 3167,
  "snippet": "<span class=\"searchmatch\">Tolyatti</span> (Russian: \u0422\u043e\u043b\u044c\u044f\u0301\u0442\u0442\u0438, IPA:\u00a0[t\u0250l\u02b2\u02c8j\u00e6t\u02b2(\u02d0)\u026a]), also known as Togliatti, formerly known as Stavropol (1737\u20131964), is a city in Samara Oblast, Russia",
  "timestamp": "2023-08-04T10:20:27Z"
},

https://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=tolyatti

As you can see, the translation is in plaintext (well, encoded in unicode but we can handle that)

When we now look at the corresponding portion of the article, we see that the Russian translation is formatted like this (in HTML)

image

As you can see the translation is contained inside a <span> HTML element which we don't support yet (and because of #113 they are simply excluded from the article).This kind of issue (elements or custom formatting we don't specifically support yet) is currently the most frustrating one we have in wiki-tui. I can and will fix those as I stumble upon them (or when people like you report those bugs), but for the future we would need a more robust solution. I'm already looking into implementing the Mediawiki-specific DOM used by parsoid

non-descriptive commented 11 months ago

hmm, I thought preview is made of article. Will know, thanks. I thought you did simple sanitation like js do with InnerText and treated links and other formatting as special case.

Builditluc commented 11 months ago

I've uploaded a patch to fix-span-parsing. Would you mind trying it out to see if it fixes the issue?

non-descriptive commented 11 months ago

If fixes content, but introduce some scrolling bug - it starts to replicate some characters on blank lines. image Just scroll the same article about 10 lines up and down to repro. Doesn't appears on main.

Also noticed there's bitwise or on boolean in this line.

Builditluc commented 11 months ago

The scrolling bug is weird because it doesn't occur on my machine (also tested it with different terminals). Could you maybe try it with a different terminal (maybe it has something to do with the unicode?)?

non-descriptive commented 11 months ago

Tried on windows cmd, alacritty, Windows Terminal ,WSL and git-bash. git-bash generally doesn't work by itself, all others has this issue. Don't have any *nix machine close by to check whether is't some windows specific bug or issues with unicode. Maybe worth checking similar issues on tui backend.

Builditluc commented 11 months ago

That’s weird…. In a few hours (or tomorrow morning) I can try it on my windows machine

On 8. Aug 2023, at 6:06 PM, Dmitry Kozlovtsev @.**> wrote: Tried on windows cmd, alacritty, Windows Terminal ,WSL and git-bash. git-bash generally doesn't work by itself, all others has this issue. Don't have any nix machine close by to check whether is't some windows specific bug or issues with unicode.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were assigned.Message ID: @.***>

Builditluc commented 11 months ago

Just tried it on windows and still, I cannot reproduce the bug

non-descriptive commented 11 months ago

Not sure what's up with my setup. It's consistently reproduced. Probably something related with wide characters or default encoding for page but not sure where to look. Tried to bump some dependencies but still has issues. Probably related to cursive bugs but didn't really found anything related enough. Will return with details if will find any.

Builditluc commented 11 months ago

related to cursive bugs

We already have some artifact bug related to cursive (#74), so this really could be it. Those kind of errors and the missing customization makes me think if we need to switch the UI library (maybe to ratatui). But for now, thank you for the info!

non-descriptive commented 11 months ago

UI library

maybe good idea since cursive is async and ratatui (actually the only living alternate option) is immidiate mode. But it's gonna be quite a rework.

Also reproduced the same rendering issues on main just switching language to Russian. First word of article include stressed symbol and rendered Толья́тт (<< this literally copied from terminal). So methinks it's separate bug not related to templates and this one can be safely merged. I'll open separate bug and probably reference it on cursive side.

Builditluc commented 11 months ago

Same issue on main

Oh wow that's comforting... could you maybe check if the issue occurs in v0.8.1? If not, it could be an issue with any of the bug fixes since

non-descriptive commented 11 months ago

Looks like 0.7.0 already had it. Reproduced on all versions since.

Builditluc commented 11 months ago

Well, that's an issue...

I skimmed through the changes done in this release (and the ones before it) and the only thing imaginable would be the bump to cursive v0.20 in v0.6.1.

I think I'm going to start working on the transition to ratatui.