Fix multiline Label widget to incorrectly display message with hyperlink

leonard-IMBERT commented 1 year ago

This is a fix for #107.

All the test are passing but mypy is angry about file I did not modify, I don't know if this is an issue.

Also pylint wanted to change some yaml (maybe the actual configuration is not enough ?) and black is complaining about files in the docs folder (that do not seems to be python)

I've just one small interrogation, In case of multiline string, it's complicated to enact the hyperlink specification of TIM in the case you're omitting the last [/~], should \n be considered as such ? Or do you allow multiline hyperlink ? In this MR, the behaviour is:

If there is no ending [/~] one is appended to the string.

bczsalba commented 1 year ago

Thank you for this!

I'm currently not quite able to work (still abroad for the holidays), but generally this looks good. Hyperlinks are my least favorite thing to deal with within the terminal (mostly due to how different they are from other styles), and I'm afraid it was a bit neglected in the move from TIM-v2 to TIM-v3.

In regards to the linting errors, I'm pretty sure it's not your fault, so don't mind it too much, lol. They are a whiny bunch, and pleasing all of them is a whole art in and of itself!

In the case of tokens_to_markup, I prefer keeping the output markup as close as possible to the original stream of tokens, so I'd say that part of the PR can be omitted. Other than that, I'll look further into it in a couple of days!

leonard-IMBERT commented 1 year ago

Hi, thanks for the response :)

I understand for tokens_to_markup, is the exact behaviour described somewhere in the documentation ? I'll try to stick to it. Also after looking back at the PR, I think there should be a more elegant way of doing this. I'll leave comments in the File changes section

bczsalba commented 1 year ago

is the exact behaviour described somewhere in the documentation ?

The best description of it is here, but admittedly it's not very thorough. The general idea for the function is to take a stream of tokens and spit out markup that matches the tokens 1 to 1.

I still haven't had the time to come back to this project, since I have a full game project to hand in on the 25th, which I'm doing 95% of the programming for. I expect to be done earlier than that, but by the 26th I should have time for sure. Sorry about the delays!

leonard-IMBERT commented 1 year ago

I still haven't had the time to come back to this project, since I have a full game project to hand in on the 25th, which I'm doing 95% of the programming for. I expect to be done earlier than that, but by the 26th I should have time for sure. Sorry about the delays!

Don't worry, no problems :) I finally found some time to go back and implement the hyperlink clearer token. The PR is now much cleaner (less modification). My head hurted a bit when the test where randomly passing due to randomness in some of them (You don't have so much token, I'm sure the combinatory should not take to much time to test).

Aaaand It seems that I broke something... Don't merge now please

leonard-IMBERT commented 1 year ago

The tests are passing but In my project some artefact appears:

bczsalba commented 1 year ago

Hey again, what's the current status here? I don't completely understand the artifacts or what causes them.

leonard-IMBERT commented 1 year ago

Hi ! Sorry, I haven't been able to work back on this issue yet. I expect, I'll be able to work on it by the end of this week :/

I'm not sure either exactly where they could come from, but I expect the it has to come from the reverse clearer add for the hyperlink. I need to do some check but the clearer itself containing an ESC (\x1b) could cause the issue

bczsalba commented 1 year ago

That's alright, it's taken me like a month to properly look at the PR, lol. Let me know if you need anything!

leonard-IMBERT commented 1 year ago

Finally a fix I'm satisfied with ^^' !

I wanted to add the hyperlink clearer to the list of clearer (it would have been in my opinion the best fix) but because you don't differentiate between CSI and OSC there is a conflict with the invisible token (/~ is \x1b]8 and invisible is \x1b[8. This kind things would need each "action token" (change style, etc...) to hold their kind (if they are OSC or CSI). Would probably also streamline the ansi parsing probably...

Overall I couldn't do another way than making hyperlink a special case (without implementing the change I described up there).

bczsalba commented 1 year ago

I'll look it over tomorrow, thank you!

About OSC vs CSI differentiation; I'm pretty sure the only OSC codes we ever read are hyperlink ones, so I didn't feel like the internal refactor was worth it at the time (We also write a couple of OSCs, but those are usually read by simple regex). It is a thing I've never been satisfied with, and I'm pretty sure it's about the only thing I am currently not satisfied with in the markup section.

Then again, it might be a tiny change that makes everything a lot better, lol.

leonard-IMBERT commented 1 year ago

Hi @bczsalba , sorry to bother but did you find time to look at this PR ?

bczsalba commented 1 year ago

Sorry, life got ahead of me!

It looks to be alright and I like the vibe of the changes a lot. I did since realize that I cannot reproduce the original issue, but if in your case it's fixed I'll take that as a win :)

There seems to be an unrelated bug in break_line that messes up links when they are broken up, but that's been there before your changes.

Thanks for the great communication!

bczsalba commented 1 year ago

@leonard-IMBERT I think this PR broke something that leads to PlainTokens containing hyperlink-closing sequences to be generated, which breaks the ability to export SVG files. Could you look into it?

I'll mark the bits I think might be causing this in the code.

leonard-IMBERT commented 1 year ago

Do you have a reproducible example that I can toy with? It would help :)

bczsalba commented 1 year ago

I've been trying to, but it seems elusive! The best I have so far is running mkdocs build on the commit 91c173cb2049b772a07c13918277b4069766dbf7, which should complain something about an incorrect SVG. If you drill down to it you'll find that the "incorrect" part is the existence of escaped symbols within the SVG, which happens within the Located in ... part of the inspect(Container) call.

I'll try to find something more in a second.

bczsalba commented 1 year ago

Alright, I got it down to:

from pytermgui.markup import tim

for plain in tim.group_styles(
    "\x1b]8;;path.py\x1b\\inner\x1b]8;;\x1b\\\x1b]8;;\x1b\\outer"
):
    assert "\x1b]8;;\x1b\\" not in plain.plain, repr(plain)

Running this (on the aforementioned commit) shouldn't raise an assertion error, since hyperlink closers (but generally escape sequences) shouldn't show up in a "plain" tag. Not completely sure whether this is a thing this patch introduced, but the issue showed up afterwards and I don't remember any major changes in any of the systems other than this.

Let me know if you have any questions!

leonard-IMBERT commented 1 year ago

Sorry, took a bit of time to found some time x)

About the example

In the example you give me, you close two times the hyperlink. Smt like that [~path.py]inner[/~][/~]outer.

I don't know how the parser should behave... Throw an error ? Ignore the second closing ?

The problems comes from ptyermgui/regex.py

RE_LINK = re.compile(r"(?:\x1b\]8;;([^\\]*)\x1b\\([^\\]*?)\x1b\]8;;\x1b\\)")
RE_ANSI_NEW = re.compile(rf"(\x1b\[(.*?)[mH])|{RE_LINK.pattern}|(\x1b_G(.*?)\x1b\\)")

So the RE_LINK checks for "valid" links (opening and closing of the link). In that case, the first link match but not the second closing is left alone : In this implementation \x1b\]8;;\x1b\\ is not a valid token by itself -> it is not uderstood as style token either because closing link is a OSC not a CSI. The parser does not know what to do with it and so just yield it as a plain token (as I understand).

From my point of view it is a "reasonable" behavior in the sense that it clearly indicate an error in the format and that you should not silently erase tokens (but I admit that to an un-experimented user the cause is hard to pinpoint)

About SVG export randomly breaking

I didn't looked into it yet but, considering your example, I think the problem is that hyperlink closing token are yield twice in certain cases. If you have concrete example again, it will help.

leonard-IMBERT commented 1 year ago

After further thinking, I decided to create pull request/discussion to move from regex to state machine for parsing ANSI #130 . It will help correct this bug. Lets continue the discussion over there

bczsalba / pytermgui

Fix multiline Label widget to incorrectly display message with hyperlink #108

About the example

About SVG export randomly breaking