[Bug] Inline code autoformats code between underscores

arturolinares commented 2 years ago

Describe the bug

When you type or paste a word containing several underscores after an inline code backtick, the text is transformed into italics, removing the underscores. Because there is no other way to apply the inline code format, it cannot be applied inside a document.

To Reproduce Steps to reproduce the behavior:

Type a word that contains several underscores, for example: `this_is_a_test`
After closing the last back tick, continue with a space
The first pair of underscores are removed, and the text formatted.

Expected behavior The inline text intact, without italics and the inline code style applied.

Screenshots Peek 2021-11-08 17-59

System Information (please complete the following information):

Device: laptop
OS: Linux
Browser: Vivaldi (Chrome)
Browser Version: any

Additional context NA

johnozbay commented 2 years ago

Hi there! 👋🏻

Thanks a million for filing this, and for the screen-grab! This was incredibly helpful 🙏🏻

I've fixed this in the alpha version now and linked this issue to our internal issue/release tracker. Once the update with this fix is out (should be in a few days), this issue will be closed automatically, and you'll get a notification!

Thank you so much for your help and patience with this! Deeply appreciate this detailed bug catch!

arturolinares commented 2 years ago

Thanks a lot @johnozbay. It definitely is better, but I'm still seeing the issue :)

https://user-images.githubusercontent.com/203078/142250510-81a06c77-e0d8-489c-839a-d57ab2faf114.mp4

Congratulations on an awesome application.

arturolinares commented 2 years ago

Hi @johnozbay! I would like to reopen this issue, since it is still happening. Thanks!

johnozbay commented 2 years ago

@arturolinares Thanks for this, and for your kind words 🙏🏻 Sorry we're failing to catch these. Rich text / markdown processing is incredibly difficult.

We're looking into what's causing this specific bug and will hopefully have a fix for this soon. I think we may need to re-write some parts of our markdown interpreter, which feels like opening pandoras box. Will keep this thread posted.

Just to make sure I'm understanding this correctly – the initial problem is now solved, but the second screen-capture is not, correct?

arturolinares commented 2 years ago

That's correct.

I think that regular expressions are not powerful enough for this and a real parser is needed to keep the state of the formatted text. For example, typing "_Hello*" will make the word italics, even though the character markers for italics don't match (here is a regex visualization for italics as used in the project: https://regexr.com/6ak0e).

The same happens in the videos attached to this ticket: The text inside code fences should not be interpreted as markdown, but the regular expressions to exclude them can grow very complicated really fast.

BTW, I found this project which managed to solve all the issues mentioned here (and has MIT license :D):

https://cloverhearts.github.io/quilljs-markdown/

johnozbay commented 2 years ago

Unfortunately, we're going to have to use Regex, as there's no other way to do this on mobile devices.

In short,

Unlike hardware keyboards we use on our laptops, software keyboards on mobile devices, don't send keystrokes / keydown / keyup events. A couple examples : – GBoard / Swipe-type etc type stuff inserts entire words into contenteditable files, – they don't send newline characters until you insert two new lines, – send it differently depending on language / region of the keyboard etc etc.

You can read a bit more about the nightmare this causes here :
https://github.com/cryptee/web-client/issues/129#issuecomment-877750792

So basically, if we can't watch for specific keydown / keyup events etc, the only other way we can do any rich-text-editing using markdown on the internet is by using regex.

And regex has its own unique set of nightmares as we're experiencing in this bug. But also some interoperability challenges.

For example, Safari doesn't support the lookbehind regex token ((?<=\s)) which is what the quilljs-markdown package you mentioned uses. So if we were to use that package, Cryptee would stop working on all iOS devices + all desktop Webkit browsers like Safari, since Safari would fail to parse the javascript and crash on startup. See here for others pointing this out about the library as well : https://github.com/cloverhearts/quilljs-markdown/issues

try loading the link you sent in Safari on a Mac or iOS and you'll see it's broken.

And with rich-text-editors markdown compatibility is actually not as simple as parsing / regex etc. That's just the first half of the problem.

The second half of the problem is replacing / removing the markdown triggers. then styling the text in a way that doesn't break editor's undo/redo history. (when you press ctrl + z or cmd + z etc)

So if you type ### Header we have to detect it's a markdown command, then remove ### then remove the space character (which may be different for different operating system / region / language / keyboard combinations), then style Header as H3 in a way that doesn't break undo. So if you press ctrl + z things need to go back to the way they were correctly. Otherwise the order of delta update to our editor could break irreversibly, and damage documents for good.

Similarly, the library you sent is failing to do this and someone's already filed an issue about this as well.

I'd say that all things considered, our markdown parser works decently – not perfectly – but decently. But our markdown replacer & styler part of our editor has bugs as we see here, and it's replacing stuff even when it shouldn't.

Which is why it's taking us a while to debug / fix these types of bugs. There's no easy way to test these things out, as there's millions and millions of combinations, permutations and edge cases, so when we fix something, we risk breaking something else.

johnozbay commented 2 years ago

Hi there!

I've made some changes to the way we parse markdown, and how we watch for exceptions in formats. And according to my newly built tests, this should address all issues in this thread without breaking anything else.

So if you type *test_ it shouldn't falsely become italic anymore. only _test_ or test`. Same goes for bold and bolditalic as well
If something is in a code tag like test_test_test this shouldn't convert to italic / bold / bolditalic etc ever either
We detected the same was happening with headers, strikeouts, blockquotes etc inside the code tag as well, and now made sure that can't happen also.

Here's a screenshot full of a bunch of edge-case tests to show what I mean

Screenshot 2021-12-04 at 16 03 24

Please do give things a try, but I think this time ... I managed to fix it all! 😅

Thanks a million for reporting and helping fix these pesky bugs! I owe you one Arturo!

arturolinares commented 2 years ago

Thank you! It works very well now.

cryptee / web-client

[Bug] Inline code autoformats code between underscores #137