Open 4dwaith opened 1 year ago
EDIT: They added more styles since I wrote this comment
Hi @4dwaith,
Interesting idea. Let's start by looking at which styles WhatsApp supports:
_italic_
*bold*
~strikethrough~
```monospace```
In html that would be:
<i>italic</i>
<b>bold</b>
<s>strikethrough</s>
<pre>monospace</pre>
<!-- or -->
<code>monospace</code>
We would need to either create some new regex patterns to detect the special characters or use a lightweight library to do it for us.
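A minimal sketch of the regex route, only to make the idea concrete. `STYLE_RULES` and `naiveRichText` are hypothetical names, not part of the library, and the sketch assumes the input has already been HTML-escaped. It is deliberately naive: each style is replaced independently, with no awareness of the other styles.

```javascript
// Hypothetical helper, not part of the library: one regex per WhatsApp marker.
const STYLE_RULES = [
  // Code goes first so its triple backticks are consumed before the other markers.
  { pattern: /`{3}([\s\S]+?)`{3}/g, tag: 'code' },
  { pattern: /\*([^*\n]+)\*/g, tag: 'b' },
  { pattern: /_([^_\n]+)_/g, tag: 'i' },
  { pattern: /~([^~\n]+)~/g, tag: 's' },
];

// Applies every rule in order to the whole string.
function naiveRichText(text) {
  return STYLE_RULES.reduce(
    (result, { pattern, tag }) => result.replace(pattern, `<${tag}>$1</${tag}>`),
    text,
  );
}

console.log(naiveRichText('*bold* and _italic_'));
// prints "<b>bold</b> and <i>italic</i>"
```

Because every rule runs over the whole string, this version also restyles text that sits inside a code span.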
Several tests would be needed to catch the edge cases, for example if you have:
```var my_nice_variable = 'my string';```
It should not format the _nice_ as italic because it's already inside a code block.
Or a URL with underscores may get formatted and stop working.
There are many things that can go wrong.
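For the code block case specifically, one possible approach is to split the text on code spans first and only format what lies outside them. This is a sketch under that assumption; `formatOutsideCode` is a hypothetical name, and the URL case is not addressed here.

```javascript
// Hypothetical helper: applies bold/italic/strikethrough only outside code spans.
function formatOutsideCode(text) {
  // The capturing group makes split() keep the code contents at odd indices.
  const parts = text.split(/`{3}([\s\S]+?)`{3}/);
  return parts
    .map((part, index) =>
      index % 2 === 1
        ? `<code>${part}</code>` // code content is passed through untouched
        : part
            .replace(/\*([^*\n]+)\*/g, '<b>$1</b>')
            .replace(/_([^_\n]+)_/g, '<i>$1</i>')
            .replace(/~([^~\n]+)~/g, '<s>$1</s>'),
    )
    .join('');
}
```

With this, the `my_nice_variable` example above keeps its underscores, because the text between the code markers never reaches the italic rule.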
With this in mind, I honestly think this could overcomplicate things a bit too much for my liking. I'd like to keep this library dependency-free and as simple as possible.
> can't think of a way to get this working without breaking the API contract.
That would not be a problem as long as the feature is implemented behind an optional configuration. Something like this:
whatsapp.parseString(text, { parseRichText: true });
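To illustrate why an opt-in flag keeps the existing contract intact, here is a self-contained sketch. `parseRichText` is only the option name suggested above; `finalizeMessages` and `toRichText` are hypothetical stand-ins and not part of the library, and the `{ author, message }` shape is assumed for illustration.

```javascript
// Hypothetical defaults: the feature is off unless the caller asks for it.
const DEFAULT_OPTIONS = { parseRichText: false };

// Stand-in for the formatting step; a real one would handle all four styles.
function toRichText(message) {
  return message.replace(/\*([^*\n]+)\*/g, '<b>$1</b>');
}

// Stand-in for the last step of a parser. `messages` is an array of
// { author, message } objects that have already been extracted.
function finalizeMessages(messages, options = {}) {
  const { parseRichText } = { ...DEFAULT_OPTIONS, ...options };
  if (!parseRichText) return messages; // default: output is unchanged
  return messages.map(entry => ({ ...entry, message: toRichText(entry.message) }));
}
```

Callers that never pass the option get exactly the output they get today, so nothing breaks for existing users.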
@Pustur These sequences look a lot like markdown. Maybe you can use an existing markdown formatter library (or perhaps the consuming code should use a markdown rendering library so you don't have to do anything at all.)
@speshak I'm leaning more towards the second option, this should be done externally to the library.
Also, while the format looks like markdown, it's not exactly a common flavour of it as far as I can tell. In the following example, both the italic and bold are rendered as italic by default:
It seems possible to customize how that library works but I'm not currently interested in doing so.
@Pustur Apologies, I have no idea why I didn't notice your first response. I should've responded months ago.
Not sure about the regex pattern. As specified in your next example, whether or not to parse the italics depends on whether we have previously encountered a code marker. It won't be a context-free state machine, so I don't think we can use regular languages.
That said, I don't think your two examples would have an issue - underscores only indicate italics if there are spaces before the start mark and after the end mark, and no spaces after the start mark and before the end mark. URLs for sure wouldn't follow that rule, though code might.
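The spacing rule described here can be written as a single pattern with lookarounds. This is a sketch of that rule as stated, not of WhatsApp's actual implementation; `italicize` is a hypothetical name, and treating the start and end of the message like a space is an added assumption.

```javascript
// Hypothetical helper encoding the rule above:
//   (?<!\S)  start mark is preceded by whitespace (or the start of the text)
//   \S       no space right after the start mark
//   \S_      no space right before the end mark
//   (?!\S)   end mark is followed by whitespace (or the end of the text)
const ITALIC = /(?<!\S)_(\S(?:[^\n]*?\S)?)_(?!\S)/g;

function italicize(text) {
  return text.replace(ITALIC, '<i>$1</i>');
}

console.log(italicize('this is _nice_ indeed'));
// prints "this is <i>nice</i> indeed"
console.log(italicize('https://example.com/my_file_name.html'));
// prints "https://example.com/my_file_name.html"
```

Underscores inside identifiers such as `my_nice_variable` are also left alone, since they have non-space characters on both sides.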
I've played around a bit, and the rules actually seem straightforward and intuitive. Here are my conclusions:
Code markers interrupt and unstyle everything else.
``` these *are* _just_ \~five\~ words ```
becomes
these *are* _just_ ~five~ words
*these ```are just five``` words*
becomes
these are just five words
The other three styles are compatible with each other.
*these _are \~just\~ five_ words* becomes these are ~just~ five words
When two styles conflict, the one that appeared first wins
*these _are just * five words_ becomes these _are just five words_
Thank you very much for that bit about strikethrough! All this time I thought we would be forced to use CSS attributes and span tags. Can't believe I hadn't heard of that tag, this looks much more doable now.
The marked library seems relatively easy to extend. I got bold to work, but the new problem is that newlines are not respected by default, since markdown needs two spaces at the end of a line to insert a <br>.
See the Codesandbox demo
Maybe you can make it work properly in the context of WhatsApp messages.
I think the task of the parser is to identify the individual elements of the backup and to convert them into a structure for the application, and not to identify style or meta information such as a link within a text. Attachments, system messages, captions are not easy to recognize due to the poorly documented data format (and this with different export formats depending on the operating system and language).
WhatsApp messages can have styling within the text, such as bold, italic, and strikethrough. The given format only allows for plain text messages.
I'm new to open source and would be happy to help build this feature, but can't think of a way to get this working without breaking the API contract.