Expensify / App

Welcome to New Expensify: a complete re-imagination of financial collaboration, centered around chat. Help us build the next generation of Expensify by sharing feedback and contributing to the code.
https://new.expensify.com
MIT License
3.36k stars 2.78k forks source link

[$500] Do not allow emojis to be formatted (ie italics) #14676

Open kavimuru opened 1 year ago

kavimuru commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

  1. Go to android app
  2. Go to any chat
  3. Try to make emoji in italic form:-Type in : insertemoji For example:⁦:100:, _⁦:purpleheart: , _⁦:partyingface: ,
  4. Notice that the top right part of the emoji cuts off. Few Example: the :100:, :purple_heart:, :partying_face: emoji has its top right section cut , (It happens on android only latest)

Expected Result:

Users should not be able to format emojis, lets keep them with normal style (no bold, italics, strikethrough etc)

Actual Result:

Users can make emoji italic which leads to the emojis to cut off a few parts of the emoji (for some emojis)

Workaround:

unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

Version Number: 1.2.62-1 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos:

Expensify/Expensify Issue URL: Issue reported by: @priya-zha Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1675094271396099

View all open jobs on GitHub

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~017d08e0c09284d31f
  • Upwork Job ID: 1620563000799895552
  • Last Price Increase: 2024-08-08
  • Automatic offers:
    • eh2077 | Reviewer | 0
    • wildan-m | Contributor | 0
Issue OwnerCurrent Issue Owner: @mallenexpensify / @mallenexpensify
anmurali commented 1 year ago

Confirmed that this happens on Android native app.

Confirmed it does not happen on Mac/Chrome. But related, when I try the same reproduction steps on iOS native, iOS Chrome and Mac Safari - I cannot italicize emojis at all. Is that expected?

melvin-bot[bot] commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~017d08e0c09284d31f

melvin-bot[bot] commented 1 year ago

Current assignee @anmurali is eligible for the External assigner, not assigning anyone new.

melvin-bot[bot] commented 1 year ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @aimane-chnaif (External)

melvin-bot[bot] commented 1 year ago

Triggered auto assignment to @flodnv (External), see https://stackoverflow.com/c/expensify/questions/7972 for more details.

alexxxwork commented 1 year ago

Proposal

Quick fix:

diff --git a/src/styles/styles.js b/src/styles/styles.js
index b826ed336f..84286591e9 100644
--- a/src/styles/styles.js
+++ b/src/styles/styles.js
@@ -63,6 +63,7 @@ const webViewStyles = {
         em: {
             fontFamily: fontFamily.EXP_NEUE,
             fontStyle: 'italic',
+            width: variables.fontSizeNormalHeight,
         },

         del: {
flodnv commented 1 year ago

I don't think we should be able to italicize emojis, asked here: https://expensify.slack.com/archives/C01GTK53T8Q/p1675266083150699

JmillsExpensify commented 1 year ago

Agree with @flodnv.

tienifr commented 1 year ago

Proposal

The Emoji is italicized because it is contained in the <em></em> tag (same as when we make a text italic). We can observe the behavior on both Chrome web and Android.

In order to remove the ability to italicize emojis, I'll wrap any emoji that are inside the <em></em> with a expensify-emoji custom HTML tag, and give it a fontStyle: 'normal' style. This is the same way that Github does it.

(The emoji regex below is taken from the CONST.REGEX.EMOJIS in Expensify app)

In https://github.com/Expensify/expensify-common

diff --git a/lib/ExpensiMark.js b/lib/ExpensiMark.js
index 7f94ffe..057e87c 100644
--- a/lib/ExpensiMark.js
+++ b/lib/ExpensiMark.js
@@ -134,6 +134,11 @@ export default class ExpensiMark {
                 regex: /(?!_blank")\b_((?!\s).*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                 replacement: '<em>$1</em>',
             },
+            {
+                name: 'emoji',
+                regex: /(<em>.*?)([\p{Extended_Pictographic}\u200d\u{1f1e6}-\u{1f1ff}\u{1f3fb}-\u{1f3ff}\u{e0020}-\u{e007f}\u20E3\uFE0F]|[#*0-9]\uFE0F?\u20E3)(.*?<\/em>)/gu,
+                replacement: '$1<expensify-emoji>$2</expensify-emoji>$3',
+            },
             {
                 // Use \B in this case because \b doesn't match * or ~.
                 // \B will match everything that \b doesn't, so it works

In https://github.com/Expensify/App

diff --git a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
index 456cd1ecf3..0bf75ea9d2 100755
--- a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
+++ b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
@@ -39,6 +39,10 @@ const customHTMLElementModels = {
         tagName: 'email-comment',
         mixedUAStyles: {whiteSpace: 'normal'},
     }),
+    'expensify-emoji': defaultHTMLElementModels.span.extend({
+        tagName: 'expensify-emoji',
+        mixedUAStyles: {fontStyle: 'normal'},
+    }),
 };

 const defaultViewProps = {style: [styles.alignItemsStart, styles.userSelectText]};

There're small possible variations based on our preference:

There're some further minor fixes required:

I'll add those if we want to go with this direction.

Pls note we have to fix the same in the BE as well, in order to test the above fix we need to enable offline mode.

tienifr commented 1 year ago

working well after the fix:

https://user-images.githubusercontent.com/113963320/216310065-25e96b0c-aa25-4f56-bd73-79cd9d89c878.mov

s77rt commented 1 year ago

@tienifr Your approach looks solid however the regex is not correct as it will only match one emoji, try sending hi 😁 there 😁, it will wrap the first emoji but not the second one. Also it will only work with em tags and I think it would be better to wrap the emojis all the time (this will also cover the bold case).

Here are my suggestions:

  1. Replace the regex part with:
    
            {
                name: 'emoji',
                regex: /[\p{Extended_Pictographic}\u200d\u{1f1e6}-\u{1f1ff}\u{1f3fb}-\u{1f3ff}\u{e0020}-\u{e007f}\u20E3\uFE0F]|[#*0-9]\uFE0F?\u20E3/gu,
                replacement: match => `<expensify-emoji>${match}</expensify-emoji>`,
            },

2. Move the regex `emoji` to be before `italic` and after `autolink` (Not necessary, just a personal opinion as I like to preserve the order of italic, bold, strikethrough regex rules)
3. In `mixedUAStyles: {fontStyle: 'normal'}` add `fontWeight: 'normal'`
tienifr commented 1 year ago

Thanks @s77rt for your suggestion!

Updated proposal

Adding to the original proposal, Here's the my updated diff for the regex so that it matches all rather than just one.

diff --git a/lib/ExpensiMark.js b/lib/ExpensiMark.js
index c535048..62b183f 100644
--- a/lib/ExpensiMark.js
+++ b/lib/ExpensiMark.js
@@ -242,6 +242,11 @@ export default class ExpensiMark {
                 regex: /<(em|i)(?:"[^"]*"|'[^']*'|[^'">])*>(.*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
                 replacement: '_$2_',
             },
+            {
+                name: 'emoji',
+                regex: /([\p{Extended_Pictographic}\u200d\u{1f1e6}-\u{1f1ff}\u{1f3fb}-\u{1f3ff}\u{e0020}-\u{e007f}\u20E3\uFE0F]|[#*0-9]\uFE0F?\u20E3)/gu,
+                replacement: '<expensify-emoji>$1</expensify-emoji>',
+            },
             {
                 name: 'bold',
                 regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
tienifr commented 1 year ago

@s77rt Regarding the fontWeight: 'normal', I don't feel strongly about since it doesn't seem that the bold fontWeight affects how the emoji looks.

s77rt commented 1 year ago

@tienifr

tienifr commented 1 year ago

@s77rt I tried to zoom the emojis in but can't recognize any difference between an emoji that has fontWeight: 'bold' and a normal one. But we surely can add that if we want to be extra safe in that bold case (Github also does it).

Thanks for the input!

diff --git a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
index 456cd1ecf3..0bf75ea9d2 100755
--- a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
+++ b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
@@ -39,6 +39,10 @@ const customHTMLElementModels = {
         tagName: 'email-comment',
         mixedUAStyles: {whiteSpace: 'normal'},
     }),
+    'expensify-emoji': defaultHTMLElementModels.span.extend({
+        tagName: 'expensify-emoji',
+        mixedUAStyles: {fontStyle: 'normal', fontWeight: 'normal'},
+    }),
 };

 const defaultViewProps = {style: [styles.alignItemsStart, styles.userSelectText]};
s77rt commented 1 year ago

@tienifr While seaching I found this image.

yLrBRQd

The middle emoji is in bold but it's hard to notice also it may depend on the font. Adding fontWeight: 'normal' is more a safety step.

MelvinBot commented 1 year ago

@flodnv, @anmurali, @aimane-chnaif Eep! 4 days overdue now. Issues have feelings too...

MelvinBot commented 1 year ago

@flodnv, @anmurali, @aimane-chnaif Huh... This is 4 days overdue. Who can take care of this?

aimane-chnaif commented 1 year ago

reviewing proposals shortly

bernhardoj commented 1 year ago

Proposal

I was trying to find a solution for this and stumbled upon this issue. I would like to propose an alternative solution that would be useful later.

I agree with the above proposal to enclose all emoji with a custom tag to apply a custom style later on.

But the first thing to do is to change the emoji regex.

I previously worked on this issue by providing a new regex and it works well for that case. However, when we use that regex to match composite emojis, for example 👨‍👨‍👦, it will produce this array of string: ['👨', '‍', '👨', '‍', '👦']

The empty string is zero-width joiner. Again, this works well for the case I mentioned from the previous paragraph (we only care what the string contains). But for this case, we need to replace the emoji and enclose it with a tag. If we use the current emoji regex, it will enclose each part of the emoji with the tag, instead of the composited emoji.

Expected: <emoji>👨‍👨‍👦</emoji>

Actual: <emoji>👨</emoji><emoji></emoji><emoji>👨</emoji> and so on...

So, we need a better regex to capture the composite emoji, below is the new regex.

diff --git a/src/CONST.js b/src/CONST.js
index ae3ffca7f..8399de047 100755
--- a/src/CONST.js
+++ b/src/CONST.js
@@ -790,7 +790,7 @@ const CONST = {
         WEBSITE: /^((https?|ftp):\/\/)(([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}(:\d+)?(\/[-a-z\d%_.~+]*)*(\?[;&a-z\d%_.~+=-]*)?(#[-a-z\d_]*)?$/i,

         // eslint-disable-next-line max-len, no-misleading-character-class
-        EMOJIS: /[\p{Extended_Pictographic}\u200d\u{1f1e6}-\u{1f1ff}\u{1f3fb}-\u{1f3ff}\u{e0020}-\u{e007f}\u20E3\uFE0F]|[#*0-9]\uFE0F?\u20E3/gu,
+        EMOJIS: /[\p{Extended_Pictographic}](\u200D[\p{Extended_Pictographic}]|[\u{1F3FB}-\u{1F3FF}]|[\u{E0020}-\u{E007F}]|\uFE0F|\u20E3)*|[\u{1F1E6}-\u{1F1FF}]{2}|[#*0-9]\uFE0F?\u20E3/gu,
         TAX_ID: /^\d{9}$/,
         NON_NUMERIC: /\D/g,
         EMOJI_NAME: /:[\w+-]+:/g,

I have tested the new regex against all emojis from assets/emojis file and make sure all EmojiTest.js test cases pass.

Here is how I test it:

diff --git a/tests/unit/EmojiTest.js b/tests/unit/EmojiTest.js
index 25626872c..b68bfbca0 100644
--- a/tests/unit/EmojiTest.js
+++ b/tests/unit/EmojiTest.js
@@ -1,5 +1,6 @@
 import _ from 'underscore';
 import Emoji from '../../assets/emojis';
+import CONST from '../../src/CONST';
 import * as EmojiUtils from '../../src/libs/EmojiUtils';

 describe('EmojiTest', () => {
@@ -11,12 +12,16 @@ describe('EmojiTest', () => {
             }

             // When we match every Emoji Code
-            const isEmojiMatched = EmojiUtils.containsOnlyEmojis(emoji.code);
+            const isEmojiMatched = EmojiUtils.containsOnlyEmojis(emoji.code) 
+                                && emoji.code.match(CONST.REGEX.EMOJIS).length === 1;

             let skinToneMatched = true;
             if (emoji.types) {
                 // and every skin tone variant of the Emoji code
-                skinToneMatched = _.every(emoji.types, emojiWithSkinTone => EmojiUtils.containsOnlyEmojis(emojiWithSkinTone));
+                skinToneMatched = _.every(emoji.types, emojiWithSkinTone => {
+                    return EmojiUtils.containsOnlyEmojis(emojiWithSkinTone) 
+                        && emoji.code.match(CONST.REGEX.EMOJIS).length === 1;
+                });
             }
             return skinToneMatched && isEmojiMatched;
         });

Here is the code part to enclose the emoji with tag.

explanation I prefer to do the string replacement here compared to doing it on `ExpensiMark`. As I said at the first sentence above, this will make easier when we do the replacement inside our App code for [this](https://github.com/Expensify/App/issues/4114) improvement. For the context, the mentioned improvement expects the emoji font size will have a different font size compared to the normal text, whether the text only contains emoji or also contains some text. I know this issue does not ask for that, but if we do the replacement at `ExpensiMark`, later on when we work on that improvement, we need somehow to tackle the case for the emoji only text and it is not feasible to do it at `ExpensiMark`. If we do the replacement in our App code, we can just easily change the tag to another tag that indicates the text only contains emoji.
diff --git a/src/pages/home/report/ReportActionItemFragment.js b/src/pages/home/report/ReportActionItemFragment.js
index c4f593517..4f34733cc 100644
--- a/src/pages/home/report/ReportActionItemFragment.js
+++ b/src/pages/home/report/ReportActionItemFragment.js
@@ -108,7 +108,8 @@ const ReportActionItemFragment = (props) => {
             // Only render HTML if we have html in the fragment
             if (!differByLineBreaksOnly) {
                 const editedTag = props.fragment.isEdited ? '<edited></edited>' : '';
-                const htmlContent = html + editedTag;
+                const htmlContent = EmojiUtils.encloseEmojisWithTag(html, 'emoji') + editedTag;
+                
                 return (
                     <RenderHTML
                         html={props.source === 'email'
diff --git a/src/libs/EmojiUtils.js b/src/libs/EmojiUtils.js
index d6ed8875e..dce4ad93c 100644
--- a/src/libs/EmojiUtils.js
+++ b/src/libs/EmojiUtils.js
@@ -245,6 +245,10 @@ function suggestEmojis(text, limit = 5) {
     return [];
 }

+function encloseEmojisWithTag(text, tag) {
+    return Str.replaceAll(text, CONST.REGEX.EMOJIS, g => `<${tag}>${g}</${tag}>`);
+}
+
 export {
     getDynamicHeaderIndices,
     mergeEmojisWithFrequentlyUsedEmojis,
@@ -253,4 +257,5 @@ export {
     replaceEmojis,
     suggestEmojis,
     trimEmojiUnicode,
+    encloseEmojisWithTag,
 };

Now, apply the style to the custom tag

diff --git a/src/styles/styles.js b/src/styles/styles.js
index 2b2a3498d..8e7d163af 100644
--- a/src/styles/styles.js
+++ b/src/styles/styles.js
@@ -1065,6 +1065,11 @@ const styles = {
         textDecorationLine: 'none',
     },

+    emojisText: {
+        fontStyle: 'normal', 
+        fontWeight: 'normal',
+    },
+
     onlyEmojisText: {
         fontSize: variables.fontSizeOnlyEmojis,
         lineHeight: variables.fontSizeOnlyEmojisHeight,
diff --git a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
index 456cd1ecf..fcfe6b809 100755
--- a/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
+++ b/src/components/HTMLEngineProvider/BaseHTMLEngineProvider.js
@@ -39,6 +39,10 @@ const customHTMLElementModels = {
         tagName: 'email-comment',
         mixedUAStyles: {whiteSpace: 'normal'},
     }),
+    'emoji': defaultHTMLElementModels.span.extend({
+        tagName: 'emoji',
+        mixedUAStyles: styles.emojisText,
+    }),
 };

 const defaultViewProps = {style: [styles.alignItemsStart, styles.userSelectText]};

The last problem is when we type emojis inside code (`) tag, the text does not show, for example😄. The text will be converted to😄. We have a custom rendered for code tag and it's only taking a text inside the code tag, but the emoji is enclosed withemoji` tag. The solution is to get the text inside of it.

The getEmojiTextInsideCode will iterate over the children inside the code tag and get the data (text) of it. For example, if we have this text a😄, the children will be <TText data="a" /> and <TText data="😄" />.

diff --git a/src/components/HTMLEngineProvider/htmlEngineUtils.js b/src/components/HTMLEngineProvider/htmlEngineUtils.js
index 71a5e9267..fb588ff08 100644
--- a/src/components/HTMLEngineProvider/htmlEngineUtils.js
+++ b/src/components/HTMLEngineProvider/htmlEngineUtils.js
@@ -44,8 +44,13 @@ function isInsideComment(tnode) {
     return false;
 }

+function getEmojiTextInsideCode(tnode) {
+    return tnode.children.reduce((prev, curr) => prev + curr.data, '');
+}
+
 export {
     computeEmbeddedMaxWidth,
     isInsideComment,
     isCommentTag,
+    getEmojiTextInsideCode,
 };
diff --git a/src/components/InlineCodeBlock/index.js b/src/components/InlineCodeBlock/index.js
index af33b4468..222b7d877 100644
--- a/src/components/InlineCodeBlock/index.js
+++ b/src/components/InlineCodeBlock/index.js
@@ -1,6 +1,7 @@
 import React from 'react';
 import inlineCodeBlockPropTypes from './inlineCodeBlockPropTypes';
 import Text from '../Text';
+import * as HTMLEngineUtils from '../HTMLEngineProvider/htmlEngineUtils';

 const InlineCodeBlock = (props) => {
     const TDefaultRenderer = props.TDefaultRenderer;
@@ -12,7 +13,9 @@ const InlineCodeBlock = (props) => {
             <Text
                 style={{...props.boxModelStyle, ...props.textStyle}}
             >
-                {props.defaultRendererProps.tnode.data}
+                {props.defaultRendererProps.tnode.data
+                    ? props.defaultRendererProps.tnode.data
+                    : HTMLEngineUtils.getEmojiTextInsideCode(props.defaultRendererProps.tnode)}
             </Text>
         </TDefaultRenderer>
     );
diff --git a/src/components/InlineCodeBlock/index.native.js b/src/components/InlineCodeBlock/index.native.js
index 8431ce61e..04adad238 100644
--- a/src/components/InlineCodeBlock/index.native.js
+++ b/src/components/InlineCodeBlock/index.native.js
@@ -2,6 +2,7 @@ import React from 'react';
 import styles from '../../styles/styles';
 import WrappedText from './WrappedText';
 import inlineCodeBlockPropTypes from './inlineCodeBlockPropTypes';
+import * as HTMLEngineUtils from '../HTMLEngineProvider/htmlEngineUtils';

 const InlineCodeBlock = (props) => {
     const TDefaultRenderer = props.TDefaultRenderer;
@@ -17,7 +18,9 @@ const InlineCodeBlock = (props) => {
                     styles.codeWordStyle,
                 ]}
             >
-                {props.defaultRendererProps.tnode.data}
+                {props.defaultRendererProps.tnode.data
+                    ? props.defaultRendererProps.tnode.data
+                    : HTMLEngineUtils.getEmojiTextInsideCode(props.defaultRendererProps.tnode)}
             </WrappedText>
         </TDefaultRenderer>
     );

Finally, I thought this improvement should be held waiting for this issue.

tienifr commented 1 year ago

thanks @bernhardoj for the additional insights 💪

I agree there’ll be some fixes here and there to make sure it works 100% in all cases, some of which were mentioned earlier and some you mentioned above.

Once the direction is confirmed we can dive deeper and resolve the remaining (potential) issues.

tienifr commented 1 year ago

@bernhardoj in the case of the 👨‍👨‍👦 emoji, it's indeed split into sub-emojis but it still renders correctly on the screen (since the sub-emojis are in the consecutive span tags and still render the correct composite emoji).

Just curious if you find any case of an emoji that has user-visible problem if it's not matched by your updated regex?

bernhardoj commented 1 year ago

Surprisingly it worked on web, but that's not the case for native platform.

aimane-chnaif commented 1 year ago

opened internal discussion here for ideal solution

MelvinBot commented 1 year ago

@flodnv @anmurali @aimane-chnaif this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

MelvinBot commented 1 year ago

@flodnv @anmurali @aimane-chnaif this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

MelvinBot commented 1 year ago

Upwork job price has been updated to $2000

MelvinBot commented 1 year ago

@flodnv, @anmurali, @aimane-chnaif Whoops! This issue is 2 days overdue. Let's get this updated quick!

flodnv commented 1 year ago

We're still discussing with @aimane-chnaif and @rushatgabhane

anmurali commented 1 year ago

No update over the weekend.

MelvinBot commented 1 year ago

@flodnv @anmurali @aimane-chnaif this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ and will need to go internal. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

rushatgabhane commented 1 year ago

@aimane-chnaif bump to resolve the slack conversation https://expensify.slack.com/archives/C02NK2DQWUX/p1676299667312469

aimane-chnaif commented 1 year ago

still discussing in slack We can add Help Wanted label back since no acceptable proposal yet. We're still looking for best solution. cc: @flodnv @anmurali

alexxxwork commented 1 year ago

We're still looking for best solution.

I don't have access to internal discussion. If you are still waiting for alternative solutions, could you please explain the point of discussion briefly? It seems that @bernhardoj 's soultion fits - it solves this problem and provides solution for another issue. Or it's just doesn't work on native?

aimane-chnaif commented 1 year ago

We want add a markdown rule that removes emojis from em tag

Markdown: _hello🙂world!_ Current html: <em>hello🙂world!</em> Target html: <em>hello</em>🙂<em>world!</em>

This should be clear solution and won't require any new tags or backend updates. We're going to award this job to someone who adds this rule in markdown->html level.

tienifr commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Emoji has italic style if put in _ _ in markdown.

What is the root cause of that problem?

The emoji inside <em> tag in html has italic style.

What changes do you think we should make in order to solve the problem?

As mentioned by @aimane-chnaif above, we can update the italic rule to exclude the emoji from the em tag. We can do that by adding a post function in the current italic rule:

post: (text) => {
                    return text.replace(/(([\p{Extended_Pictographic}](\u200D[\p{Extended_Pictographic}]|[\u{1F3FB}-\u{1F3FF}]|[\u{E0020}-\u{E007F}]|\uFE0F|\u20E3)*|[\u{1F1E6}-\u{1F1FF}]{2}|[#*0-9]\uFE0F?\u20E3)+)/gu, '</em>$1<em>')
                }

This will replace all emoji group inside the <em>, </em> by '</em>$1<em>', which will achieve the desired behavior.

We can further refine this by removing redundant <em></em> that doesn't have any content inside. This can happen if an emoji happens to be right next to the leading and trailing _.

There'll be further changes needed for other conversion rules (for example html to markdown), but as mentioned above we only need the rule in markdown->html level first.

What alternative solutions did you explore? (Optional)

Another way is instead of the regex above to handle the emojis, we can have a regex to replace all non-emoji character group with <em>$1</em> (the <em> or </em> will be omitted if the group is next to the start or end of the tag)

aimane-chnaif commented 1 year ago

We can further refine this by removing redundant that doesn't have any content inside.

@tienifr I need this solution as well.

For now, your regex doesn't work for me at all. Please try with this:

💯⁦💜🥳
_💯⁦💜🥳_
_💯_ _💜_ _🥳_

Expected result should show like this:

demo

After converting to:

💯⁦💜🥳
💯⁦💜🥳
💯 💜 🥳
tienifr commented 1 year ago

@tienifr I need this solution as well.

@aimane-chnaif sure.

This rule in https://github.com/Expensify/expensify-common/blob/f2fb81231b2a6dbf141fe8b721d779b67c8a9c37/lib/ExpensiMark.js#L134 will work for all the cases you mentioned:

{
                name: 'italic',
                regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                replacement: (match, g1) => {
                    const replacedText =  `<em>${g1}</em>`;
                    const textWithEmojiExcluded = replacedText.replace(/(([\p{Extended_Pictographic}](\u200D[\p{Extended_Pictographic}]|[\u{1F3FB}-\u{1F3FF}]|[\u{E0020}-\u{E007F}]|\uFE0F|\u20E3)*|[\u{1F1E6}-\u{1F1FF}]{2}|[#*0-9]\uFE0F?\u20E3)+)/gu, '</em>$1<em>');
                    const textWithEmptyTagRemoved = textWithEmojiExcluded.replace(/<em><\/em>/g, '');
                    return textWithEmptyTagRemoved;
                }
            },

(pls note the change in replacement to handle the replacement steps)

bernhardoj commented 1 year ago

If we do this, can we convert it back to the markdown 100% the same because now we have multiple _?

For example, we convert from this markdown _a🥳b🥳c_ to the html <em>a</em>🥳<em>b</em>🥳<em>c</em>. When we convert it back to the html, we will get _a_🥳_b_🥳_c_.

aimane-chnaif commented 1 year ago

@bernhardoj thanks for raising this concern. After discussion, we agreed markdown should always be consistent between new message and editing message. So it's preferred to fix this in render level. But regarding conversion logic, still suggest to escape emoji from <em> tag than wrapping with new tag.

tienifr commented 1 year ago

Another way is instead of the regex above to handle the emojis, we can have a regex to replace all non-emoji character group with $1 (the or will be omitted if the group is next to the start or end of the tag)

@aimane-chnaif here's another rule for this alternative, at first glance it might look cleaner but it can be more buggy since we're negating an emoji regex and we need to further cover all cases of zero-width string & similar invisible characters.

{

                    name: 'italic',
                    regex: /(?!_blank")\b_((?!\s)[\s\S]*?\S)_\b(?![^<]*(<\/pre>|<\/code>|<\/a>|_blank))/g,
                    replacement: (match, g1) => {
                        return g1.replace(/(((?!([\p{Extended_Pictographic}](\u200D[\p{Extended_Pictographic}]|[\u{1F3FB}-\u{1F3FF}]|[\u{E0020}-\u{E007F}]|\uFE0F|\u20E3)*|[\u{1F1E6}-\u{1F1FF}]{2}|[#*0-9]\uFE0F?\u20E3)+)[^\u200D])+)/gu, '<em>$1</em>');
                    }
}

I think we should go with the rule above, but just putting this here as an option to consider.

aimane-chnaif commented 1 year ago

@tienifr let's say user send 2 new messages _hello_✋_world_ and _hello✋world_ Both are converted to same html: <em>hello</em>✋<em>world</em> When edit message, we should convert html back to markdown. How will you know original markdown?

tienifr commented 1 year ago

@aimane-chnaif in this case we will need to make the htmlToMarkdown conversion consistent with 1 style of putting the emoji. I think it should be converted to _hello✋world_ in both cases. Both renders the same way for the user, so the redundant wrapping around the emoji is safe to be removed.

aimane-chnaif commented 1 year ago

@tienifr if users input _hello_✋_world_, they still wanna see _hello_✋_world_ on edit

tienifr commented 1 year ago

@aimane-chnaif it might not be possible since we're not preserving the order of the markdown in the html. Can you discuss in the Slack thread about this issue to see if we change our mind on which way we want to go? (if we use a custom emoji tag like Github does then we'll not have this problem)

it might not be possible since we're not preserving the order of the markdown in the html

To workaround this I can imagine we can put some special properties in the em tag to indicate this, however it's like a workaround and can be buggy.

aimane-chnaif commented 1 year ago

@tienifr we already discussed. that's why we suggest to fix this in render level

aimane-chnaif commented 1 year ago

@rushatgabhane can you please jump in this discussion since you suggested escape approach?

aimane-chnaif commented 1 year ago

I think all of discussions were based on avoiding backend fix.

bernhardoj commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

We don't want to apply bold and italic style to emoji.

What is the root cause of that problem?

Emoji inside _ and * markdown will receive bold and italic text style.

What changes do you think we should make in order to solve the problem?

  1. ~Change emoji regex to /[\p{Extended_Pictographic}](\u200D[\p{Extended_Pictographic}]|[\u{1F3FB}-\u{1F3FF}]|[\u{E0020}-\u{E007F}]|\uFE0F|\u20E3)*|[\u{1F1E6}-\u{1F1FF}]{2}|[#*0-9]\uFE0F?\u20E3/gu~ The emoji regex is already updated
  2. Modify the text here by wrapping the emoji with </em|b|s>emoji<em|b|s> and remove empty tag. https://github.com/Expensify/App/blob/a4d01a87b242eef06af37db965bd135745e342e5/src/pages/home/report/comment/TextCommentFragment.tsx#L60

Details: We can create a new util function, let's call it unwrapEmojis.

First, we need a new regex constructed from the emoji tag to match all emojis inside bold, italic, or strikethrough tag.

const emojiTagRegex = /<emoji( islarge)?>.*<\/emoji>/g;
const textWithEmojiRegex = RegExp(`<(b|strong|em|i|s|del)>.*(${emojiTagRegex.source})+.*<\\/\\1>`, emojiTagRegex.flags)

Next, we match the whole text with the new regex and then replace the emojis with the tag mentioned above (step 2).

Last, we clean the empty tag.

It will look like this:

function unwrapEmojis(text) {
    const emojiTagRegex = /<emoji( islarge)?>.*<\/emoji>/g;
    const textWithEmojiRegex = RegExp(`<(b|strong|em|i|s|del)>.*(${emojiTagRegex.source})+.*<\\/\\1>`, emojiTagRegex.flags)
    return text.replace(textWithEmojiRegex, (match, g1) => {
        return match.replace(emojiTagRegex, (match) => `</${g1}>${match}<${g1}>`).replace(new RegExp(`<${g1}></${g1}>`, 'g'), '')
    });
}

and we can use it like this:

<RenderCommentHTML
    source={source}
    html={unwrapEmojis(htmlWithTag)}
/>

We will create a test for unwrapEmojis too, example

expect(unwrapEmojis("<b>test 😄 test</b>")).toBeEqual("<b>test </b>😄<b> test</b>")
MelvinBot commented 1 year ago

@flodnv @anmurali @aimane-chnaif this issue is now 4 weeks old and preventing us from maintaining WAQ, can you:

Thanks!