[HOLD for payment 2023-06-09] [$1000] Composer input is empty when editing the word `[block]` in chat

kavimuru commented 1 year ago

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!

Action Performed:

Log into staging.new.expensify.com
Click into any chat
Type the word [block] and send it as a message
Click the pencil icon to edit the comment
Notice the composer input is empty

Expected Result:

The composer input should be text [block] in editing mode

Actual Result:

The composer input is empty

Platforms:

Which of our officially supported platforms is this issue occurring on?

[x] Android / native
[x] Android / Chrome
[x] iOS / native
[x] iOS / Safari
[x] MacOS / Chrome / Safari
[x] MacOS / Desktop

Version Number: 1.3.3 Reproducible in staging?: y Reproducible in production?: y If this was caught during regression testing, add the test name, ID and link from TestRail: Email or phone of affected tester (no customers): Logs: https://stackoverflow.com/c/expensify/questions/4856 Notes/Photos/Videos: Any additional supporting documentation

https://user-images.githubusercontent.com/43996225/233535465-20d83e6a-1279-4ddd-8a77-aaab3ebd04b6.mp4

https://user-images.githubusercontent.com/43996225/233535477-bc59a995-b10c-4ad7-9b52-88df4ffadbcb.mov

Expensify/Expensify Issue URL: Issue reported by: @eh2077 Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1681920140787049

View all open jobs on GitHub

Upwork Automation - Do Not Edit

Upwork Job URL: https://www.upwork.com/jobs/~01a0c0a4619ff68e29
Upwork Job ID: 1650565300774252544
Last Price Increase: 2023-04-24

MelvinBot commented 1 year ago

Triggered auto assignment to @maddylewis (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

MelvinBot commented 1 year ago

Bug0 Triage Checklist (Main S/O)

[x] This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
[x] This bug is not a duplicate report (check E/App issues and #expensify-bugs)
- If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
[x] This bug is reproducible using the reproduction steps in the OP. S/O
- If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
- If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
[x] This issue is filled out as thoroughly and clearly as possible
- Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
[x] I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

MelvinBot commented 1 year ago

@maddylewis Whoops! This issue is 2 days overdue. Let's get this updated quick!

maddylewis commented 1 year ago

im not able to reproduce this -

https://user-images.githubusercontent.com/38016013/234073996-0f680b47-a83b-440d-8d43-595b444f6e13.mp4

maddylewis commented 1 year ago

oh, i see. it's if you literally type the word [block] and then try to edit:

https://user-images.githubusercontent.com/38016013/234074352-8bdae192-a168-4549-8915-1e95dbd82c54.mp4

MelvinBot commented 1 year ago

Triggered auto assignment to @cead22 (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

maddylewis commented 1 year ago

hiya - just confirming that this can be fixed externally before i add the corresponding label!

cead22 commented 1 year ago

@maddylewis yes this can be fixed externally, so you can apply the label. Can you update the issue description to make it extra clear what the steps to reproduce should be, and remove any boilerplate info from there?

MelvinBot commented 1 year ago

Job added to Upwork: https://www.upwork.com/jobs/~01a0c0a4619ff68e29

MelvinBot commented 1 year ago

Current assignee @maddylewis is eligible for the External assigner, not assigning anyone new.

MelvinBot commented 1 year ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @rushatgabhane (External)

eh2077 commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Editing comment [block] shows empty in the composer.

What is the root cause of that problem?

When editing comment [block], we translate it into markdown here

https://github.com/Expensify/App/blob/f50afdfd5fb2c2e9582cd86ee94e61402f33e90e/src/pages/home/report/ReportActionItemMessageEdit.js#L91

through method htmlToMarkdown. The string [block] is used as a special separator to handle line break between markdown block elements. From method replaceBlockWithNewline, we split input htmlString by separator [block] and handle line breaks between markdown block elements. In this case, the input htmlString is equal the the separator [block], so it's treated as the separator and is eliminated to empty string, which results in empty composer in editing mode.

So the root cause of this issue is that we choose string [block] which is a relatively common user input string as the separator to handle line break between markdown block elements.

What changes do you think we should make in order to solve the problem?

To fix this issue, we should avoid to have a separator that possibly conflicts with normal/visual user input text.

We can choose the zero width space character \u200B to wrap string block instead using bracket []. So we use string \u200Bblock\u200B as the separator to handle line break between markdown block elements. We can search string [block] in ExpensiMark.js and replace all with \u200Bblock\u200B.
To fix the special case of pasting html includes block (note \u200B is encoded as  in html), we can replace \u200Bblock\u200B as block before applying html-to-markdown rules here, like
```
generatedMarkdown = generatedMarkdown.replace(/\u200Bblock\u200B/g, 'block');
```
So the separator \u200Bblock\u200B won't conflict with user input during subsequent html-to-markdown rules translation.

Note that we call htmlToMarkdown to convert html into markdown when we pasting html into composer.

The new separator string \u200Bblock\u200B has following advantages

It won't conflict with user input that includes string [block].
It's safe to replace(/\u200Bblock\u200B/g, 'block') before html-to-markdown translation as the zero width character is not visible and removing it can ensure no conflicts with normal user input.
The implementation of method htmlToMarkdown is self-sufficient compared to the alternative solution as described below.

We can also simplify the separator a little bit by using block\u200B.

What alternative solutions did you explore? (Optional)

We can also use separator, like [block] (escaping brackets) as

We always escape user input [block] to &#91;block&#93; before saving it to backend, so see getParsedComment and ExpensiMark.replace
For pasting html case, both [block] and &#91;block&#93; are not conflicted with the separator
For pasting text case, [block] is decoded to [block] here

While method htmlToMarkdown isn't self-sufficient as it requires the potential separator [block] in the input html to be decoded or escaped outside of the method.

MelvinBot commented 1 year ago

Current assignee @cead22 is eligible for the External assigner, not assigning anyone new.

tienifr commented 1 year ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

The composer is empty if editing the text [block] that was sent.

What is the root cause of that problem?

The root cause is from here https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L531, we use [block] as a separator in place of a number of html tags as can be seen in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L528, then we have logic to replace [block] by new lines in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L475.

So the problem is the [block] text entered by the user happens to be the one used internally by us.

What changes do you think we should make in order to solve the problem?

A question is: why does it happen for block but doesn't happen if we edit raw html tags like <h1>, <em> that were sent?

That's because we have the logic to escape the special characters in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L333, so <h1> entered by the user will become <h1>, thus will never collide with the HTML tags that we convert from the markdown.

We should do the same for the [block] special string that we use internally. We can replace all instances of [block] by <block>, it will never collide with the user input because it has special html characters (<, >). It will also look logical because we're replacing a number of legit HTML tags by <block> for further processing in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L528, so <h1> -> <block> makes more sense to me than [block].

Aside from this, we should also modify the https://github.com/Expensify/expensify-common/blob/3cdaa947fe77016206c15e523017cd50678f2359/lib/str.js#L315 so that it will strip all HTMLs excluding our reserved <block>, so that <block> won't be strip out in here. The function name can be changed to clarify that it won't strip <block>.

What alternative solutions did you explore? (Optional)

As long as we replace [block] by any string that has a HTML special character, it will most likely work. I prefer the <, > pair since it looks logical and already proven since with the current logic we can process raw HTML tags sent by the user without any issue.

Another potential solution is to remove all occurrences of [block] altogether, and we'll split the HTML for processing by using regex ([block] actually represents a group of regex anyway). This will be a bit more complex.

rushatgabhane commented 1 year ago

@eh2077 @tienifr Are there any other similar issues that you can find by looking at the code? Let's fix all of them in one go.

eh2077 commented 1 year ago

Hi @rushatgabhane, I think [block] is the only special string that might conflict with user's input in the expensify-common lib.

tienifr commented 1 year ago

yeah, agree with @eh2077 👍

rushatgabhane commented 1 year ago

@tienifr do you think we'll require any backend changes? does the backend accept <block>?

eh2077 commented 1 year ago

@rushatgabhane One defect I can see about <block> is that it'll be stripped by method Str.stripHTML before being processed in method replaceBlockWithNewLine, see here

        generatedMarkdown = generatedMarkdown.replace(
            /<div.*?>|<\/div>|<comment.*?>|\n<\/comment>|<\/comment>|<h2>|<\/h2>|<h3>|<\/h3>|<h4>|<\/h4>|<h5>|<\/h5>|<h6>|<\/h6>|<p>|<\/p>|<li>|<\/li>/gm,
            '[block]',
        );
        return Str.htmlDecode(this.replaceBlockWithNewLine(Str.stripHTML(generatedMarkdown)));

cc @tienifr

tienifr commented 1 year ago

@eh2077 yeah for that we can strip the HTML but excludes the <block> specifically, then we won’t have that problem

tienifr commented 1 year ago

@rushatgabhane <block> is only used in front end side to replace new lines correctly when convert html to markdown. It won’t be sent to the backend, same as the current [block] that is used

rushatgabhane commented 1 year ago

i agree with the problem and root cause @tienifr your solution seems the most simple to me!

@cead22 let's go for it? 🎀 👀 🎀

rushatgabhane commented 1 year ago

@eh2077 your proposal is clever

but @tienifr's proposal is simpler to implement requires no extra code. so i prefer their's more

eh2077 commented 1 year ago

Hi @rushatgabhane, would you like to have a look at my comment about the drawback of the <block> solution here https://github.com/Expensify/App/issues/17776#issuecomment-1526606921?

IMHO, I don’t think it’ll be working. Using <block> depends on the well-encoded input html string, so it’s not a robust solution from the beginning.

cc @cead22

tienifr commented 1 year ago

Using depends on the well-encoded input html string, so it’s not a robust solution from the beginning.

All our HTML tags decoding relies on well-encoded html string. Without well-encoded html string, the premise of decoding HTML to markdown does not work, so I don’t think it indicates the solution is not robust.

eh2077 commented 1 year ago

@tienifr I think first you need to change the other basic method Str.stipHTML to get it work, which is not simple and is not a good pattern to fix an issue.

We can copy and paste html from web page outside of the App. So the robustness matters.

tienifr commented 1 year ago

I think first you need to change the other basic method Str.stipHTML to get it work

Yes I've mentioned it earlier in here. Updated to the original proposal to clarify. It’s only a simple change.

We can copy and paste html from web page outside of the App. So the robustness matters.

@eh2077 any HTML or text from inside or outside the app will be HTML encoded before being processed by our html-to-markdown rules. If you find any case currently in our app where it won't work I'd love to hear also.

cead22 commented 1 year ago

@rushatgabhane I like @eh2077 's solution better. To me it's simpler than @tienifr 's because it's self contained (it doesn't require updating Str.stripHTML), and I actually like the approach of using a string like \u200Bblock\u200B over <block> because in my opinion it's more immediately obvious that it's an arbitrary string used for replacing text and not something that's likely to be entered by a user or some form of HTML

tienifr commented 1 year ago

@cead22 @eh2077 there’s 1 issue with using such string though, if the user input contains it (it can happen when the user copies it from other sites), we’ll have to strip it. Even though it’s a zero width string, I don’t think we should modify the user’s input, we should leave it as is.

Regarding the having to modify the stripHTML in the <block> solution, we can easily avoid that by using another pair of special HTML entities (as was mentioned as an alternative in my original proposal), I only suggested <block> since it looks more neat and readable. If our goal is to use an obviously-different string, we can use another pair like >block>. It satisfies:

not having to modify “stripHTML”
it's more immediately obvious that it's an arbitrary string used for replacing text
not something that can ever be entered by a user or some form of HTML (unlike the zero width string which can be copied from other sources)
not having to use the workaround in the case of user inputs contains that exact string (which is step 2 in @eh2077 ’s proposal )

@cead22 do you think that it’d be simpler?

cc @rushatgabhane

cead22 commented 1 year ago

Candidly, I still like using \u200Bblock\u200B better then <block<, and I think we can skip step 2. in @eh2077 's proposal, because I don't think it's realistic that \u200Bblock\u200B will be copied from other places

tienifr commented 1 year ago

@cead22 the zero width space is widely used though, especially in other languages that don’t use spaces to separate words (see here).

The issue in this case ([block] in the message) is already a rare-occuring one and only occurs for that specific string combination. So I’m afraid if we do that, we’re replacing a rare bug by introducing another rarer bug rather than fixing it completely.

cead22 commented 1 year ago

@cead22 the zero width space is widely used though, especially in other languages that don’t use spaces to separate words (see here).

Interesting, I didn't know that! That said, is it typical for those languages to have zero width space, then block, and then zero width space? It seems very unlikely to me that we'll encounter this situation, and I think it's more likely to find <block<.

If we think it's not that unlikely, then let's pick something different from \u200Bblock\u200B and <block<, maybe a rare unicode character, then "block", then another rare unicode character.

eh2077 commented 1 year ago

I found another possible solution to fix this issue. We can generate a id as the block separator instead of having a fixed one.

To achieve this we can initialise a separator this.BLOCK_SEPARATOR inside the constructor of ExpensiMark, like

export default class ExpensiMark {
    constructor() {
        /**
         * We generate a special id as the separator between markdown block elements to avoid 
         * conflicting with user input.
         */
        this.BLOCK_SEPARATOR = Str.guid('block');

And then replace all [block] with this.BLOCK_SEPARATOR.

The BLOCK_SEPARATOR will look like blockc41de41b-2767-617b-b78d-49ca52a0fb0c which won't conflict with user input.

@cead22 @rushatgabhane I'll be happy to hear your feedback about this new solution.

cead22 commented 1 year ago

We can make the separator a UUID to make it extremely unlikely that we'll find it anywhere else. However

It feels like overkill, I bet \u200Bblock\u200B is random enough, or that we can come up with a random-enough string that's not a UUID
I think we should make it static to make this predictable
If we use a UUID, I think we should remove the "block" prefix, cause I don't think it serves any purpose

tienifr commented 1 year ago

It seems very unlikely to me that we'll encounter this situation, and I think it's more likely to find <block<.

@cead22 even though the user input contains >block>, it will never conflict with our >block> because user input is always properly encoded, which is why our HTML to markdown conversion works. So >block> in the user input will become >block>. So using >block> is already a complete solution without any edge case, it will address the problem in full, while for other strings there'll still be some edge cases that will conflict with user input.

@cead22 just to confirm you don't prefer >block> over \u200Bblock\u200B mainly because >block> doesn't look random enough or because of any other reason?

tienifr commented 1 year ago

@cead22 @rushatgabhane Actually I find this way which can save us all the trouble of inventing a random string.

The [block] is currently used basically to represent a group of HTML tags that, when converted to markdown, should add a new line, see https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L545. And then, we split the htmlString by [block] in here https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L493 for replacing with new lines.

We can remove [block] completely as an intermediary, and split the string directly by using the regex in https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L545 There'll be a few more small changes required like updating the h1 replacing logic, stripHTML for an array of strings that are already split, but it will work.

@cead22 @rushatgabhane if there's interest in this solution, I'm happy to create a full proposal for it. It might take more changes but will solve the problem fully and avoid any random string to be used.

cead22 commented 1 year ago

@cead22 even though the user input contains <block<, it will never conflict with our <block< because user input is always properly encoded, which is why our HTML to markdown conversion works. So <block< in the user input will become <block<

What if the user input contains <block<?

@cead22 just to confirm you don't prefer <block< over \u200Bblock\u200B mainly because <block< doesn't look random enough or because of any other reason?

I just don't see any advantages to using <block< over \u200Bblock\u200B

tienifr commented 1 year ago

What if the user input contains <block<?

@cead22 then the & will be encoded. And the string will become: &lt;block&lt;, it will not collide with <block<. It’s impossible for any conflict because the input string (after encoded) will never contain <.

I just don't see any advantages to using <block< over \u200Bblock\u200B

@cead22 basically >block> doesn’t have any edge case where it won’t work, while \u200Bblock\u200B does as pointed out above. Would you see that as an advantage?

eh2077 commented 1 year ago

@tienifr I’m not sure if I’m missing something but <block< doesn’t work either. Please check the regex of method stripHTML or see below screenshot A460FF22-6A26-4912-B079-FE20E1E04CE1

rushatgabhane commented 1 year ago

https://github.com/Expensify/App/issues/17776#issuecomment-1533894871

We can remove [block] completely

@tienifr that looks promising to me! Please post a proposal. We just need to make sure all unit tests work fine

tienifr commented 1 year ago

@tienifr I’m not sure if I’m missing something but <block< doesn’t work either. Please check the regex of method stripHTML or see below screenshot

@eh2077 >block> will work just fine (two >, not <), feel free to try it out.

@tienifr that looks promising to me! Please post a proposal. We just need to make sure all unit tests work fine

@rushatgabhane Thanks for your interest! I'll post a full proposal for that tomorrow

cead22 commented 1 year ago

What if the user input contains <block<?

@cead22 then the & will be encoded. And the string will become: &lt;block&lt;, it will not collide with <block<. It’s impossible for any conflict because the input string (after encoded) will never contain <.

I just don't see any advantages to using <block< over \u200Bblock\u200B

@cead22 basically >block> doesn’t have any edge case where it won’t work, while \u200Bblock\u200B does as pointed out above. Would you see that as an advantage?

Yes 👍 I didn't realize that would be the case for <block<, thanks for clarifying

melvin-bot[bot] commented 1 year ago

@cead22 @rushatgabhane @maddylewis this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

maddylewis commented 1 year ago

how are we looking on this one? when do we think a PR will be up? lmk - ty!

tienifr commented 1 year ago

@tienifr that looks promising to me! Please post a proposal

@rushatgabhane @cead22 since this is a fairly different approach, I'm posting it as a new comment to be clear

Proposal

Please re-state the problem that we are trying to solve in this issue.

The composer is empty if editing the text [block] that was sent.

What is the root cause of that problem?

The root cause is from here https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L531, we use [block] as a separator in place of a number of html tags as can be seen in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L528, then we have logic to replace [block] by new lines in https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L475.

So the problem is the [block] text entered by the user happens to be the one used internally by us.

What changes do you think we should make in order to solve the problem?

We can remove [block] completely as an intermediary, and split the string directly by using the regex in https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L545

That means updating https://github.com/Expensify/expensify-common/blob/e93e1eb448ad6bdbde911fd6239f70d5e749635e/lib/ExpensiMark.js#L476 to

let splitText = htmlString.split(/<div.*?>|<\/div>|<comment.*?>|\n<\/comment>|<\/comment>|<h1>|<\/h1>|<h2>|<\/h2>|<h3>|<\/h3>|<h4>|<\/h4>|<h5>|<\/h5>|<h6>|<\/h6>|<p>|<\/p>|<li>|<\/li>/);

(please note there's the added <h1>|<\/h1>| compared to the original regex, that's related to 2)

and removing the replacing with [block] logic in https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L544

In https://github.com/Expensify/expensify-common/blob/cb5836992710d910ab64f33895052074abdaa357/lib/ExpensiMark.js#L239 we're replacing <h1> with [block], we don't need to do that since we're already splitting by <h1> in step 1. We can update that line to <h1># $2</h1>
We need to add splitText = splitText.map(text => Str.stripHTML(text)); to below the let splitText = to strip the remaining HTML, and remove the stripHTML logic in https://github.com/Expensify/expensify-common/blob/afe46976cbc7c5966118a32ae2e3737109d849b4/lib/ExpensiMark.js#L548

Those would be the main changes required, any other edge (if any) case can be handled later if this approach is selected.

What alternative solutions did you explore? (Optional)

The original proposal will also work.

cead22 commented 1 year ago

@rushatgabhane let me know what you think. The expensimark solution seems more robust, but also more likely to create regressions

rushatgabhane commented 1 year ago

@cead22 we have plenty of unit tests in expensimark. I think we should go ahead with @tienifr's proposal because it's a better solution as it gets rid of the problem itself.

Regressions shouldn't stop us. We'll take care of it in the PR, and in the worst case expensify-bugs channel is pretty fast to spot anything.

eh2077 commented 1 year ago

@cead22 we have plenty of unit tests in expensimark. I think we should go ahead with @tienifr's proposal because it's a better solution as it gets rid of the problem itself.

Regressions shouldn't stop us. We'll take care of it in the PR, and in the worst case expensify-bugs channel is pretty fast to spot anything.

@rushatgabhane Based on the expenerience I have worked on several line break issues related to the Expensify-common lib, I would say @tienifr 's last proposal is ambitious but is easy to introduce new regressions.

maddylewis commented 1 year ago

communicating here that i am OOO May 11 and 12. i am adding the BZ label in case this issue needs to be actioned while I'm away, but will keep myself assigned / remove the other assignee once i return from OOO.

melvin-bot[bot] commented 1 year ago

Triggered auto assignment to @stephanieelliott (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

Expensify / App