Expensify / App


[$500] [Live Markdown] Copying and pasting from google docs comes over as bold #41109

Closed thienlnam closed 2 months ago

thienlnam commented 5 months ago
  1. In a google doc, highlight and copy some regular text
  2. Paste it into the composer

Actual: [two screenshots in the original issue]

Expectation: It gets pasted as regular text

cc @tomekzaw

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~0181c168843a4dc29c
  • Upwork Job ID: 1783979270880542720
  • Last Price Increase: 2024-05-20
  • Automatic offers:
    • alitoshmatov | Reviewer | 102487686
    • ShridharGoel | Contributor | 102487688
Current Issue Owner: @abekkala
melvin-bot[bot] commented 5 months ago

Triggered auto assignment to @abekkala (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.

melvin-bot[bot] commented 5 months ago

Job added to Upwork: https://www.upwork.com/jobs/~0181c168843a4dc29c

melvin-bot[bot] commented 5 months ago

Triggered auto assignment to Contributor-plus team member for initial proposal review - @alitoshmatov (External)

josh-prof commented 5 months ago

@thienlnam Is it okay to prevent all formatting when pasting text?

thienlnam commented 5 months ago

We'd like to retain the formatting the text was copied with, rather than stripping all formatting

jainilparikh commented 5 months ago

@thienlnam, isn't this the expected behaviour? If the text was bold in Google Docs, we should send it as bold in our chat too.

The composer shows the formatting so that the end user can choose whether to keep the formatting their copied text had in Google Docs or to change it. If they want to change it, they can use markdown (which the composer supports).

melvin-bot[bot] commented 5 months ago

📣 @jainilparikh! 📣 Hey, it seems we don't have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:

  1. Make sure you've read and understood the contributing guidelines.
  2. Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
  3. Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
  4. Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details. Format:
    Contributor details
    Your Expensify account email: <REPLACE EMAIL HERE>
    Upwork Profile Link: <REPLACE LINK HERE>
jainilparikh commented 5 months ago

Contributor details Your Expensify account email: jainilvparikh@gmail.com Upwork Profile Link: https://www.upwork.com/freelancers/~01845f58bb8fa86b29

melvin-bot[bot] commented 5 months ago

✅ Contributor details stored successfully. Thank you for contributing to Expensify!

skyweb331 commented 5 months ago

@thienlnam When I copy "Test123" from google doc, it is translated into HTML like this.

<html>
<body>
<!--StartFragment--><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-ef0d09b9-7fff-2c97-04c0-4fe5952c2707"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">TEST</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> 12</span></b><!--EndFragment-->
</body>
</html>

That is overly complicated, but simplified it is <b><span style="font-weight: 700">Test</span><span style="font-weight:400">123</span></b>

But ExpensiMark does not look at the font-weight; it detects the <b> tag, which is why the pasted text is parsed into *Test 123* in our editor.

As I mentioned in https://github.com/Expensify/App/issues/41110#issuecomment-2080034830, this cannot be done reliably with regex. Regex has its limitations and cannot parse all of this complex HTML. (I finally found a concrete example that regex does not handle.) Can we catch CSS styles using regex? Maybe, but it gets too complicated.

So my proposal is to change the HTML parsing logic to use a parser such as xml2js or fast-xml-parser (https://github.com/Expensify/App/issues/40571#issuecomment-2070080632)

This way, we can completely resolve the root problem.

(cc @jjcoffee )

tomekzaw commented 5 months ago

@skyweb331 We could just create DOM subtree for the pasted HTML and then get pure text using the following approach: https://stackoverflow.com/a/6743966/23325954

skyweb331 commented 5 months ago

@tomekzaw Yes. If we use the DOM, the solution is really simple. I also recommended using the DOM, but on mobile devices DOM parsing is available but slow... That's why ExpensiMark uses regex. xml2js and fast-xml-parser do not use the DOM, so I recommend them. (I discussed this with @ikevin127: https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569 )

melvin-bot[bot] commented 5 months ago

@abekkala, @thienlnam, @alitoshmatov Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

thienlnam commented 5 months ago

Could you post in #expensify-open-source about the problem and the proposed solution of using another parser, so we can get some more eyes on it?

skyweb331 commented 5 months ago

@thienlnam https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569 I already posted this and discussed it with Kevin Bader. All of the Markdown-to-HTML parsing, and vice versa, lives in one file, ExpensiMark, and uses regex for everything. It is not well structured and is hard to maintain...

melvin-bot[bot] commented 5 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

ShridharGoel commented 5 months ago

Proposal

Please re-state the problem that we are trying to solve in this issue.

Copying and pasting non-bold text from Google docs gets pasted as bold text.

What is the root cause of that problem?

Google docs adds some extra HTML values when copying text.

For example, when copying Testing, the copied HTML is:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-1138e939-7fff-aca6-792e-add7dee8781e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>

When copying Testing which is in bold in Google docs, below is the copied HTML:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-c0d177b2-7fff-5eed-41bb-35f0cf5b210e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>

The font-weight is 400 in the 1st example, while it is 700 in the 2nd.

But in both cases the content is wrapped in <b></b>, so ExpensiMark treats it as bold text.

https://github.com/Expensify/expensify-common/blob/1713f28214f0e7176c4fd13433fb0ea15491ebf9/lib/ExpensiMark.js#L408-L412
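
To make the root cause concrete, here is a minimal sketch. `naiveBoldRule` is a simplified, hypothetical stand-in for a tag-only bold rule and is not the actual ExpensiMark code; `stripTags` and `toMarkdownNaive` are likewise made up for this illustration. The two inputs are trimmed versions of the Google Docs HTML above.

```javascript
// Hypothetical, simplified tag-only rule: any <b>/<strong> wrapper becomes *bold*,
// regardless of inline styles. This is NOT the real ExpensiMark rule.
const naiveBoldRule = /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>/gi;

// Hypothetical helper that drops any remaining tags so the output is easier to read.
const stripTags = (html) => html.replace(/<[^>]+>/g, '');

const toMarkdownNaive = (html) =>
    stripTags(html.replace(naiveBoldRule, (match, tagName, inner) => `*${inner}*`));

// Trimmed Google Docs clipboard HTML: only the span's font-weight differs.
const regularText = '<b style="font-weight:normal;"><span style="font-weight:400;">Testing</span></b>';
const boldText = '<b style="font-weight:normal;"><span style="font-weight:700;">Testing</span></b>';

console.log(toMarkdownNaive(regularText)); // *Testing*
console.log(toMarkdownNaive(boldText)); // *Testing*
```

Both inputs come out identical because the rule never looks past the wrapper tag name.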

What changes do you think we should make in order to solve the problem?

Update ExpensiMark logic for bold to the below:

{
    name: 'bold',
    regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
    // The first capture group is the tag name (b or strong); the second is the inner HTML.
    replacement: (match, tagName, innerContent) => {
        // Look for the first inline style inside the wrapper, e.g. the <span style="..."> that Google Docs emits.
        const styleMatch = innerContent.match(/style="(.*?)"/);
        if (!styleMatch) {
            // No inline style inside: treat a plain <b>/<strong> as bold.
            return `*${innerContent}*`;
        }
        const style = styleMatch[1].replace(/\s/g, '');
        const isBold = style.includes('font-weight:bold;') || style.includes('font-weight:700;');
        return isBold ? `*${innerContent}*` : innerContent;
    },
},

If the inner content has font-weight style, then this checks if the font-weight of the span element is set to bold or 700.

If yes, then it uses the bold markdown, else it uses it as a normal text. We can polish the logic and code further.
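
As a rough check of the idea, the sketch below runs the proposed rule (lightly restructured, with the same intended behaviour) against trimmed versions of the two Google Docs snippets and a plain <b> tag. `stripTags` and `applyBoldRule` are hypothetical helpers added only for this illustration; in ExpensiMark the leftover <span> tags would presumably be handled by other rules.

```javascript
// Bold rule from the proposal, lightly restructured for readability.
const boldRule = {
    name: 'bold',
    regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
    replacement: (match, tagName, innerContent) => {
        // Only the first inline style inside the wrapper is inspected.
        const styleMatch = innerContent.match(/style="(.*?)"/);
        if (!styleMatch) {
            return `*${innerContent}*`;
        }
        const style = styleMatch[1].replace(/\s/g, '');
        const isBold = style.includes('font-weight:bold;') || style.includes('font-weight:700;');
        return isBold ? `*${innerContent}*` : innerContent;
    },
};

// Hypothetical helpers, for illustration only.
const stripTags = (html) => html.replace(/<[^>]+>/g, '');
const applyBoldRule = (html) => stripTags(html.replace(boldRule.regex, boldRule.replacement));

const docsRegular = '<b style="font-weight:normal;"><span style="font-weight:400;">Testing</span></b>';
const docsBold = '<b style="font-weight:normal;"><span style="font-weight:700;">Testing</span></b>';
const plainBold = '<b>Testing</b>';

console.log(applyBoldRule(docsRegular)); // Testing
console.log(applyBoldRule(docsBold)); // *Testing*
console.log(applyBoldRule(plainBold)); // *Testing*
```

Note that the style is read from the content inside the wrapper, not from the <b> tag's own style attribute, because the opening tag is not part of the second capture group.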

melvin-bot[bot] commented 5 months ago

@abekkala, @thienlnam, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...

abekkala commented 5 months ago

@alitoshmatov can you review the proposal above please?

alitoshmatov commented 5 months ago

Thank you for your proposal @ShridharGoel. While your RCA is correct, I don't think your solution is the optimal one

skyweb331 commented 5 months ago

To prevent future issues and improve the logic easily, ExpensiMark should change its codebase.

Google Docs uses font-weight attributes to make text bold. We can implement the logic above, but what if Slack or others use different tags or different attributes? The regex pattern may become more complicated and, at some point, can no longer be updated...

The solution is to clean up the current ExpensiMark structure...

ShridharGoel commented 5 months ago

While your RCA is correct, I don't think your solution is the optimal one

Can you tell what can be improved?

ShridharGoel commented 5 months ago

Google Docs uses font-weight attributes to make text bold. We can implement the logic above, but what if Slack or others use different tags or different attributes?

What other attributes could be used for this? I think the ones mentioned above will cover most cases.

melvin-bot[bot] commented 4 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

melvin-bot[bot] commented 4 months ago

@abekkala @thienlnam @alitoshmatov this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

melvin-bot[bot] commented 4 months ago

@abekkala, @thienlnam, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...

melvin-bot[bot] commented 4 months ago

Upwork job price has been updated to $500

thienlnam commented 4 months ago

Increasing the bounty to get some more eyes

skyweb331 commented 4 months ago

@thienlnam This is not a bounty problem... It requires an ExpensiMark rebuild.

thienlnam commented 4 months ago

I understand your concerns about parsing HTML with regex and your suggestion to rebuild ExpensiMark. However, without a detailed proposal, it's challenging to evaluate the necessity and feasibility of such a significant change.

If you believe a rebuild is the best solution, could you please provide a formal proposal? This should include things like:

@alitoshmatov Could you respond to this comment? https://github.com/Expensify/App/issues/41109#issuecomment-2101316189

abekkala commented 4 months ago

@alitoshmatov could you respond to this comment? https://github.com/Expensify/App/issues/41109#issuecomment-2101316189

alitoshmatov commented 4 months ago

Can you tell what can be improved?

@ShridharGoel To be honest, I am not sure. I am just worried about whether this will cover all cases. For example, with your solution, if I copy a bold tag without any styles, will it be parsed as bold? What would be our requirement for an element to be parsed as bold?

ShridharGoel commented 4 months ago

The replacement method in my proposal extracts any inline style attribute from the tag content and checks if it contains font-weight:bold; or font-weight:700; which are the two scenarios when we want bold style if the inner content has font-weight style. I think this should include the scenarios that are needed for bold style.

ShridharGoel commented 4 months ago

@thienlnam Any thoughts on the above comment?

Proposal link

abekkala commented 4 months ago

@thienlnam do you have any input on the proposal and the comments after?

melvin-bot[bot] commented 4 months ago

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

melvin-bot[bot] commented 4 months ago

@abekkala, @thienlnam, @alitoshmatov Whoops! This issue is 2 days overdue. Let's get this updated quick!

melvin-bot[bot] commented 4 months ago

@abekkala @thienlnam @alitoshmatov this issue is now 4 weeks old, please consider:

Thanks!

alitoshmatov commented 4 months ago

Okay, let's go with @ShridharGoel's proposal, which parses text as bold in one of these two scenarios:

  1. The copied tag is <b> and doesn't have any font-weight attributes
  2. The copied tag is <b> and has font-weight attribute with values of either bold or 700
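
The two scenarios can be sketched as a small predicate. `shouldRenderBold` is a hypothetical helper, not part of ExpensiMark, and it assumes (as in the proposal) that the font-weight is read from the first declaration found inside the <b> wrapper. Unlike the proposal's snippet, it treats an inline style with no font-weight at all as scenario 1.

```javascript
// Hypothetical predicate for the two accepted scenarios.
// innerHtml is the content between <b> and </b>.
function shouldRenderBold(innerHtml) {
    // Find the first font-weight declaration, if any.
    const fontWeight = innerHtml.match(/font-weight\s*:\s*([^;"]+)/i);
    if (!fontWeight) {
        // Scenario 1: no font-weight inside the tag, so keep it bold.
        return true;
    }
    // Scenario 2: bold only when the value is "bold" or "700".
    const value = fontWeight[1].trim().toLowerCase();
    return value === 'bold' || value === '700';
}

console.log(shouldRenderBold('Testing')); // true
console.log(shouldRenderBold('<span style="font-weight:700;">Testing</span>')); // true
console.log(shouldRenderBold('<span style="font-weight: bold">Testing</span>')); // true
console.log(shouldRenderBold('<span style="font-weight:400;">Testing</span>')); // false

// Only the first declaration is inspected, so a mixed run is treated as all bold.
const mixed = '<span style="font-weight:700;">TEST</span><span style="font-weight:400;"> 12</span>';
console.log(shouldRenderBold(mixed)); // true
```

The last case shows a limitation of checking a single style: a <b> wrapper that contains both bold and regular spans is not split.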

C+ reviewed 🎀 👀 🎀

melvin-bot[bot] commented 4 months ago

Triggered auto assignment to @marcochavezf, see https://stackoverflow.com/c/expensify/questions/7972 for more details.

marcochavezf commented 4 months ago

Sounds good, thanks @alitoshmatov for the review. Assigning @ShridharGoel 🚀

melvin-bot[bot] commented 4 months ago

📣 @alitoshmatov 🎉 An offer has been automatically sent to your Upwork account for the Reviewer role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job

melvin-bot[bot] commented 4 months ago

📣 @ShridharGoel 🎉 An offer has been automatically sent to your Upwork account for the Contributor role 🎉 Thanks for contributing to the Expensify app!

Offer link Upwork job Please accept the offer and leave a comment on the Github issue letting us know when we can expect a PR to be ready for review 🧑‍💻 Keep in mind: Code of Conduct | Contributing 📖

marcochavezf commented 4 months ago

Not overdue, contributor assigned

ShridharGoel commented 4 months ago

https://github.com/Expensify/expensify-common/pull/710

melvin-bot[bot] commented 4 months ago

@marcochavezf, @abekkala, @ShridharGoel, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...

marcochavezf commented 4 months ago

Update: PR in review

abekkala commented 4 months ago

Not overdue - PR is in review

melvin-bot[bot] commented 4 months ago

@marcochavezf, @abekkala, @ShridharGoel, @alitoshmatov 6 days overdue. This is scarier than being forced to listen to Vogon poetry!

marcochavezf commented 4 months ago

@ShridharGoel is working on the PR