Closed thienlnam closed 2 months ago
Triggered auto assignment to @abekkala (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.
Job added to Upwork: https://www.upwork.com/jobs/~0181c168843a4dc29c
Triggered auto assignment to Contributor-plus team member for initial proposal review - @alitoshmatov (External)
@thienlnam Is it okay to prevent all formatting when pasting text?
We'd like to retain the formatting that the text was copied with, rather than stripping all formatting.
@thienlnam, isn't this the expected behaviour? If the text was bold in Google Docs, we should send it as bold in our chat too.
The composer shows the text so that the end user can choose whether to keep the formatting their copied text had in Google Docs or to change it. In case they want to change it, they can use markdown (which the composer supports) to change the formatting.
@jainilparikh! Hey, it seems we don't have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork. Please follow these steps:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>
Contributor details Your Expensify account email: jainilvparikh@gmail.com Upwork Profile Link: https://www.upwork.com/freelancers/~01845f58bb8fa86b29
Contributor details stored successfully. Thank you for contributing to Expensify!
@thienlnam When I copy "Test123" from Google Docs, it is converted into HTML like this:
<html>
<body>
<!--StartFragment--><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-ef0d09b9-7fff-2c97-04c0-4fe5952c2707"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">TEST</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> 12</span></b><!--EndFragment-->
</body>
</html>
That is quite complicated, but simplified it is:
<b><span style="font-weight: 700">Test</span><span style="font-weight:400">123</span></b>
But ExpensiMark does not detect the font-weight; it only detects the <b> tag, which is why the pasted text is parsed into *Test 123* in our editor.
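To make the failure concrete, here is a minimal sketch of a tag-only bold rule applied to the simplified HTML above. The names `naiveBoldRule`, `stripTags`, and `naiveHtmlToMarkdown` are hypothetical and this is not the actual ExpensiMark source; it only assumes that bold is decided by the presence of a `<b>` or `<strong>` tag.

```javascript
// Hypothetical, simplified stand-in for a tag-only bold rule (not the real ExpensiMark code).
const naiveBoldRule = /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>/gi;

// Drops any remaining tags so only the resulting Markdown text is left (illustration only).
const stripTags = (html) => html.replace(/<[^>]+>/g, '');

function naiveHtmlToMarkdown(html) {
    // Every <b>/<strong> wrapper becomes *...*, regardless of the inline styles inside it.
    const withBold = html.replace(naiveBoldRule, (match, tagName, innerContent) => `*${innerContent}*`);
    return stripTags(withBold);
}

// Simplified Google Docs clipboard HTML from the comment above.
const pasted = '<b><span style="font-weight: 700">Test</span><span style="font-weight:400">123</span></b>';

console.log(naiveHtmlToMarkdown(pasted)); // *Test123*
```

The whole string ends up wrapped in asterisks because the rule never looks at the `font-weight` values on the inner spans.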
As I mentioned in https://github.com/Expensify/App/issues/41110#issuecomment-2080034830, this cannot be done reliably with regex. Regex has its limitations and cannot parse all of this complex HTML. (I finally found a concrete example where regex does not fit every case.) Could we catch CSS styles using regex? Maybe, but it would be very complicated.
So my proposal is to change the HTML parsing logic to use a parser such as xml2js or fast-xml-parser (https://github.com/Expensify/App/issues/40571#issuecomment-2070080632).
This way, we can resolve the root problem completely.
(cc @jjcoffee )
@skyweb331 We could just create a DOM subtree for the pasted HTML and then get the plain text using the following approach: https://stackoverflow.com/a/6743966/23325954
@tomekzaw Yes. If we use the DOM, the solution is really simple. I also recommended using the DOM, but on mobile devices DOM parsing is available but slow... That's why ExpensiMark uses regex. xml2js and fast-xml-parser do not use the DOM, so I recommend them.
(I had a discussion with @ikevin127 about this: https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569)
@abekkala, @thienlnam, @alitoshmatov Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!
Could you put a post in #expensify-open-source about the problem and the solution solved by using another parser so we can get some more eyes on it?
@thienlnam https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569
I already posted this and discussed it with Kevin Bader.
We have all the Markdown-to-HTML parsing, and vice versa, in one file, ExpensiMark, and use regex for everything. It is not well structured and is hard to maintain...
It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue?
Non-bold text copied from Google Docs gets pasted as bold text.
Google Docs adds some extra HTML when copying text.
For example, when copying Testing (not bold), the copied HTML is:
<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-1138e939-7fff-aca6-792e-add7dee8781e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>
When copying Testing, which is in bold in Google Docs, the copied HTML is:
<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-c0d177b2-7fff-5eed-41bb-35f0cf5b210e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>
The font-weight of the inner span is 400 in the 1st example, while it is 700 in the 2nd.
But in both cases the content is wrapped in <b></b>, because of which ExpensiMark takes it to be bold text.
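The difference between the two snippets can be checked with a small sketch that reads the declared `font-weight` from the outer `<b>` and from the inner `<span>`. The helper name `firstFontWeight` is hypothetical, and the sample strings are abbreviated versions of the copied HTML above (shortened ids and style lists), so treat this as an illustration rather than a general HTML parser.

```javascript
// Returns the font-weight declared in the inline style of the first <tagName ...> element, or null.
// Regex-based on purpose; a sketch only, not a full HTML parser.
function firstFontWeight(html, tagName) {
    const openTag = html.match(new RegExp(`<${tagName}\\b[^>]*>`, 'i'));
    if (!openTag) {
        return null;
    }
    const weight = openTag[0].match(/font-weight\s*:\s*([^;"']+)/i);
    return weight ? weight[1].trim() : null;
}

// Abbreviated versions of the two Google Docs snippets above (ids and styles shortened).
const regularCopy = '<b style="font-weight:normal;" id="docs-internal-guid-1"><span style="font-size:11pt;font-weight:400;font-style:normal;">Testing</span></b>';
const boldCopy = '<b style="font-weight:normal;" id="docs-internal-guid-2"><span style="font-size:11pt;font-weight:700;font-style:normal;">Testing</span></b>';

for (const html of [regularCopy, boldCopy]) {
    // The outer <b> is "normal" in both cases; only the inner span's weight differs.
    console.log(firstFontWeight(html, 'b'), firstFontWeight(html, 'span'));
}
```

In both cases the outer `<b>` declares `font-weight:normal`, so the tag itself carries no information about whether the text was bold in the source document.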
Update the ExpensiMark logic for bold to the below:
{
    name: 'bold',
    regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
    // Capture group 1 is the tag name (b or strong); group 2 is the inner content.
    replacement: (match, tagName, innerContent) => {
        // Look for the first inline style attribute inside the bold tag.
        const styleMatch = innerContent.match(/style="(.*?)"/);
        // No inline style at all: treat the <b>/<strong> tag as real bold.
        if (!styleMatch) {
            return `*${innerContent}*`;
        }
        // Ignore whitespace so that "font-weight: 700;" and "font-weight:700;" both match.
        const style = styleMatch[1].replace(/\s/g, '');
        const isBold = style.includes('font-weight:bold;') || style.includes('font-weight:700;');
        return isBold ? `*${innerContent}*` : innerContent;
    },
},
If the inner content has an inline style, this checks whether the font-weight of the span element is set to bold or 700.
If yes, it uses the bold markdown; otherwise it treats the content as normal text. A <b> tag whose content has no inline style is still treated as bold. We can polish the logic and code further.
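The proposed rule can be exercised outside ExpensiMark with a sketch like the one below. The wrapper `applyBoldRule` is a hypothetical helper that applies only this one rule and then drops leftover tags for readability, and the sample inputs are abbreviated versions of the Google Docs HTML; how the rule is actually wired into ExpensiMark is not shown here.

```javascript
// Bold rule following the proposal above, isolated so it can run on its own.
const boldRule = {
    name: 'bold',
    regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
    replacement: (match, tagName, innerContent) => {
        const styleMatch = innerContent.match(/style="(.*?)"/);
        if (!styleMatch) {
            // Plain <b>text</b> with no inline style stays bold.
            return `*${innerContent}*`;
        }
        const style = styleMatch[1].replace(/\s/g, '');
        const isBold = style.includes('font-weight:bold;') || style.includes('font-weight:700;');
        return isBold ? `*${innerContent}*` : innerContent;
    },
};

// Hypothetical helper: applies only the bold rule, then strips remaining tags (illustration only).
function applyBoldRule(html) {
    return html.replace(boldRule.regex, boldRule.replacement).replace(/<[^>]+>/g, '');
}

const wrap = (weight) => `<b style="font-weight:normal;"><span style="font-size:11pt;font-weight:${weight};">Testing</span></b>`;

console.log(applyBoldRule(wrap('400')));         // Testing
console.log(applyBoldRule(wrap('700')));         // *Testing*
console.log(applyBoldRule('<b>Testing</b>'));    // *Testing*
```

One limitation worth noting: only the first `style` attribute in the inner content is inspected, so a paste that mixes a bold span with a regular span inside the same `<b>` wrapper would be treated as entirely bold or entirely regular, depending on which span comes first.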
@abekkala, @thienlnam, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...
@alitoshmatov can you review the proposal above please?
Thank you for your proposal, @ShridharGoel. While your RCA is correct, I don't think your solution is the optimal one.
To prevent future issues and make the logic easier to improve, ExpensiMark should change its codebase.
Google Docs uses font-weight attributes to make text bold. We can implement the logic above, but what if Slack or other apps use different tags or different attributes? The regex pattern may become more complicated and at some point can no longer be updated...
The solution is to clean up the current ExpensiMark structure...
> While your RCA is correct, I don't think your solution is the optimal one.
Can you tell me what can be improved?
> Google Docs uses font-weight attributes to make text bold. We can implement the logic above, but what if Slack or other apps use different tags or different attributes?
What other attributes could be used for this? I think the cases mentioned above will cover most scenarios.
@abekkala @thienlnam @alitoshmatov this issue was created 2 weeks ago. Are we close to approving a proposal? If not, what's blocking us from getting this issue assigned? Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!
@abekkala, @thienlnam, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...
Upwork job price has been updated to $500
Increasing the bounty to get some more eyes
@thienlnam This is not a bounty problem... It requires an ExpensiMark rebuild.
I understand your concerns about parsing HTML with regex and your suggestion to rebuild ExpensiMark. However, without a detailed proposal, it's challenging to evaluate the necessity and feasibility of such a significant change.
If you believe a rebuild is the best solution, could you please provide a formal proposal? This should include things like:
@alitoshmatov Could you respond to this comment? https://github.com/Expensify/App/issues/41109#issuecomment-2101316189
> Can you tell what can be improved?
@ShridharGoel To be honest, I am not sure. I am just worried about whether this will cover all cases. I mean, based on your solution, if I copy a bold tag without any styles, will it be parsed as bold? What would be our requirement for an element to be parsed as bold?
The replacement method in my proposal extracts any inline style attribute from the inner content and checks whether it contains font-weight:bold; or font-weight:700;, which are the two scenarios in which we want bold style when the inner content has a font-weight style. I think this covers the scenarios that are needed for bold style.
@thienlnam Any thoughts on the above comment?
@thienlnam do you have any input on the proposal and the comments after?
@abekkala, @thienlnam, @alitoshmatov Whoops! This issue is 2 days overdue. Let's get this updated quick!
@abekkala @thienlnam @alitoshmatov this issue is now 4 weeks old, please consider:
Thanks!
Okay, let's go with @ShridharGoel's proposal, which parses text as bold in one of these two scenarios:
- It is wrapped in <b> and doesn't have any font-weight attribute
- It is wrapped in <b> and has a font-weight attribute with a value of either bold or 700
C+ reviewed
Triggered auto assignment to @marcochavezf, see https://stackoverflow.com/c/expensify/questions/7972 for more details.
Sounds good, thanks @alitoshmatov for the review. Assigning @ShridharGoel
@alitoshmatov An offer has been automatically sent to your Upwork account for the Reviewer role. Thanks for contributing to the Expensify app!
@ShridharGoel An offer has been automatically sent to your Upwork account for the Contributor role. Thanks for contributing to the Expensify app!
Offer link | Upwork job. Please accept the offer and leave a comment on the GitHub issue letting us know when we can expect a PR to be ready for review. Keep in mind: Code of Conduct | Contributing
Not overdue, contributor assigned
@marcochavezf, @abekkala, @ShridharGoel, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...
Update: PR in review
Not overdue - PR is in review
@marcochavezf, @abekkala, @ShridharGoel, @alitoshmatov 6 days overdue. This is scarier than being forced to listen to Vogon poetry!
@ShridharGoel is working on the PR
Actual
Expectation It gets pasted as regular text
cc @tomekzaw
Upwork Automation - Do Not Edit
Issue Owner
Current Issue Owner: @abekkala