OfficeDev / office-js

A repo and NPM package for Office.js, corresponding to a copy of what gets published to the official "evergreen" Office.js CDN, at https://appsforoffice.microsoft.com/lib/1/hosted/office.js.
https://learn.microsoft.com/javascript/api/overview

context.document.body.text differs depending on whether revisions are shown as balloon or inline #2314

Closed andreas-schoch closed 5 months ago

andreas-schoch commented 2 years ago

User settings influence what context.document.body.text returns in a Word add-in.

Your Environment

Expected behavior

Regardless of whether the user has tracked changes displayed inline or in balloons, context.document.body.text (or another exposed property that can be loaded) should return the same text, including deletions and insertions.

Current behavior

When the user selects Show Markup --> Balloons --> Show Revisions in Balloons, revisions are not reflected in body.text.

When the user selects Show Markup --> Balloons --> Show All Revisions Inline, revisions are reflected in body.text.

Steps to reproduce

Console.log the value of context.document.body.text and compare the difference when Show Markup --> Balloons --> Show Revisions in Balloons vs. Show All Revisions Inline

Context

I am unsure whether this inconsistency is intentional or a bug, but it makes working with the document content unnecessarily difficult.

As a workaround, we now have to parse the OOXML into a string ourselves, which adds considerable overhead: body.getOoxml() was roughly 6x slower than body.text when I last compared them on a Windows desktop.
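A minimal sketch of what such OOXML parsing could look like, assuming the markup follows the usual WordprocessingML layout where visible and inserted text sits in `w:t` elements and tracked deletions sit in `w:delText` elements. `extractTextFromOoxml` and `decodeXmlEntities` are hypothetical helper names, not part of Office.js, and the regex approach ignores paragraph breaks, tabs, and other non-text elements.

```javascript
// Hypothetical helper: decodes the five predefined XML entities.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Hypothetical helper: concatenates the contents of <w:t> elements and,
// optionally, <w:delText> elements (tracked deletions) in document order.
function extractTextFromOoxml(ooxml, { includeDeletions = true } = {}) {
  const tags = includeDeletions ? "w:t|w:delText" : "w:t";
  // The tag name must be followed by whitespace or ">", so <w:tbl> and
  // <w:tab/> are not mistaken for <w:t>.
  const pattern = new RegExp(`<(${tags})(?:\\s[^>]*)?>([^<]*)</\\1>`, "g");
  let text = "";
  for (const match of ooxml.matchAll(pattern)) {
    text += decodeXmlEntities(match[2]);
  }
  return text;
}
```

In an add-in, the input would be the `value` of the result returned by body.getOoxml() after a context.sync(). A real implementation would probably use a proper XML parser rather than a regular expression.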

At a minimum I would have expected the office-js api to either:

Apologies in case I missed something and it already is possible to get the insertions/deletions as text without having to parse the ooxml. Please let me know if there is another way 🙏

greysteil commented 1 year ago

Just ran into this with my app. Big ➕ to the request to get a document's text independent of the way revisions are being shown in the doc. Our flow is as follows:

  1. Extract the document's text (with context.document.body.text)
  2. Split that text up into things we're interested in
  3. Search through the document for the pieces we're interested in using context.document.body.search(...) and store their ranges

Unfortunately, whilst the return value of context.document.body.text depends on the way revisions are being shown in the doc, context.document.body.search doesn't. As a result, if the user was viewing markup in balloons when the extraction (1) occurred, and there was markup in one of the things we subsequently want to search for (3) we have a mismatch between the doc and what we're searching for.
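The three-step flow above can be sketched roughly as follows. This assumes `context` is the request context that Word.run passes to its callback; `locatePieces` and `splitIntoPieces` are hypothetical names used only for illustration. If body.text was read while revisions were shown in balloons, a piece that spans a tracked deletion would come back with `found: false`.

```javascript
// Sketch only: `context` is expected to be a Word.RequestContext.
// `splitIntoPieces` is a hypothetical function (string) => string[].
async function locatePieces(context, splitIntoPieces) {
  const body = context.document.body;

  // Step 1: extract the document's text.
  body.load("text");
  await context.sync();

  // Step 2: split the text into the pieces of interest.
  const pieces = splitIntoPieces(body.text);

  // Step 3: queue one search per piece, then sync once for all of them.
  const searches = pieces.map((piece) => {
    const ranges = body.search(piece, { matchCase: true });
    ranges.load("items");
    return { piece, ranges };
  });
  await context.sync();

  return searches.map(({ piece, ranges }) => ({
    piece,
    found: ranges.items.length > 0,
    ranges: ranges.items,
  }));
}
```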

I've noticed that context.document.body.getReviewedText() is independent of the view, which is great, but unfortunately it's not the text that context.document.body.search(...) uses (which appears to be the text that context.document.body.text returns if displaying revisions inline?).

Update:

chad-levesque commented 6 months ago

Any updates on this? This bug is absolutely killing us

We can't use getReviewedText because it freezes and causes severe document scrolling, and we can't rely on paragraph.text because of this Show Revisions Inline issue...


wangyun-microsoft commented 5 months ago

@andreas-schoch can you please see if body.getReviewedText() can solve your problem?
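For readers unfamiliar with it, a call to body.getReviewedText() might look like the sketch below. It assumes the WordApi 1.4 requirement set is available, that `context` is the request context passed by Word.run, and that the optional argument takes the Word.ChangeTrackingVersion values "Current" or "Original". `readReviewedText` is a hypothetical wrapper name.

```javascript
// Sketch only: `context` is expected to be a Word.RequestContext.
// version: "Current" (as if changes were accepted) or
//          "Original" (as if changes were rejected).
async function readReviewedText(context, version = "Current") {
  const reviewed = context.document.body.getReviewedText(version);
  await context.sync(); // the result's value is only populated after a sync
  return reviewed.value;
}
```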

@chad-levesque , we are working on getReviewedText issues (freezes document and scrolling). We will use other issues to track them.

@greysteil Can you open a separate issue for search() if you think it is still a problem for you?

greysteil commented 5 months ago

@greysteil Can you open a separate issue for search() if you think it is still a problem for you?

There's a separate issue with search(...): it scans the body's text as though all text, including tracked deletions, were present. So "hello~goodbye~ my old friend" (with "goodbye" deleted) is scanned as "hellogoodbye my old friend", and searching for "hello my old friend" won't find it.

I no longer think of this as an issue. It's not super intuitive, but it matches the way search in Word works for end users, so I don't think it's an OfficeJS issue. I also now think the present behaviour of context.document.body.text is helpful, as it implicitly reveals what display mode a user is in, and provides the text to search for. (My original comment about the text to search for was wrong - context.document.body.text is mostly the base text for searching over.)

dorinionut commented 5 months ago

Going back to the initial problem, how can one search for text ignoring the track changes?

I am analysing the document text in the backend and I get back sections of text which I need to highlight (using content controls). To find the ranges where the content controls need to be created, I search for the text I receive from the backend, but Word doesn't find it: the text from the backend doesn't include the deleted parts, whereas the text Word searches over does.

How can I work around this?
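One possible direction, offered as an untested sketch rather than a confirmed answer: if search really does scan text with deletions included, as described earlier in the thread, a snippet found in the "clean" text could be mapped back to the corresponding span of the full text before searching. The run list here is a hypothetical model (for example, something built by parsing the OOXML); neither it nor `toSearchableSnippet` is part of Office.js.

```javascript
// Hypothetical model: the document as an ordered list of runs, where
// `deleted: true` marks a tracked deletion.
// Returns the span of the full text (deletions kept) that corresponds to
// `cleanSnippet` in the clean text (deletions dropped), or null if absent.
function toSearchableSnippet(runs, cleanSnippet) {
  let full = "";           // text with deletions kept
  let clean = "";          // text with deletions dropped
  const cleanToFull = [];  // cleanToFull[i] = index in `full` of clean[i]

  for (const run of runs) {
    if (!run.deleted) {
      for (let i = 0; i < run.text.length; i++) {
        cleanToFull.push(full.length + i);
      }
      clean += run.text;
    }
    full += run.text;
  }

  if (cleanSnippet.length === 0) return null;
  const start = clean.indexOf(cleanSnippet);
  if (start === -1) return null;
  const end = start + cleanSnippet.length - 1; // inclusive index in `clean`
  return full.slice(cleanToFull[start], cleanToFull[end] + 1);
}
```

The returned string could then be passed to context.document.body.search(...). Note that only the first occurrence in the clean text is mapped, and whether the search actually matches still depends on how Word treats the deleted text.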

microsoft-github-policy-service[bot] commented 5 months ago

This issue has been automatically marked as stale because it is marked as needing author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. Thank you for your interest in Office Add-ins!

microsoft-github-policy-service[bot] commented 5 months ago

This issue has been closed due to inactivity. Please comment if you still need assistance and we'll re-open the issue.