OfficeDev / office-js

A repo and NPM package for Office.js, corresponding to a copy of what gets published to the official "evergreen" Office.js CDN, at https://appsforoffice.microsoft.com/lib/1/hosted/office.js.
https://learn.microsoft.com/javascript/api/overview

EMF, WMF and EMZ files are not returned by inlinePictures API on Windows #3145

Closed rogozind closed 1 year ago

rogozind commented 1 year ago

The attached document has two embedded pictures, in EMZ and WMF formats. On Windows, inlinePictures does not return these objects, so we cannot extract images from such documents. Web and Mac handle them much better.

Your Environment

Expected behavior

EMF, WMF and EMZ images should be returned like any other image.

Current behavior

Steps to reproduce

  1. Open the attached file and call the body.inlinePictures API


Link to live example(s)




Provide additional details




  1. emz test.docx

Context

Useful logs

Thank you for taking the time to report an issue. Our triage team will respond to you in less than 72 hours. Normally, response time is <10 hours Monday through Friday. We do not triage on weekends.

ghost commented 1 year ago

Thank you for letting us know about this issue. We will take a look shortly. Thanks.

penglongzhaochina commented 1 year ago

Hi @rogozind, could you share your gist?

rogozind commented 1 year ago

So we do the following:

const loadInlineImages = () => new Promise((resolve, reject) => {
  Word.run(async (context) => {
    try {
      const images = context.document.body.inlinePictures
      // Load imageFormat for every item in the collection
      images.load('imageFormat')
      await context.sync()
      resolve(images)
    } catch (error) {
      console.log('Error: ', error)
      reject(error)
    }
  })
})
const inlineImage = await loadInlineImages()

In this case, images.items is empty for .emz images. For all other formats, such as .png and .jpg, this works correctly and images.items contains the pictures.

Hope it helps.

xuruiyao-msft commented 1 year ago

@rogozind Hi, I tried the code below and it works fine. The inlinePictures collection for the attached file has 2 items, so it is not empty.

Which version of Word are you using? I am using version 2301 (Build 16026.20146).

rogozind commented 1 year ago

I am using the latest version after an update: 2301 build 16026.20200. The same happened on an older version before the upgrade (2210 build 15726.20202). We will try to create a sandbox program.

rogozind commented 1 year ago

You are correct, this file does not reproduce this problem. I got confused.

What this file does reproduce is that we do not get a format for these images, which makes it impossible to detect EMZ and EMF files. We get this from the API:

image

I do have a file which reproduces the original problem but it is a customer file and I can't attach it here w/o the approval. Once I get the approval I will place it here.

Do you think you can fix the format being undefined for EMZ/EMF/WMF files?
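
One possible workaround while imageFormat is undefined is to sniff the format from the first bytes of the Base64 payload. The sketch below is hypothetical: the helper names are made up for illustration, and it assumes getBase64ImageSrc() returns the original file bytes, which this thread does not confirm (Word may hand back a converted image instead). It also only recognizes placeable WMF files, which start with the D7 CD C6 9A key.

```javascript
// Decode only the first `byteCount` bytes of a Base64 string.
function decodeBase64Prefix(base64, byteCount) {
  // 4 Base64 characters encode 3 bytes, so take enough whole quartets.
  const chars = Math.ceil(byteCount / 3) * 4
  const binary = atob(base64.slice(0, chars))
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
}

// Hypothetical helper: guess the image format from well-known magic bytes.
function sniffImageFormat(base64) {
  const bytes = decodeBase64Prefix(base64, 44)
  const startsWith = (signature, offset = 0) =>
    signature.every((value, i) => bytes[offset + i] === value)

  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return 'png'
  if (startsWith([0xff, 0xd8, 0xff])) return 'jpeg'
  // EMZ is a gzip-compressed EMF, so it starts with the gzip magic number.
  if (startsWith([0x1f, 0x8b])) return 'emz'
  // Placeable WMF key.
  if (startsWith([0xd7, 0xcd, 0xc6, 0x9a])) return 'wmf'
  // EMF: header record type 1, with the " EMF" signature at byte offset 40.
  if (startsWith([0x01, 0x00, 0x00, 0x00]) &&
      startsWith([0x20, 0x45, 0x4d, 0x46], 40)) return 'emf'
  return 'unknown'
}
```

In the add-in this would be called on the .value of each getBase64ImageSrc() result after context.sync(), as a fallback when image.imageFormat is undefined.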

xuruiyao-msft commented 1 year ago

@rogozind Could you please share the code you use to get the imageData shown in the picture in your last comment? Thanks!

rogozind commented 1 year ago

Here you go. It is an interpretation of the inlinePictures results:


```
const loadInlineImagesData = () => new Promise((resolve, reject) => {
  Word.run(async (context) => {
    try {
      const images = context.document.body.inlinePictures
      images.load('imageFormat')
      await context.sync()
      const imageData = images.items.map((image) => {
        return {
          format: image?.imageFormat,
          // getBase64ImageSrc() returns a ClientResult; read .value after the next sync
          base64: image?.getBase64ImageSrc()
        }
      })
      await context.sync()
      resolve(imageData)
    } catch (error) {
      console.log('Error: ', error)
      reject(error)
    }
  })
})
```

rogozind commented 1 year ago

OK, I figured out the original issue (kind of). The picture in question was not an actual picture but an embedded Excel document. On Mac, it seems to be converted to an image and returned in the inlinePictures list. On Web, it seems to be dropped completely from both the returned HTML and inlinePictures. On Windows, it is returned in the HTML but not in inlinePictures.

Here is an example of the file:
excel in word.docx

Sorry, it morphed into two unrelated problems:

  1. EMZ/EMF/WMF objects do not return image format in imageFormat string.
  2. Excel embedded objects are not returned on Web and Windows. Ideally I would expect the Mac behaviour on Web and Windows: auto-convert to an inline picture.

rogozind commented 1 year ago

So here is another example. The customer had a document with 3 images. Only two were returned by the API. I trimmed everything customer-specific from the document and cropped the image down to a grey box. Then I pasted in one small image of my own. When I run the code from above (inlinePictures.items), only my pasted image is returned; the grey box is not. image test.docx

These image issues make our add-in unusable for our biggest client. Because there is no way to map the images returned by inlinePictures to the src references in the HTML, we have to rely on the order. If the order of the returned elements and the IMG tags do not match, we not only end up with missing pictures, we also get pictures moved from their proper location to the wrong place.

I am surprised other people did not complain about that but please fix it sooner rather than later.
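
The order-based mapping described above can be sketched roughly as follows. This is a hypothetical illustration, not the add-in's actual code: embedImagesByOrder, its parameters, and the default image/png MIME type are all assumptions. It pairs each img tag with the picture at the same index and refuses to patch anything when the counts differ, which is the failure mode reported in this issue.

```javascript
// Hypothetical helper: pair <img> tags with Base64 images purely by position.
// Returns { ok: false, ... } without touching the HTML when the counts differ,
// because a positional mapping would then put pictures in the wrong place.
function embedImagesByOrder(html, base64Images, mimeType = 'image/png') {
  const imgTag = /<img\b[^>]*>/gi
  const tagCount = (html.match(imgTag) || []).length
  const imageCount = base64Images.length
  if (tagCount !== imageCount) {
    return { ok: false, tagCount, imageCount, html }
  }

  const srcAttr = /\ssrc\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/i
  let index = 0
  const patched = html.replace(imgTag, (tag) => {
    const src = `data:${mimeType};base64,${base64Images[index++]}`
    return srcAttr.test(tag)
      ? tag.replace(srcAttr, () => ` src="${src}"`) // overwrite existing src
      : tag.replace(/^<img/i, () => `<img src="${src}"`) // add a missing src
  })
  return { ok: true, tagCount, imageCount, html: patched }
}
```

Checking ok before using the result at least turns a silent misplacement into a detectable error, although it cannot tell which picture is missing.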

rogozind commented 1 year ago

Any update on this one please?

xuruiyao-msft commented 1 year ago

@rogozind Hi it seems like there're two problems here.

  1. EMZ/EMF/WMF objects do not return image format in imageFormat string.
  2. Excel embedded objects are not returned on Web and Windows. Ideally I would expect the Mac behaviour on Web and Windows: auto-convert to an inline picture.

About the second issue: I can reproduce it using the document you provided. I created work item 7732245 and incident 377334775 to track it, and we have raised its priority. Could you describe in detail the steps you used to insert the embedded Excel object into Word?

About the first issue, please refer to this link: https://learn.microsoft.com/en-us/javascript/api/word/word.inlinepicture?view=word-js-preview#word-word-inlinepicture-imageformat-member. The imageFormat property of InlinePicture is a preview API. Could you please check which version of Office JS you reference?

xuruiyao-msft commented 1 year ago

@rogozind About the first issue:

  1. EMZ/EMF/WMF objects do not return image format in imageFormat string.

I used office-js-preview, and the imageFormat property of InlinePicture exists in that version of Office JS, but it is still undefined. Note, however, what the documentation says about imageFormat:

This API is provided as a preview for developers and may change based on feedback that we receive. Do not use this API in a production environment.

It seems the API is not stable enough yet. I have also created a work item (7732257) to track it.

rogozind commented 1 year ago

Yes, we are actively using imageFormat and it returns nothing for those legacy formats.

As for creating those documents, we use Paste Special on Windows. Here are some instructions you can try: https://trumpexcel.com/copy-excel-table-to-word/#Embed-Excel-Table-into-Word-as-a-Linked-Object

xuruiyao-msft commented 1 year ago

@rogozind Hi, I investigated the first issue further:

  1. EMZ/EMF/WMF objects do not return image format in imageFormat string.

I inserted EMZ, EMF and WMF pictures into a Word document, and the corresponding imageFormat values are shown below: image

The test file is here:
EMZ&EMF&WMF.odt

According to the returned imageFormat, EMF and WMF are reported as Pdf and Pict rather than 'undefined'. Could you please provide your EMF and WMF pictures? I cannot reproduce the undefined result.

From the api description: Screenshot (1)

EMZ is not supported yet. We track Office Add-in feature requests on our Microsoft 365 Developer Platform Ideas Forum. Please add your request there. Feature requests are considered when we go through our planning process. Thanks, Microsoft 365 Developer Platform team.

Thank you very much for your feedback.

xuruiyao-msft commented 1 year ago

@rogozind Thanks for your effort. However, InlinePicture.imageFormat is a preview-only API and has not been released, so it is not a good idea to use it in a production environment. If you would like this API to be released, please add your request here (https://aka.ms/m365dev-suggestions). We will track it there.

xuruiyao-msft commented 1 year ago

@rogozind For the second issue:

  1. Excel embedded objects are not returned on Web and Windows. Ideally I would expect the Mac behaviour on Web and Windows: auto-convert to an inline picture.

Actually, the embedded Excel object is a field. Could you please refer to this link and try it on Mac, Windows and Web? https://learn.microsoft.com/en-us/javascript/api/word/word.field?view=word-js-preview

const fields = context.document.body.fields.getFirst();
fields.load();
await context.sync();
console.log(fields);

For the file https://github.com/OfficeDev/office-js/files/10814594/excel.in.word.docx, the console output includes code: " EMBED Excel.Sheet.8 ".

I hope field is exactly what you need. Thanks for your feedback.
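
Building on the snippet above, a minimal sketch of how an add-in might list every embedded object in the body rather than only the first field. Both helper names are hypothetical, and the sketch assumes each embedded object exposes a field code of the form " EMBED <ProgID> ", as in the console output quoted above.

```javascript
// Extract the ProgID from an EMBED field code such as ' EMBED Excel.Sheet.8 '.
// Returns null for any other kind of field (or a missing code).
function parseEmbedFieldCode(code) {
  const match = /^\s*EMBED\s+(\S+)/i.exec(code || '')
  return match ? match[1] : null
}

// Hypothetical helper: `context` is the RequestContext that Word.run passes in.
async function listEmbeddedObjects(context) {
  const fields = context.document.body.fields
  fields.load('code') // load the field code of every item in the collection
  await context.sync()
  return fields.items
    .map((field, index) => ({ index, progId: parseEmbedFieldCode(field.code) }))
    .filter((entry) => entry.progId !== null)
}
```

It could be invoked as Word.run(listEmbeddedObjects).then(console.log). This only identifies the embedded objects; it does not produce an image for them.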

xuruiyao-msft commented 1 year ago

@rogozind Hi, have you tried the field solution for your case? Can this solution fix your issue? If you have any progress or any blockers, please let me know. Do you want to book another meeting to talk about it?

rogozind commented 1 year ago

Sorry, I dropped the ball on this one. This helps us recognize that the document contains embedded elements, but it does not help us produce valid HTML, for two reasons. First, there is no way to get an image out of the field object. Second, it is the same old issue: there is no way to correlate that field with the <img> tag inside the HTML.

We are now working on a backend solution that takes the DOCX file and converts it to HTML. That way we do not try to construct the HTML inside the add-in; we do it on the server instead. If this works, it will solve all the troubles we have with the add-in.

xuruiyao-msft commented 1 year ago

@rogozind Hi, does the backend solution resolve your troubles? Do you need help from our side?

rogozind commented 1 year ago

We are good for now. Thank you for following up!

rogozind commented 1 year ago

@xuruiyao-msft I spoke too soon. Our backend solution does not cover all cases either. I understand the issue of embedded objects (charts, excel tables etc.), not much we can do here. But the issue of "sometimes emz images are not returned" is still happening. Please see the image test.docx file in this comment: https://github.com/OfficeDev/office-js/issues/3145#issuecomment-1444231760 above.

Can we get to the bottom of it and see if we can improve the API to return that EMZ image?