ckeditor / ckeditor4

The best enterprise-grade WYSIWYG editor. Fully customizable with countless features and plugins.
https://ckeditor.com/ckeditor-4

Images not imported from Word #2800

Closed: smatteoda closed this issue 4 years ago

smatteoda commented 5 years ago

Type of report

Bug

Provide detailed reproduction steps (if any)

  1. Open the selected Word document and copy the entire text (Ctrl+A, Ctrl+C).
  2. Paste into CKEditor. Note: the editor is configured with extraPlugins: 'pastefromword,uploadimage,autogrow'.

Expected result

What is the expected result of the above steps?

All images should be imported.

Actual result

Images are not imported. For each image, the browser dev tools show: Not allowed to load local resource: file:///C:/Users/myusername/AppData/Local/Temp/OICE_8ADC376A-A3BC-4902-B017-FA55C2ECFE10.0/msohtmlclip1/01/clip_image028.png

This happens for all 63 images.

Other details

68554-Document.docx

Note: if only part of the document is selected, the images may be imported; if the whole document is selected, they are not.


Other issues that mention Paste from Word image problems: #3972, #3937, #3782, #3781, #2675, #2516, #1345, #1134

jacekbogdanski commented 5 years ago

Hello,

I can reproduce the issue. The images are available in the RTF clipboard data, so it should probably be possible to fetch them from there.

smatteoda commented 5 years ago

Hi, thanks for the quick turnaround. When you say the images are available via RTF, do you mean there is a workaround I could implement so my users can import them? If so, can you point me to it?

jacekbogdanski commented 5 years ago

I mean that the issue can probably be fixed in CKEditor itself. I'm sorry, but I cannot provide a viable workaround right now. If we are able to fix the issue, the fix will be available by updating CKEditor to a newer version (the one containing the bug fix). For now, I can only recommend keeping an eye on this ticket for updates.

smatteoda commented 5 years ago

Got it, thanks...

pravinghadgeindia commented 5 years ago

Any update on this issue?

jswiderski commented 5 years ago

"This happens for all 63 images. Note: if only part of the document is selected, it may work, if all is selected, they are not imported..."

@jacekbogdanski, is it possible that 63 images are simply too many for the browser or the CKEditor code to handle, and some silent error is being thrown? If it works for part of the document, it should also work for the whole.

If you limit this document to the first 5 pages and paste it as a whole, everything works as expected, so it really looks to me like a silent out-of-memory case.

jacekbogdanski commented 5 years ago

Pasting so many images will certainly affect performance and may result in incorrect behavior or even freeze the browser. I would say it's not caused by an implementation error but, as you wrote, by an out-of-memory issue.

Bear in mind that some images may weigh a couple of MB.

jswiderski commented 5 years ago

Here is another document for which images don't get pasted at all when you paste text and images together: Document for CKEditor - Copy.zip. Since pasting text with images works in the original document, we may have to create a separate issue for this file.

pravinghadgeindia commented 5 years ago

I have noticed that when I paste charts or images from Excel into Word and then paste the same images from Word into CKEditor, it sometimes does not work.
Could this be part of the same problem?

f1ames commented 5 years ago

This issue is caused by a mechanism that checks whether the number of images extracted from RTF equals the number of <img> tags in the HTML content from the clipboard, so that it can match them correctly (if the numbers differ, no images are inserted):

https://github.com/ckeditor/ckeditor-dev/blob/b2c28268964a7581b03da2afbe8ed4483e7bd6fb/plugins/pastefromword/plugin.js#L152-L169

In the document for which this issue was reported, there are 31 <img> tags:

0: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image001.png"
1: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image002.png"
2: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image003.png"
3: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image004.png"
4: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image005.png"
5: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image006.png"
6: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image007.png"
7: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image008.png"
8: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image009.png"
9: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image010.png"
10: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image011.png"
11: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image012.png"
12: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image013.png"
13: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image014.png"
14: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image015.png"
15: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image016.png"
16: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image017.png"
17: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image018.png"
18: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image019.png"
19: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image020.png"
20: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image021.png"
21: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image022.png"
22: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image023.png"
23: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image024.png"
24: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image025.emz"
25: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image026.emz"
26: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image027.png"
27: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image028.png"
28: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image029.png"
29: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image030.png"
30: "file:////Users/krzysztofkrzton/Library/Group%20Containers/UBF8T346G9.Office/TemporaryItems/msohtmlclip/clip_image031.png"

but only 29 images are extracted from RTF. Two fewer images are extracted because 2 images are in .emz format (see entries 24 and 25 above) and are simply skipped by the extraction mechanism:

https://github.com/ckeditor/ckeditor-dev/blob/b2c28268964a7581b03da2afbe8ed4483e7bd6fb/plugins/pastefromword/filter/default.js#L2205-L2211


There is also the opposite case, where the number of RTF images is greater than the number of <img> tags. This can be caused, for example, by an image in the document header, as in this sample document: imageInHeader.docx. The number of RTF images is 3, but there are only 2 <img> tags (headers are not supported; they are basically not present in the HTML clipboard data when pasted from Word).
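To make the all-or-nothing behavior concrete, here is a simplified, hypothetical TypeScript sketch of count-based matching. It is not CKEditor's code (the linked plugin.js is the real thing); the names imgSrcRegex, extractImgSrcs and replaceImageSources are made up for illustration, the regex-based <img> scanning is a simplification, and it assumes the RTF images have already been converted to data URIs in document order. It shows why 31 <img> tags against 29 RTF images, or 2 tags against 3 RTF images, leaves every image unreplaced.

```typescript
// Hypothetical sketch of count-based image matching; NOT CKEditor's implementation.
// Assumes rtfImages already holds data URIs extracted from text/rtf, in document order.

// Returns a fresh regex on each call so the global flag's lastIndex state is never shared.
// Groups: 1 = text up to the opening quote of src, 2 = the src value, 3 = the closing quote.
function imgSrcRegex(): RegExp {
  return /(<img\b[^>]*?\bsrc\s*=\s*")([^"]*)(")/gi;
}

// Collects the src value of every <img> tag in the clipboard HTML, in order.
function extractImgSrcs(html: string): string[] {
  const re = imgSrcRegex();
  const srcs: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    srcs.push(match[2]);
  }
  return srcs;
}

// Pairs the i-th <img> with the i-th RTF image. If the counts differ, the pairing
// cannot be trusted, so the HTML is returned unchanged (no images are inserted).
function replaceImageSources(html: string, rtfImages: string[]): string {
  if (extractImgSrcs(html).length !== rtfImages.length) {
    return html;
  }
  let index = 0;
  return html.replace(imgSrcRegex(), (_whole: string, head: string, src: string, tail: string) => {
    // Advance for every tag so positions stay aligned, even if this one is kept.
    const replacement = rtfImages[index++];
    // Only local file:// URLs (which the browser refuses to load) are swapped out.
    return src.indexOf("file://") === 0 ? head + replacement + tail : head + src + tail;
  });
}

// One .emz image is skipped during RTF extraction: 3 <img> tags, but only 2 RTF images.
const pasted =
  '<p><img src="file:///tmp/clip_image001.png"></p>' +
  '<p><img src="file:///tmp/clip_image002.emz"></p>' +
  '<p><img src="file:///tmp/clip_image003.png"></p>';
const extracted = ["data:image/png;base64,AAAA", "data:image/png;base64,BBBB"];

console.log(replaceImageSources(pasted, extracted) === pasted); // true: nothing replaced
```

Positional pairing is the weak point of this design: a single skipped or extra image shifts every later pair, which is presumably why the mechanism bails out entirely rather than inserting possibly mismatched images.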

zoarif commented 5 years ago

We are experiencing a similar issue. I am attaching a sample document where it happens for us: paste_sample.docx

f1ames commented 5 years ago

Hello @zoarif, could you share your environment details (browser, OS, Word version) and some reproduction steps (e.g. whether it happens only when the entire document is copied, or only for some specific part)?

zoarif commented 5 years ago

CKEditor version: 4.11.4
Browser: Chrome
MS Word: (screenshot attached)

leonardorame commented 4 years ago

Hi, here is a very simple RTF document created with WordPad, containing only one line of text and an image.

Documento.rtf.zip (Please note I had to create a zip file because the RTF extension is not allowed by GitHub).

The behavior is the same: after copying from WordPad and pressing Ctrl+V, only the text is pasted, not the image.

The original image is one of the sample images that came by default on Windows 7.

Browser: Firefox Quantum 70.0.1
OS: Windows 7
CKEditor version: 4.13
Installed CKEditor plugins: CKEditor Standard Package.

simonshen2016 commented 4 years ago

@jacekbogdanski We are very pleased that you have solved the problem of copying and pasting from Word under Linux (issue #3629), but when will you be able to solve this problem of losing images when pasting Word text that contains many images?

jacekbogdanski commented 4 years ago

Unfortunately, we didn't fix the issue on Linux; it was closed as a duplicate. We are concerned about this bug since it seems to affect many users, so it has been added to the 4.14.0 milestone, which is currently set for February 20, 2020. Please note that the release ETA may change, but hopefully we will be able to provide a bug fix in the 4.14 release.

leonardorame commented 4 years ago

On Linux this works as expected, at least when copying and pasting from LibreOffice; the bug only exists on Windows.

simonshen2016 commented 4 years ago

@jacekbogdanski Thank you very much for your prompt reply. We sincerely hope that the problem can be solved soon. This problem appears in the Windows environment and has bothered us for a long time. Other details of the problem are as follows:

Browser: Chrome 78.0.3904.97 (64-bit)
OS: Windows 10 Professional Edition
CKEditor version: 4.11.2
Installed CKEditor plugins: 'pastefromword,uploadimage,autotag'

msamsel commented 4 years ago

The case might sound trivial; however, it involves hidden, fragile logic. That's why it would require thorough testing and careful work. As @f1ames noticed:

"This issue is caused by the fact that we have a mechanism which checks if number of images extracted from RTF is the same as <img> tags in the HTML content from the clipboard so it can match them correctly (if it's not the same no images are inserted)."

This is a fairly simple solution, which is sufficient in the most common cases. The RTF format has an unfriendly structure, and we haven't written a good parser for it. It is also really hard to find a reliable link between the text/html and text/rtf clipboard data, as there are no image URLs in RTF or any other significant data that could clearly bind a specific image in text/html to its content in text/rtf.

The simplest way seems to be filtering .emz files out of image processing, or creating a whitelist of accepted extensions. However, this could easily desynchronize image processing for other files. Currently, we have similar logic for processing shapes drawn inside PFW (Paste from Word), as those are also embedded as images.

Another possibility might be to start supporting .emz files. If there is a proper hex-data representation of the .emz image in the text/rtf clipboard data, it should be possible to obtain it and convert it to a data URI in a similar way as we do with JPG or PNG.

The best and most time-consuming option would be writing an RTF parser which, together with the HTML parser, would build some sort of model of the pasted data and properly replace the src of the images.

That's why I changed the workload to high: the case would require careful planning and tests that prevent regressions.
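The hex-to-data-URI conversion mentioned in the comment above can be sketched in TypeScript as follows. This is a hypothetical illustration, not CKEditor's code: it assumes the hex payload of an RTF \pict group has already been cut out of the clipboard RTF and that its picture-type control word (\pngblip or \jpegblip, as named in the RTF specification) is known. The names hexToBase64, rtfPictToDataUri and MIME_BY_PICT_KEYWORD are made up for this example. Other picture types, such as EMF data (\emfblip), return null here, which mirrors the skipping that causes the count mismatch.

```typescript
// Hypothetical sketch: convert the hex payload of an RTF \pict group into a data URI.
// Not CKEditor's implementation; all names are illustrative only.

// Picture-type control words from the RTF spec mapped to MIME types.
// EMF (\emfblip) is intentionally left out, so such pictures are skipped.
const MIME_BY_PICT_KEYWORD: { [keyword: string]: string } = {
  pngblip: "image/png",
  jpegblip: "image/jpeg",
};

const BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes hex text (RTF wraps it across lines) into bytes, then encodes them as base64.
function hexToBase64(hex: string): string {
  const clean = hex.replace(/\s+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-f]/i.test(clean)) {
    throw new Error("Invalid hex picture data");
  }
  const bytes: number[] = [];
  for (let i = 0; i < clean.length; i += 2) {
    bytes.push(parseInt(clean.substring(i, i + 2), 16));
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i;
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than 3 bytes.
    out += remaining > 1 ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : "=";
    out += remaining > 2 ? BASE64_ALPHABET.charAt(triple & 63) : "=";
  }
  return out;
}

// Returns a data URI for supported picture types, or null for unsupported ones.
function rtfPictToDataUri(hex: string, pictKeyword: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(MIME_BY_PICT_KEYWORD, pictKeyword)) {
    return null;
  }
  return "data:" + MIME_BY_PICT_KEYWORD[pictKeyword] + ";base64," + hexToBase64(hex);
}

// The 8-byte PNG signature, as hex text like that found inside a \pict group.
console.log(rtfPictToDataUri("89504e470d0a1a0a", "pngblip")); // data:image/png;base64,iVBORw0KGgo=
console.log(rtfPictToDataUri("0100", "emfblip")); // null (unsupported type)
```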

msamsel commented 4 years ago

Hi @leonardorame,

"The behavior is the same, after copying from WordPad, then ctrl + v, it only pastes the text, but not the image."

PFW (Paste from Word) supports the Microsoft Word application, not WordPad. Your case is a new feature request for a "Paste from WordPad" plugin.