codex-team / editor.js

A block-style editor with clean JSON output
https://editorjs.io
Apache License 2.0
28.42k stars 2.07k forks

Paste from Microsoft Word #729

Closed sei-jdshimkoski closed 2 years ago

sei-jdshimkoski commented 5 years ago

A common use case for a WYSIWYG editor is to allow users to paste from Microsoft Word and match the styling as found in the document.

EditorJs does not have this functionality.

Is pasting from Microsoft Word on the roadmap?

gohabereg commented 5 years ago

Hi @sei-jdshimkoski

The problem is that Editor.js is not actually WYSIWYG. Sure, some plugins might look like WYSIWYG components, but the way of rendering is up to you, and the HTML output (or any other) might look completely different from how it looks in the editor.

I've tried to handle paste from MS Word. The data from Word comes in RTF format (text/rtf). It can be parsed, converted and passed to the plugins on the client, but there is too much data: the browser just can't process it and crashes.

If you are able to help with that and maybe do some investigation, it would be really helpful!

sei-jdshimkoski commented 5 years ago

Thank you for the information. I guess this is one of the few tradeoffs that EditorJs needs to make at the moment.

I will try to do some investigation to see if I can figure out some sort of solution to this issue. If I figure anything out, I'll report back.

Thanks again.

rtpHarry commented 5 years ago

Perhaps a compromise would be a "work area" where you can paste in the document just to have it right there while you rebuild it into sections inside the editor.

I guess it would be a single rich text block, but it wouldn't render client-side; it's just for reference.

Ximore commented 5 years ago

Pasting from MS Word causes the text to be converted to an embedded image, but if I copy from Word, paste into Pages on Mac, and then copy from Pages and paste into the editor, everything works great, and each paragraph and headline is created as an individual block of content.

If I open the Word document inside Pages directly (not just copying the content), Pages will inform me that some changes have been made. [Screenshot: Pages informed about changes to Word file]

So if that is the case, it makes sense that Editor.js processes Word content as an image, because it may include bitmap data or similar due to the background in the document.

UPDATE 02-06-2019

I have tested from a Word document on Windows, and Editor.js works fine with copying from Word on Windows. It doesn't work on Mac, so I did some further testing.

I created a file in TextEdit with some dummy text. Then I copied it, ran osascript -e 'the clipboard as record' | less in the terminal, and got the following. [Screenshots: simple text in TextEdit; the content saved in the clipboard]

When pasting the text into an MS Word document and copying the text again to the clipboard, I saw some different results. [Screenshots: simple text in MS Word; the MS Word content saved in the clipboard]

When copying from MS Word, even just two words, I get a huge list of data in the clipboard (note that you can't even see the end flag at the bottom of the terminal; the list is very long). This obviously has nothing to do with Editor.js; it is about the way Microsoft chooses to copy content in their apps on Mac. The funny thing is that it works perfectly on Windows. No problem copying the content from Word on Windows to Editor.js.

UPDATE 09-06-19 I tested it all again today, and suddenly it all worked. That was until I realised that I was using the Safari browser. In Safari, copy and paste from MS Word works very well, with the exception of bold text not being interpreted correctly. But pasting in Google Chrome on Mac causes it to be shown as an embedded image.

jakekara commented 3 years ago

I've made a few observations trying to get copy + paste from MS Word into Chrome on macOS working, and here's what I've found. My only goal has been to support bold, italic and anchor tags with the href property. I do not want to copy in any other markup or images.

When I copied and pasted from MS Word into Chrome, nothing would happen. However, it works properly in both Firefox and Safari without any intervention from me. So I did a little digging.

If I copy and paste from MS Word, the ClipboardEvent.clipboardData.types is ["text/plain", "text/html", "text/rtf", "Files"]

If I copy the same text into TextEdit then into Chrome, ClipboardEvent.clipboardData.types is ["text/plain", "text/html", "text/rtf"] (note there is no "Files" type).
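A minimal sketch of how the difference between those two type lists could be detected. The helper names are hypothetical and not part of Editor.js; only the type strings come from the observations above. It assumes that the text formats should win whenever they are present, even if a "Files" entry is also advertised.

```typescript
// Hypothetical helpers; only the MIME type strings are taken from the thread.
type TextFormat = "html" | "rtf" | "plain" | "none";

/** True when the clipboard advertises a "Files" entry. */
function hasFilesEntry(types: readonly string[]): boolean {
  return types.indexOf("Files") !== -1;
}

/** Pick the richest text format on offer, ignoring any "Files" entry. */
function pickTextFormat(types: readonly string[]): TextFormat {
  if (types.indexOf("text/html") !== -1) return "html";
  if (types.indexOf("text/rtf") !== -1) return "rtf";
  if (types.indexOf("text/plain") !== -1) return "plain";
  return "none";
}

// The two lists reported above.
const fromWord = ["text/plain", "text/html", "text/rtf", "Files"];
const fromTextEdit = ["text/plain", "text/html", "text/rtf"];

// In a browser, the list would come from a paste listener, for example:
//   element.addEventListener("paste", (event) => {
//     const types = Array.from(event.clipboardData?.types ?? []);
//     if (hasFilesEntry(types) && pickTextFormat(types) === "html") { ... }
//   });
```

With these lists, both sources offer "text/html", and only the Word list carries the extra "Files" entry, so a handler could choose to read the HTML rather than treat the paste as a file.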

I focused on parsing the "text/html" data. I don't know much about MS Word's formatting, but I saw that there are <!--StartFragment--> and <!--EndFragment--> markers surrounding the content I'm trying to paste, still with a fair bit of inline styling junk I don't care about. At first I used a regex to strip everything outside of these markers, but I found that doesn't seem to be necessary. Instead, I found that, using the API that is passed to the tool, I could call API.sanitizer.clean() against this string and populate the result into the block's innerHTML.
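The two approaches described above can be sketched roughly as follows. This is an illustration, not the commenter's actual code: extractFragment, cleanWordHtml, SanitizerLike and inlineOnlyConfig are hypothetical names, and the config assumes that Word emits b, i and a tags and that the sanitizer accepts a tag-to-rules object as its second argument.

```typescript
const START_MARKER = "<!--StartFragment-->";
const END_MARKER = "<!--EndFragment-->";

/** Return the text between the fragment markers, or the input if they are missing. */
function extractFragment(html: string): string {
  const start = html.indexOf(START_MARKER);
  const end = html.lastIndexOf(END_MARKER);
  if (start === -1 || end === -1 || end < start) {
    return html;
  }
  return html.slice(start + START_MARKER.length, end);
}

/** Minimal shape assumed for the sanitizer handed to a tool (an assumption). */
interface SanitizerLike {
  clean(taintString: string, config: Record<string, unknown>): string;
}

// Assumed rules: keep bold, italic and links with href; drop everything else.
const inlineOnlyConfig: Record<string, unknown> = {
  b: {},
  i: {},
  a: { href: true },
};

/** Trim to the fragment first, then let the sanitizer strip remaining markup. */
function cleanWordHtml(html: string, sanitizer: SanitizerLike): string {
  return sanitizer.clean(extractFragment(html), inlineOnlyConfig);
}
```

Inside a tool, the result would then be assigned to the block element's innerHTML, with the sanitizer taken from the API object passed to the tool. As noted above, the trimming step may be unnecessary when the sanitizer alone is enough.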

bsodmike commented 3 years ago

Hi all,

Has there been any further progress on this issue?

Thanks!

gabrielmoterani commented 3 years ago

Hey guys,

Any update about this issue?

bsodmike commented 3 years ago

> Hey guys,
>
> Any update about this issue?

Probably not as helpful given the context but the team I'm with just switched everything over to TinyMCE v5 - it doesn't use the block approach but is surprisingly feature complete and handles content from Word amazingly well.

Teebo commented 3 years ago

@bsodmike I have seen Tiny, pretty awesome tool. Can I get JSON-like output from it (or is that what you meant by it not using the block approach)?

bsodmike commented 3 years ago

> @bsodmike I have seen Tiny, pretty awesome tool. Can I get JSON-like output from it (or is that what you meant by it not using the block approach)?

As per the docs, JSON output should be possible.

quaidesbalises commented 3 years ago

@bsodmike what is it exactly?

mhmttosun commented 3 years ago

Hi, when writing online it is common to copy some text and paste it into an editor. Clean paste from Word would encourage usage of this awesome editor. Some editors (CKEditor, TinyMCE, Froala, WordPress Gutenberg, etc.) support copy and paste from MS Word. People may want to make use of the content of their Word files. I don't mean copying lots of Word pages and pasting them into the editor. There should be a character limit on pasted data to prevent crashes.
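A minimal sketch of the character limit suggested above. The names and the limit value are hypothetical; nothing like this is confirmed to exist in Editor.js, and a sensible threshold would have to be found by testing.

```typescript
// Arbitrary illustrative limit, not a value taken from Editor.js.
const MAX_PASTE_CHARS = 50000;

interface PasteDecision {
  accept: boolean; // whether the pasted data is small enough to process
  length: number;  // size of the pasted data in UTF-16 code units
}

/** Decide whether pasted data is small enough to hand to the parser. */
function checkPasteSize(data: string, limit: number = MAX_PASTE_CHARS): PasteDecision {
  return { accept: data.length <= limit, length: data.length };
}
```

A paste handler could call checkPasteSize on the "text/html" or "text/rtf" string before parsing and fall back to "text/plain" (or show a warning) when accept is false.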