Comment is not removed from content when it is inside unsupported element

AliaksandrBortnik commented 1 year ago

📝 Steps

Input: <p>pre  post</p>. Output: <p>pre post</p>. ✔️ Perfect!
Input: <style>pre  post</style>. Output: <p>pre  post</p>. ❌ Wrong.
Input: <style><p>pre  post</p></style>. Output: <p><p>pre  post</p></p>. This makes me thinking there is specific logic that all content of unsupported element will be preserved as plain text with no further model conversions. It is probably not related directly to comments, but in general.

Style element is not supported, that's fine. Why the result is not <p>pre post</p> for the second case?

✔️ Expected result

No comment as plain text inside of CKE5 content.

❌ Actual result

The full comment exists as content.

📃 Other details

HtmlComment plugin is not used that is a part of GHS.
Browser: Chrome - Version 115.0.5790.110 (Official Build) (64-bit)
CKEditor version: 38.1.0.

If you'd like to see this fixed sooner, add a 👍 reaction to this post.

AliaksandrBortnik commented 1 year ago

Is there any elegant approach to remove unsupported element with its content (all children)? Such as above having <style></style>.

I have finished with this approach, however, I do feel there is a better part of API to handle data filtering.

this.editor.data.upcastDispatcher.on('element:style', e => e.stop(), { priority: 'highest' });

Otherwise, what is the reason of CKE5 behavior this way? Since comments had to be removed completely if HtmlComment of GHS is not included.

UPD: It seems I've found the cause of that behavior described above. https://github.com/ckeditor/ckeditor5/blob/master/packages/ckeditor5-engine/src/controller/datacontroller.ts#L151

Let's say, there is unsupported element such as <style>. It will be ignored (no conversion), but its children will be converted by the default lowest priority convertor of dataProcessor to a document fragment using convertToModelFragment.

Then, children will be handled by upcast dispatcher. https://github.com/ckeditor/ckeditor5/blob/master/packages/ckeditor5-engine/src/conversion/upcastdispatcher.ts#L301

It is interesting edge case for comments which must be removed from the content by built-in functionality of CKE5 out of the box.

Witoso commented 1 year ago

Interesting case indeed, @niegowski WDYT? In my opinion, all comments should be stripped, no matter if in unsafe elements.

niegowski commented 1 year ago

I was able to reproduce this case without GHS (and without HtmlComment feature). This is happening only in <style> and <script> tags and is caused by the fact that those elements' content is defined in HTML as a CDATA and parsing its content is out of the scope of HTML parser. You could see the difference by triggering such code:

new DOMParser().parseFromString( '<body><style>foo <!-- comment --> bar</style></body>', 'text/html' ).body.childNodes[ 0 ].childNodes[ 0 ].data

vs the plain paragraph:

new DOMParser().parseFromString( '<body><p>foo <!-- comment --> bar</p></body>', 'text/html' ).body.childNodes[ 0 ].childNodes

Note that in the first snippet, it outputs a single text node with the whole content of the <style> tag (including the HTML comment as a text). In the second snippet it outputs 3 DOM nodes.

Why is the content of the <style> tag preserved in the editor? Because we try to preserve all text content even if we do not handle the given tag by enabled features. Maybe we should alter the default behavior to exclude <script> and <style> tags from this preservation mechanism.

AliaksandrBortnik commented 1 year ago

@niegowski could you please advise how to prevent the whole

ckeditor / ckeditor5