Closed AliaksandrBortnik closed 1 month ago
Is there any elegant approach to remove unsupported element with its content (all children)? Such as above having <style><!-- comment --></style>
.
I have finished with this approach, however, I do feel there is a better part of API to handle data filtering.
this.editor.data.upcastDispatcher.on('element:style', e => e.stop(), { priority: 'highest' });
Otherwise, what is the reason of CKE5 behavior this way? Since comments had to be removed completely if HtmlComment of GHS is not included.
UPD: It seems I've found the cause of that behavior described above. https://github.com/ckeditor/ckeditor5/blob/master/packages/ckeditor5-engine/src/controller/datacontroller.ts#L151
Let's say, there is unsupported element such as <style>
. It will be ignored (no conversion), but its children will be converted by the default lowest
priority convertor of dataProcessor to a document fragment using convertToModelFragment
.
Then, children will be handled by upcast dispatcher. https://github.com/ckeditor/ckeditor5/blob/master/packages/ckeditor5-engine/src/conversion/upcastdispatcher.ts#L301
It is interesting edge case for comments which must be removed from the content by built-in functionality of CKE5 out of the box.
Interesting case indeed, @niegowski WDYT? In my opinion, all comments should be stripped, no matter if in unsafe elements.
I was able to reproduce this case without GHS (and without HtmlComment feature). This is happening only in <style>
and <script>
tags and is caused by the fact that those elements' content is defined in HTML as a CDATA and parsing its content is out of the scope of HTML parser. You could see the difference by triggering such code:
new DOMParser().parseFromString( '<body><style>foo <!-- comment --> bar</style></body>', 'text/html' ).body.childNodes[ 0 ].childNodes[ 0 ].data
vs the plain paragraph:
new DOMParser().parseFromString( '<body><p>foo <!-- comment --> bar</p></body>', 'text/html' ).body.childNodes[ 0 ].childNodes
Note that in the first snippet, it outputs a single text node with the whole content of the <style>
tag (including the HTML comment as a text). In the second snippet it outputs 3 DOM nodes.
Why is the content of the <style>
tag preserved in the editor? Because we try to preserve all text content even if we do not handle the given tag by enabled features. Maybe we should alter the default behavior to exclude <script>
and <style>
tags from this preservation mechanism.
@niegowski could you please advise how to prevent the whole
π Steps
Input:
<p>pre <!-- Comment --> post</p>
. Output:<p>pre post</p>
. βοΈ Perfect!Input:
<style>pre <!-- Comment --> post</style>
. Output:<p>pre <!-- Comment --> post</p>
. β Wrong.Input:
<style><p>pre <!-- Comment --> post</p></style>
. Output:<p><p>pre <!-- Comment --> post</p></p>
. This makes me thinking there is specific logic that all content of unsupported element will be preserved as plain text with no further model conversions. It is probably not related directly to comments, but in general.Style element is not supported, that's fine. Why the result is not
<p>pre post</p>
for the second case?βοΈ Expected result
No comment as plain text inside of CKE5 content.
β Actual result
The full comment exists as content.
π Other details
If you'd like to see this fixed sooner, add a π reaction to this post.