ckeditor / ckeditor5

Powerful rich text editor framework with a modular architecture, modern integrations, and features like collaborative editing.
https://ckeditor.com/ckeditor-5
Other
9.36k stars 3.68k forks source link

Schema improvements #3784

Closed scofalik closed 4 years ago

scofalik commented 8 years ago

As we know, Schema is probably the weakest part of engine model right now. We didn't have clear requirements and idea how it will be used when we created it.

Since iteration 2 is about bringing stable API, I think we need to include refactoring/rewritting/improving Schema. I open this issue so we can share requirements and thoughts on how it could work.

Some conclusions that we aready drawn:


EDIT (by @Reinmar): I've read through the entire topic and created an excerpt from it which I'm adding below.

The following list of requirements, ideas and doubts was compiled from the discussion that we've lead in this ticket. It's mostly focused on stuff that works poorly now or which is not supported at all. The things which the current schema supports can be found in the code so the correctness of the new approach will be naturally validated when refactoring the code.

Attributes

Elements

Other topics

Reinmar commented 8 years ago

TBH, I think that we're not going to be able to work on schema in iteration 2. We don't know enough yet and we're not going to have that much time.

But it should be fine to work on it later, because it doesn't affect the code thaaat much. Ofc, each feature and many tests define the schema but it's not critical for feature development, unless we face some issues. But this will be the exact moment when we should start thinking on the schema.

PS. So far the schema isn't even used by the existing features for things other than registering elements.

scofalik commented 8 years ago

Multiple features/parts of code uses Schema directly or indirectly:

link and image features will be probably first features that we will work on after iteration 2 and 3 so we will need working Schema pretty soon. Since iteration 2 is about bringing stable API for smooth feature developement, I thought it is a good moment to start working on it. I leave decision up to you, but we will have to either do it in iteration 2 or at the beginning of iteration 3. I'd see how iteration 2 will go and decide at the end of it whether it's ok to include Schema refactor in it.

BTW. Other features also probably should use Schema, i.e. to, at least, disable itself if selection is in "incorrect" position, etc. The thing is that we forgot about it or didn't care at the moment. So we could work more globally on schema in upcoming iterations - first we should improve Schema itself and then look at the features and decide what is missing. The problem is that the later we will do it, the more tech debt we will have.

Reinmar commented 8 years ago

It's not said that we're not going to refactor any code after iteration 2. Until 1.0 we're going to rewrite some pieces many times still :). Schema can be one of them.

I agree that it's an API used in many places. That makes this a harder decision. But I'd postpone any bigger refactoring of it until we have more use cases in hands. We don't know much more now about what features will need to define than we knew few months ago when we implemented it initially. So there's a high chance that we'd have to repeat refactoring in a couple of months.

Let's maintain the concept that we've got right now for couple next iterations and let's get back to it with better image of what we need. This means that if link or image features will miss some features in the current implementation we shouldn't immediately try to refactor the whole code.

pjasiun commented 8 years ago

I agree with @Reinmar. We will not freeze the API after iteration 2. Also, I do not think we have a clear understanding what schema features we need yet. We can introduce them as soon as we will need them.

Reinmar commented 8 years ago

Another case to keep in mind: https://github.com/ckeditor/ckeditor5-headings/issues/27#issuecomment-245559218.

Reinmar commented 7 years ago

Two new things to remember:

  1. I needed a new type of items – objects. Some elements, like e.g. <image> or other widgets, are atomic objects which should be handled differently than other blocks.
  2. In order to use check( { name: ... } ) with a node I need to convert it to a name and this requires additional logic if the node can be a text.
Reinmar commented 7 years ago

Another thing – SchemaPath is confusing. You can pass there an element, but it must be wrapped in an array. But the same isn't true for a position... So, when checking if foo is allowed inside some element I must wrap it myself.

Reinmar commented 7 years ago

And I found one more thing. Perhaps it's my mistake but if I don't get it then there's something wrong with the API:

this.schema.check( { name: 'paragraph', inside: [ 'paragraph' ] } ); // -> false
this.schema.check( { name: 'paragraph', inside: position } ); // -> true
position.parent.name; // -> 'paragraph'
Reinmar commented 7 years ago

Another thing. I was wondering why list items are filtered out. It turned out that checking whether a specific element can be inside some context is tricky because this isn't enough:

schema.check( { name: el.name, inside: ctx } );

You also need to pass this element's attributes to the check() function.

scofalik commented 7 years ago

Poor Schema. It must feel really bad and low reading all your negative feedback.

Reinmar commented 7 years ago

Especially, taken that it knows that it's here only temporary :P

One more thing:

const attributes = ( node instanceof Element ) ? node.getAttributes() : undefined;

        return this.schema.check( { name: this._getNodeSchemaName( node ), attributes, inside: path } );

And I've got e.g. 10 tests failing.

const attributes = ( node instanceof Element ) ? node.getAttributes() : null;

        return this.schema.check( { name: this._getNodeSchemaName( node ), attributes, inside: path } );

And I've got 20 tests failing.

Very curious :D

Reinmar commented 7 years ago

I'm now working on https://github.com/ckeditor/ckeditor5-engine/issues/732 because it turned to be a necessary improvement for autoparagraphing feature. After strengthing the schema check algorithm I need to fix now dozens of tests which (usually mistakenly) relied on weak schema checks.

But this made me think that many of these tests are not interested in testing the schema anyway. Many should, like the entire delete content, insert content, data loading, parsing, stringifying, etc. What about modify selection for instance? Should it configure schema or can we introduce "allow all" for it? First I thought that it doesn't matter in this case. But then, how reliable the tests will be if we mock the schema ? Then the text will be allowed in every context, making even such algorithms like this vulnerable (modify selection may need to check whether where it wants to move the caret is a correct text position). So, after giving it a second thought I came to a conclusion that we should not use "allow all" in our tests.

We'll need, however, to introduce some more useful syntax for configuring schema, because now you need to call registerItem() and allow() for every element you introduce and often more than once.

pjasiun commented 7 years ago

So, after giving it a second thought I came to a conclusion that we should not use "allow all" in our tests.

Agree. We could have really unit tests wich check only a single feature, but I think that more real scenarios are more useful (as long tests are not too complex and not too hard for the debugging).

At the same time note that we will need something like "allow all", for instance, the collaboration server may need such feature.

scofalik commented 7 years ago

I copy paste from https://github.com/ckeditor/ckeditor5-engine/pull/733/files#diff-f6cdfa67127a7d5095832bc5a9b5dec0R572

Okay it took me a while to analyze this. At first sight it seemed wrong to me that any path can be shorter. For checkPath = C D E both allowedPath = A B C D E and allowedPath = D E are right. This rises a question whether allowedPath should be matched in whole? In first allowedPath it seems that somebody explicitly want to set that E is allowed in A B C D, that is that one of ancestors is A.

OTOH, this will not be a problem if in checkPath we will always pass a "full path", that is a path that has $root as left-most element. This would "solve" the "issue" described above. Consider again: checkPath = $root C D E, allowedPath = A B C D E and allowedPath = $root A B C D E.

Now, check will fail for both allowedPaths (which allows us to use shorter and simpler form without $root).

And now back to the original problem. If we suppose that checkPath always has $root as first element, a check should fail for every checkPath shorter than allowedPath. That is because there will be a moment when we will check $root from checkPath against non-$root from allowedPath.

So there are a few solutions:

  1. We leave code that is here as it is and stress in documentation that checkPath has to be full. Drawback: it's prone to generate bugs. OTOH, I expect that in most cases, inside will be specified as model.Position.
  2. Automatically add $root as first element to checkPath if first element is not $root. Drawback: this may backfire if, for a reason, really want to check a path that starts from something else than $root.
  3. Return false if checkPath is shorter than allowedPath. This feels natural. It is impossible to match every item from registered allowedPath which require, that the last item in allowedPath is exactly in those parents. Drawbacks: IDK?
Reinmar commented 7 years ago

I've been thinking about the message quote feature today and couple of related things – like retrieving selected blocks.

This made me think about the schema. In order to identify block-like elements (paragraphs, list items, headings, etc.) we need to mark them in the schema. Currently, they all inherit from the $block type, but this isn't very convenient. If you'd like to introduce a new element which can't contain the same elements as $block or can be allowed in different elements.

So, we should separate the structure from element types. We can have default elements like $block, but they need to be marked to be of a block type.

I've been also thinking what kind of elements we need. I can list at least these (but I would keep this list open):

An element can be of many types at the same time. So, e.g., you'll have block limits (as in CKE4's blockLimit). You may have inline nested editables (which will be inline+limit), etc.

Finally, I need the "get selected block elements" function and I wonder where to put it. It needs to accept a selection and know about the schema. It coooould be the schema's method perhaps or selection's or one of the controller's, but maybe you have better ideas?

Reinmar commented 7 years ago

Special question to @scofalik – where's list feature's "get selected blocks" logic :D?

And the same question to @szymonkups – where's the same piece of logic in the heading feature :D

scofalik commented 7 years ago

https://github.com/ckeditor/ckeditor5-list/blob/master/src/utils.js#L34

It checks schema, whether given element is a $block. It's not nice and wasn't meant as final solution, ofc.

scofalik commented 7 years ago

I agree with what you've written down. Structure "flags" will be also kept/connected with schema? I mean they will be specified at the same time and place as registering element name to schema?

Reinmar commented 7 years ago

I agree with what you've written down. Structure "flags" will be also kept/connected with schema? I mean they will be specified at the same time and place as registering element name to schema?

Yes.

scofalik commented 7 years ago

Okay, I had a doubt that we will describe one entity in three places (creating ModelElement, creating schema item and describing structure in other place).

pjasiun commented 7 years ago

Schema attributes make sense for me (it's like meta elements attribute, what fit well to the tree structure of schema).

My only doubt is the inline attribute. I think we should introduce only these attributes, we are sure, we need now. But, as you said, the list should be open, so we will be able to introduce more in the future.

Reinmar commented 7 years ago

And the same question to @szymonkups – where's the same piece of logic in the heading feature :D

It will be this: https://github.com/ckeditor/ckeditor5-heading/blob/f0c19177afc4679349f0713a21bd9b8ba5c49378/src/headingcommand.js#L94-L95

And I created an issue to extract this logic: https://github.com/ckeditor/ckeditor5-engine/issues/811. For now, I'll base it on the $block type checks.

pjasiun commented 7 years ago

Special question to @scofalik – where's list feature's "get selected blocks" logic :D?

In should go to the Selection class.

szymonkups commented 7 years ago

We could think also if it would be possible to store context for block, inline, limit and object elements. For example: we can define caption element that is used only inside image element. So theoretically we would want to define caption as limit element only when inside image.

Reinmar commented 7 years ago

I'm thinking now how the message quote feature should get elements which can be wrapped with a <messageQuote> element. I implemented recently the Selection.getSelectedBlocks() method... to realise that the quoting mechanism differs from heading and lists.

Let's consider this content:

<paragraph>x[x</paragraph>
<image></image>
<paragraph>y]y</paragraph>

The heading and list commands will apply themselves to the two paragraphs but will ignore the image element.

The message quote feature should, however, wrap the image too. This made me think that for the quote feature all the 3 elements are the same kind of content – wrappable blocks.

We have a couple of options how to implement those two scenarios, but to keep this message short I'll only mention the one I like the most.

In the schema, I'd mark the <image> as a block too so it gets returned by getSelectedBlocks(). From content editing perspective these are all editable blocks of content. However, since it's also marked as an object, which means that it's a self-contained entity, it should be filtered out by the list and heading features.

I'm going to continue implementing the quote feature with the above assumption. The image cannot be marked as a block now, so it won't be quotable for the time being.

Reinmar commented 7 years ago

The check() method's interface doesn't allow us to check the rules easily and in a realistic way. E.g., I just wrote this:

        const isMQAllowed = schema.check( {
            name: 'messageQuote',
            inside: Position.createBefore( firstBlock )
        } );
        const isBlockAllowed = schema.check( {
            name: firstBlock.name,
            inside: 'messageQuote'
        } );

It's part of a logic checking if a block can be wrapped with a message quote. What's wrong here? That in the second check I'm not checking whether the real block I have can be wrapped with a message quote in the exact content in which it is now. I'm checking only its name and context-less message quote.

Schema was meant to be context-aware and this information must be provided and used by schema checks.

EDIT: Actually, the second check must also include attributes:

        const isBlockAllowed = schema.check( {
            name: firstBlock.name,
            attributes: Array.from( firstBlock.getAttributeKeys() ),
            inside: 'messageQuote'
        } );
Reinmar commented 7 years ago

In the schema, I'd mark the as a block too so it gets returned by getSelectedBlocks(). From content editing perspective these are all editable blocks of content.

Yep, I've just confirmed this in a different way. Currently, the image doesn't extend $block so it's not allowed in a quote at all (because quote accepts only $blocks inside itself).

PS. To make it clear why we can't quickly make image inherit from $block:

Reinmar commented 7 years ago

Great news :) Another issue with the Schema: https://github.com/ckeditor/ckeditor5-list/issues/34

It's not possible to allow element attributes on a registered element (which inherits from another element).

I'm going to workaround this in the messagequote feature by allowing the list item inside it explicitely, but this is an ugly hack which we must removed ASAP.

scofalik commented 7 years ago

https://github.com/ckeditor/ckeditor5-engine/issues/878

scofalik commented 7 years ago

Schema is not flexible enough. ATM, there is a bug in our features:

// Allow link attribute on all inline nodes.
editor.document.schema.allow( { name: '$inline', attributes: 'linkHref' } );

the author just wanted to inform schema that $inline elements can have linkHref attribute, but by not providing inside entry, accidentally $inline elements with linkHref are allowed everywhere.

This means that correct solution is: editor.document.schema.allow( { name: '$inline', attributes: 'linkHref', inside: '$block' } );

But the problem is that this doesn't go well with other features that are expanding $inline element. If other feature says that it allows $inline inside myCustomElement: editor.document.schema.allow( { name: '$inline', attributes: 'linkHref', inside: 'myCustomElement' } );

But then the $inline element won't be allowed in myCustomElement with linkHref. It means that features need to know about each other which is against our principles.

pjasiun commented 7 years ago

Schema is not flexible enough. ATM, there is a bug in our features:

// Allow link attribute on all inline nodes.
editor.document.schema.allow( { name: '$inline', attributes: 'linkHref' } );

the author just wanted to inform schema that $inline elements can have linkHref attribute, but by not providing inside entry, accidentally $inline elements with linkHref are allowed everywhere.

Schema is not clear enough.

Reinmar commented 7 years ago

This will be fun. We gathered a huge list of things to consider. Most of them didn't even cross our minds before. Designing the new schema will be really challenging :)

Reinmar commented 7 years ago

Things got seriously bad ;<

First, we had a problem that we had attrs defined like:

editor.document.schema.allow( { name: '$inline', attributes: 'linkHref' } );

This is incorrect because this allows $inline[linkHref] everywhere. So we needed to change them to:

editor.document.schema.allow( { name: '$inline', attributes: 'linkHref', inside: '$block' } );

Then, it turned out that this disallows links/bold/italic in image caption becuase caption doesn't inherit from $block... So we did this: https://github.com/ckeditor/ckeditor5-image/pull/95/files.

Which... is totally wrong ;| It means that caption can be used directly in $root and blockQuote (which lead to https://github.com/ckeditor/ckeditor5-block-quote/issues/10).

So... we knew that 💩 was deep. Now it also got serious.

I don't see a simple way out of this ATM. I can path block quote so it doesn't apply to captions because blockQuote is not allowed in image. However, this is stepping on a thin ice.

Reinmar commented 7 years ago

I've been thinking a bit about the use case where you want to allow image where blocks are allowed but you don't want to allow inside things which are allowed in blocks.

Currently, if you inherit from $block you get both – image will be allowed in $root (which is ok), but $inline will be allowed in image (which is not ok).

So, perhaps the way to go will be to split these aspects so you can define that image is allowed where $block is (it inherits this behaviour from $block) but you can define what's allowed in image separately.

This will also allow defining that caption inherits "allowed content" from $block but "allowed in" is set to image only.


Going a step further, we've been also thinking about basing schema API on an object format. From the top of my head:

schema.register( 'image', {
    allowedIn: schema.get( '$block' ).allowedIn,
    allowedContent() {
        return [ 'caption' ];
    }
} );

I'm worried, though, how will this API allow runtime extensibility. This is, developers must be able to either redefine the default schema or make slight adjustments to it.

E.g. someone may want to allow paragraphs and other blocks allowed in caption. Or to disable a certain attribute in captions. Theoretically, object format's advantage here will be that you operate on a fully transparent code, so you could do:

schema.get( 'caption' ).allowedContent = schema.get( '$root' ).allowedContent;

However, you could not do "disallow links in captions" because links are allowed on $inline and so far this API doesn't allow defining "disallow link attribute on $inline when in caption". I'm not sure how it could be done without significantly complicating this concept. One idea which came to my mind some time ago was that when requesting schema, it could pass the query through the entire chain of elements.

So, when requesting whether link attrs is allowed on $inline which is in caption which is in image and so on, this would be a result of calling:

schema.get( 'image' ).allowedAttributesInContext(
    [ 'caption', '$inline' ],
    schema.get( 'caption' ).allowedAttributesInContext( 
        [ '$inline' ],
        schema.get( '$inline' ).allowedAttributes()
    )
);

Alternatively, context could be passed to schama.get( '$inline' ).allowedAttributes(). This might be simpler but it would mean that it wouldn't be caption item which controls what's allowed inside it because the extension would be done on a level it knows nothing about ($inline item).

Reinmar commented 7 years ago

To make it clear – when checking if foo is allowed in bar we'd need to check that both conditions are true:

I wonder if this won't be too demanding in terms of how many things you need to take care of to define a simple thing. If so, we could introduce small helpers which'd simplify the job. E.g. Schema.allow( 'a' ).inside( 'b' ) which would take care of extending a's allowedIn and b's allowedContent.

Reinmar commented 7 years ago

Another thought – perhaps we can go the simple way for the declarative part of the schema. So, you'd be only able to define that foo can be in bar but not that foo can be in bar > bom. All kind of chains longer than 2 would need to be handled on the imperative level (by overridable methods, or methods with callbacks).

So, a schema item would have a simple, declarative allowedIn and allowedContent and allowedAttributes (arrays) and getAllowedIn(), getAllowedContent() and getAllowedAttributes(). Default implementations of these methods would just base on the properties, but you could override the method to implement some special rule.

We only need to figure out whether the most common scenarios can be done using the declarative API.

scofalik commented 7 years ago

Which... is totally wrong ;| It means that caption can be used directly in $root and blockQuote (which lead to ckeditor/ckeditor5-block-quote#10).

For now, blockQuote is the only non-root element that allows $blocks inside it right? Maybe for now it would be enough if we would just disallow caption in blockQuote? I know this stinks because it's coupling two features... Or... I don't know :| You are right that we are in 💩

I've been thinking a bit about the use case where you want to allow image where blocks are allowed but you don't want to allow inside things which are allowed in blocks.

Finally I understand your idea :P. This may be a way to go but makes things complicated. Slowly we have complexity creep where you have to define more and more stuff. A "schema defining object" would have four "properties":

To make it clear – when checking if foo is allowed in bar we'd need to check that both conditions are true:

  • foo can be in bar
  • bar allows foo inside

This seems overcomplicated and I don't really understand why?

Anyway, no matter what will be the final implementation of "schema building blocks", there will have to be some kind of controller that will use all "building blocks". This is why I think that this idea might be good too:

Another thought – perhaps we can go the simple way for the declarative part of the schema. So, you'd be only able to define that foo can be in bar but not that foo can be in bar > bom.

But are we gaining anything in terms of capabilities or is it just a gain in implementation simplicity?

I am sorry that my answer is not really insightful but this is a hard topic and it needs more focus than 15 minutes for response on GitHub. We really have to nail all the use cases we have, all the problems we encountered and how they would be resolved in solution A / B / C. This means that it will be really hard to come up with anything by discussing it on GitHub. Somebody have to gather all the data and start with the schema from scratch.

pjasiun commented 7 years ago

One more case: allow specific marker in some content/disallow it. For instance, I want to disallow creating comments in empty paragraphs. I may add an attribute with the same name as the marker to the schema and then check if such fake attribute is allowed. It might be the way to go, but it sounds a little like a hack. In general markers & schema is a topic we did not touch yet.

Reinmar commented 7 years ago

And one more case: https://github.com/ckeditor/ckeditor5/issues/477.

scofalik commented 7 years ago

Schema stuff is complicated and we need to think about what tools we want to give to users and how they will use it.

A common problem is that after some changes in model, nodes that previously had attributes now are incorrect. For example, let's say that a heading cannot have a bold text. If you have a heading and then a paragraph with bold text, merging the paragraph into heading (using Backspace for example) should end up in removing the bold.

That was an easy case, though. But we enabled in Schema so called "attribute groups", that is, the attribute is allowed on node only if the other is also set. After a change in model, if we will check such attributes one by one we will remove them even though the node is still in correct place. We would have to check every combination of attributes.

Then, we not only have to check attributes but also child-parent chains.

We not only have merging / moving but also RenameOperation.

So, in fact, if something changes in model tree, there is awful lot of things to change. Honestly, I'd remove the possibility to define "attribute groups" because they are seldom used, can be controlled (and fixed) directly by feature and add unnecessary complication to schema-related algorithms.

scofalik commented 7 years ago

Remember about attributes stored by DocumentSelection: https://github.com/ckeditor/ckeditor5-engine/issues/1127

scofalik commented 7 years ago

Also, recently we've been discussing whether schema should support required attributes set, that is whether there should be possibility to tell the schema, that given node is "okay" only when it has all of given attributes (for example, image is fine if it has alt and src).

This may make some of algorithms too complicated and we agreed that if that will be the case, we will skip this functionality.

jodator commented 6 years ago

I had similar issue when I defined alignment attribute on blocks as:

schema.allow( { name: '$block', attribute: 'alignment', inside: '$root' } );

I did this way probably due to misunderstanding of how schema works (to be honest I copied that from other plugin).

Anyway such definition prevented to add blockQuoute on blocks that had alignment attribute already set. The other way around - setting alignment attribute on blocks inside blockQuote worked well (or not well depending how this should behave).

The configuration that worked for me was:

schema.allow( { name: '$block', attribute: 'alignment' } );
jodator commented 6 years ago

Another case - still I'm not 100% sure if valid or not. Given previous attribute allow I'd like to check if adding atrribute to some block is allowed:

schema.check( {
    name: 'listItem',
    attributes: [ 'alignment' ]
} );

will fail, wheras:

schema.check( {
    name: 'listItem',
    attributes: [ ...listItem.getAttributeKeys(), 'alignment' ]
} );

will not. It was counterintuitive for me to add current attributes to check but maybe it makes sense to pass them so schema can check all attributes (ie when some combination of attributes are not allowed). Probably a some kind of helper method for checking attribute on existing item might be handy.

Reinmar commented 6 years ago

I've read through the entire discussion and all related tickets and I edited the initial post with a compiled list of requirements, ideas and doubts. I'll be now working on figuring out how to deal with them. One of the most important questions I see there is what we can actually do and what needs to be rejected as too complicated.

Reinmar commented 6 years ago

All check() use cases from the code (yep, it's sad but GH is buggy and can't display them):

https://github.com/ckeditor/ckeditor5-block-quote/blob/ce7e584059d3b01e1e56a91adbeb260fca2fd2f2/src/blockquotecommand.js#L224-L237

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/controller/deletecontent.js#L156-L161

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/controller/insertcontent.js#L224-L239

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/controller/insertcontent.js#L431-L437

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/controller/modifyselection.js#L89-L92

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/conversion/buildviewconverter.js#L298-L300

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/conversion/view-to-model-converters.js#L48-L61

https://github.com/ckeditor/ckeditor5-engine/blob/2924d529657241d926804aff9306f8a01e4ef0af/src/model/document.js#L306-L336

https://github.com/ckeditor/ckeditor5-heading/blob/614987b8120d02a2035fdf37c8c5c1fdf1f4a54c/src/headingcommand.js#L93-L98

https://github.com/ckeditor/ckeditor5-image/blob/949c6736b57c4ef1229ad08bed4ae977b17f924e/src/image/converters.js#L35-L38

https://github.com/ckeditor/ckeditor5-image/blob/949c6736b57c4ef1229ad08bed4ae977b17f924e/src/image/converters.js#L214-L223

https://github.com/ckeditor/ckeditor5-image/blob/949c6736b57c4ef1229ad08bed4ae977b17f924e/src/imagestyle/converters.js#L82-L87

https://github.com/ckeditor/ckeditor5-image/blob/949c6736b57c4ef1229ad08bed4ae977b17f924e/tests/image/converters.js#L209-L219

https://github.com/ckeditor/ckeditor5-list/blob/e5b50a323dc446bd787b4dd245f872d0bc1f6018/src/listcommand.js#L313-L319

https://github.com/ckeditor/ckeditor5-paragraph/blob/55bd4bfba9bf28b624389b648cf23fdbd1a4d3b0/src/paragraph.js#L242-L262

https://github.com/ckeditor/ckeditor5-paragraph/blob/55bd4bfba9bf28b624389b648cf23fdbd1a4d3b0/src/paragraph.js#L282-L302

https://github.com/ckeditor/ckeditor5-typing/blob/cd1bae074b9174d0ead429ec91b6449e18ba9c43/src/deletecommand.js#L150-L152

Reinmar commented 6 years ago

Another bug due to the schema: https://github.com/ckeditor/ckeditor5-alignment/issues/12. It will be fixed as well.

Reinmar commented 6 years ago

For the future reference – these are notes I took when working on the new schema API. Let's hope that we'll never need to read through all this again :)


While analysing all use cases I recognised three main types of information which we try to retrieve from the schema:

Operations

The first group of checks can be derrived from the types of deltas. We also have some additional helpers (like removeDisallowedAttributes()) and may add more.

Methods implementing these checks should be able to return true/false and we need to be able to make decisions based on the return values. This means that when you get false you should know that only one type of issue might've occured. E.g. that this node cannot be a child of its parent and you can't do anything about it. Situations that this child can't be a child of its parent, buuuuut... if you'd remove some attributes then maaaaybe must not occur. This means that checks will need to be limitted to a subset of information which we can retrieve from the tree.

Can be inserted

I think that these checks can't care about the exact insertion positions because often we don't know where exactly the node will end up and what siblings it will have. We also don't know this node's children. E.g. during conversion, we need to make decisions about children nodes of some element before they are inserted into the tree and before their chilren are known. The information we can provide are: ancestors list (with their attributes), the node to insert (with its attributes, but without its children).

However, it doesn't also seem to be feasible and reasonable to include node's attributes in this check. Why? Because the checkInsert() method will simply tell you that a node can't be inserted but not why. So if the decision would depend on an attribute you wouldn't know what to do next. Should I remove some attributes? Should I change their values? Is some attribute missing? Insertion check should only consider ancestors (with their attributes) and name of the element (or text).

So how would attribute validation look like? How to disallow bold in headings?

I think that the process should be two steps:

  1. Check whether a node can be inserted at all and insert it if so.
  2. Strip its disallowed attributes. Note: we're not going to have required attributes, so you only need to worry about disallowed ones.

Note, that if you'd disallow bold in headings now, the current implementation of conversion would work fine because we do exactly what I described above – we first insert $text (if it's allowed) and apply allowed attributes in the next step. But, insertContent() would not handle this gracefully if we hadn't had https://github.com/ckeditor/ckeditor5-engine/blob/fc7da8035cdc9533f66ad5de66711e25cd2b385e/src/controller/insertcontent.js#L229-L234. And this piece of code proves even better that structure checks should not care about node-to-insert's attributes.

Example API:

const el = doc.createElement( 'paragraph', { alignment: 'right' } );

schema.checkInsert( position, element );
schema.checkInsert( position, 'paragraph' );
schema.checkInsert( position, '$text' );
schema.checkInsert( position.parent, element );

// Sometimes you may need to check a virtual tree.
const ancestors = [
  { name: '$root' },
  { name: 'blockQuote', attributes: { some: 'one' } }
];
schema.checkInsert( position.getAncestors(), element );

I'm unsure still whether it should be checkInsert() because it indicates that it's only about insertion and sometimes you may want to check whether an already inserted node is correct. Therefore we can consider checkChild() or other ideas.

Can be wrapped with

Here we need two "can be inserted" checks and additional cleanup:

Both checks need to be be pefromed before any changes are done to the model, so the second one proves that the "can be inserted" check must be able to work with virtual trees. We could mock how the tree would look like after changes (similarly to what the differ would do) and that would be powerful, but it's an overkill. If we'll limit the "check can be inserted" checks to an array of ancestors (and set a rule that schema's callbacks can't check their siblings) and the element to be inserted, then we don't need such magic.

The problem here is that we also need to take care of attributes. If <paragraph align> got wrapped with a <blockQuote> and schema doesn't allow the align attribute on a <paragraph> when it's inside a <blockQuote> then this attribute should be automatically removed. By whom? A feature or the writer? Post-fixers?

Another solution would be to limit attribute checks so they are not contextual (can't check ancestors chain). But that would make it impossible to prohibit bold in a heading, so it'd be a serious limitation.

Can be renamed

Here, we need to do a couple of things:

The check is simple but it again proves that the "can be inserted" check needs to work with virtual elements.

The attributes cleanup again opens a question who should do it. Should it be automatical or not?

Can have an attribute

A simple check, but opens a couple of related questions:

should we support groups of attributes?

No, because that would be against the rule that a check must concern a single trait and that we need to be able to make decisions based on those checks. Let's consider a case when on element <image> there are two groups of attributes allowed: x and y, x and z. How would you clean disallowed attributes of <image a b c x y z>? It's doable but it's not as simple as checking one attribute after another. You need to check permutations of those in some ways. And this complexity will leak from the schema to features. Besides, we haven't yet had a clear case for attribute groups which couldn't be solved differently (using two or more different elements, representing those separate use cases).

Image requires src and alt, list item requires indent and type. With src and alt the problem is that if those values weren't provided on V->M conversion, you can't guess them later on. There are no sane defaults there. So, it's basically a converters' job to not convert images when they don't have those attributes or provide proper placeholder (for alt at least).

With list it seems to be a bit different. I thought that we can define that indent defaults to 0 and type to bulleted. And if we'd implement automatic attributes validation we can also support attributes default easily. This would ensure that those attributes are always present which is valuable when implementing features.

However, 0 isn't indent's default values in some cases. E.g. when you insert <listItem> in a middle of a deeply nested list. Then, indent must be aligned to that list. Hence, it's again job for a post-fixer.

So, to sum this up – it doesn't make sense to have required attributes because in most cases those checks are very contextual.

should we support attribute value validation?

This seems feasible. The value of an attribute (name) is its inherent pair, so both things can be checked together if we can always provide the value of the attribute in all checks that we need to do. It seems so, because I can't imagine a situation when you want to check whether element may have an attribute but you don't know its value yet. The only situation where the value is not yet known is when user hasn't entered it yet. But then, the feature's UI need to validate that value anyway because we won't go so far to enable configuring validation messages in the schema.

should we support attribute exclusion?

The use case is e.g. linkHref and linkName. The question is whether this can't be handled by converters? It's tricky, because Link and Anchor features would be separate plugins, but perhaps one of them could consume the entire <a> once handling it. So the first one would win if someone pasted <a href name>.

So, can we imagine cases where converters could not handle it? Or, is there any risk of attributes getting applied to one piece of text or one element by features, directly on the model?

A separate question is whether we can handle exclusion at all. I think so – the "can have attribute" check will just return false if the other attribute is already present. This will also work for the "remove disallowed attributes" when checking them one after another – simply, the first one will be removed and the second one kept. It means that this behaviour may seem to be non-deterministic.

Finally, the rules for which attribute has priority over which may not be so simple. Hence, post-fixers seem a safer place again. This may introducing some coupling between features (one feature will need to predict that some other feature may apply some attribute) but those cases will be rather rare.

Removing disallowed attributes

This seems to be a common action which needs to be performed after most of model structure changes. We agreed already that this should be done by a post-fixer registered by the model itself. The advantage of doing this in a post-fixer is that it will be done once per all changes, not after every single atomic change (if it was done by the writer). This will be more performant and secure (because intermediate model states may be incorrect on purpose).

It's also important to remember that cleaning up attributes needs to be a recursive job – it needs to consider all descendants of changed nodes.

Selection position

Currently, there are two correct selections:

Additionally, there are some rules which we may want to introduce in the future:

Similarly to the structure/attrs checks, we need to be able to query whether selection position is correct but also to fix incorrect selection (which is currently done by Document#getNearestSelectionPosition()).

The biggest problem I can see here is what we should check. Always the entire selection? That will make the result hard to analyse (the problem with what false means). But checking individual ranges may not be enough (validating that table selection is rectangular, validating the direction), and checking positions is even more hopeless.

I think that we again need to be smart and implement true/false checks for a subset of information (and context) and allow post-fixing the rest.

TODO...

Defining schema rules

When designing this piece of API we need to take into consideration the following aspects:

While the first aspect is trivial, the second caused us a lot of pain in the past.

Functional requirements:

Conclusions:

Reinmar commented 6 years ago

List of potential followups: