For me, it looks like you create a special class to write some logic using special syntax. I am not sure if it simplifies the code or complicates it even more. `check` is just an "if", `exec` is just a "then", and `addFilter` does not do much more than executing one part after another. Also, I do not know how `activationTrigger` is supposed to work on the `view` structure, which is a tree.
I think that instead of creating an uber-tool to define the whole logic, you should create a bunch of helpers to simplify the parts of the logic which repeat or are not easy to read.
You compare it to `postFixers`, but note that the only reason why we have `postFixers` is to restart all fixes defined by different plugins whenever one of them changes something. We would use simple events if that were not the case.
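For context, a minimal sketch of that restart behaviour (the root-fixing logic below is purely illustrative, not code from this proposal):

```js
// Post-fixers run after each change block. If any of them returns `true`,
// the engine runs ALL registered post-fixers again, so fixes contributed
// by different plugins can react to each other's changes.
editor.model.document.registerPostFixer( writer => {
	const root = editor.model.document.getRoot();

	// Hypothetical fix: never leave the root empty.
	if ( root.isEmpty ) {
		writer.insertElement( 'paragraph', writer.createPositionAt( root, 0 ) );

		// Something changed - request another pass of all post-fixers.
		return true;
	}

	// Nothing changed - no extra pass needed.
	return false;
} );
```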
Can you link the code you want to simplify using this tool?
@pjasiun So the base idea was here: https://github.com/ckeditor/ckeditor5-paste-from-office/pull/62#pullrequestreview-263679141 - to group the logic for different office suites / programs in separate classes (`Normalizer`s). Those pack two things:

- deciding whether the normalizer applies to the given input (`activationTrigger()`),
- the normalization itself (`Normalizer.exec()`).

Now on top of it, if I get this right, @msamsel is proposing to rewrite the logic behind content normalization itself, as currently you cannot reuse some logic from Word input processing in Google Docs processing.
Now, how to reuse this logic is the topic here.
A few thoughts from my side.
First of all, I don't feel that some syntactic sugar is necessarily wrong. I am fine with defining those filters this or a similar way. But maybe having new classes is unnecessary. As PJ mentioned, this might be a helper function that hooks a callback to something.
Actually, PJ, you mentioned the original idea behind post-fixers and this might be the case here too. After you normalize some content, it might appear that it needs/can be further normalized. This is just an idea without an example, though.
One thing I simply don't understand is why we would need `activationTrigger`. This seems like an unneeded complication. When you have a view element that is incorrect in some way, it doesn't really matter if you got it from Google or MS Docs or whatever, right? Can you post an example where it matters?
Some of the examples are also something that we wanted and planned to solve in a different way. For example, converting view `font-weight:700` to bold was supposed to be done through view-to-model conversion, by specifying appropriate converters. Of course, it could also be done through view normalization. I could even agree that view normalization might be more clear/understandable to somebody than having multiple views converting to the same model values. But for those examples, it adds an additional step to something we already had a different idea for.
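A minimal sketch of that conversion-based approach (not the actual bold feature code; the pattern below is only illustrative):

```js
// Upcast <span style="font-weight:700"> directly to the `bold` model attribute,
// instead of normalizing the view first.
editor.conversion.for( 'upcast' ).elementToAttribute( {
	view: {
		name: 'span',
		styles: {
			'font-weight': '700'
		}
	},
	model: 'bold'
} );
```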
I'd maybe still see some usage for this, though. Things like expanding some CSS styles: `margin: 10px` to `margin-left: 10px; margin-right: 10px; margin-top: 10px; margin-bottom: 10px`. This is something that MAYBE would be easier to do here than in conversion.
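A rough sketch of such a filter, assuming the `( documentFragment, writer )` signature discussed later in this thread; the helper name and the naive single-value handling of the shorthand are my own illustration:

```js
// Expands `margin: 10px` into the four longhand properties so that further
// processing/conversion only has to deal with one form.
export default function expandMarginShorthand( documentFragment, writer ) {
	for ( const { item } of writer.createRangeIn( documentFragment ) ) {
		if ( !item.is( 'element' ) || !item.hasStyle( 'margin' ) ) {
			continue;
		}

		// Naive: handles only the single-value form of the shorthand.
		const value = item.getStyle( 'margin' );

		writer.removeStyle( 'margin', item );
		writer.setStyle( {
			'margin-top': value,
			'margin-right': value,
			'margin-bottom': value,
			'margin-left': value
		}, item );
	}
}
```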
> Now on top of it, if I get this right, @msamsel is proposing to rewrite the logic behind content normalization itself, as currently you cannot reuse some logic from Word input processing in Google Docs processing.
This normalization might make sense for changing things like `<span class="mso-list ... ">`, but still, is there any conflicting code between Google/MS pastes, so that you need to keep them separated?

Still, it can be just a function which gets the view structure and normalizes it.
> Is there any conflicting code between Google/MS pastes, so that you need to keep them separated?
I think that we want to know which `filter`s to use for each kind of content. I'd rather not run all the things for all the content types. At the moment the PFO only runs for Word input (so this `activationTrigger()` is already there; right now it is a basic programming construct - you know, `if` ;) (sorry for an inside joke)).
As for having two classes: one that normalizes Word and one that normalizes Google Office input - I think that it is OK. I might be wrong after we check which filters are used by both Word and Google Office, but at the beginning why not have a clear view of which helpers (filters) are used in each of them? Also, normalizing input from one tool might be less consuming than from another, so why filter everything if we don't have to?
@msamsel: after some talks with @pjasiun about the potential problems with fixing content, it looks like it might be too early to propose a general API for filters. As each filter must operate on the whole view - considering structural changes, moving things around, checking parents/siblings - I think that the filter methods should receive the current `DocumentFragment` and modify it using an `UpcastWriter`. The third parameter might be the whole `dataTransfer` object (as some transformations might use it).
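To illustrate that signature, a sketch of a single reusable filter (the concrete "remove Google Docs bold wrapper" logic is only an example, not necessarily the final implementation):

```js
// A filter following the `( documentFragment, writer, dataTransfer )` contract.
// `dataTransfer` is unused here, but e.g. image-related filters could read the
// RTF flavour from it.
export default function removeBoldWrapper( documentFragment, writer, dataTransfer ) {
	// Snapshot the children first, as unwrapping mutates the fragment.
	for ( const child of Array.from( documentFragment.getChildren() ) ) {
		if ( child.is( 'element' ) && child.name === 'b' && child.getStyle( 'font-weight' ) === 'normal' ) {
			// Move the wrapper's children to its parent and drop the wrapper itself.
			writer.unwrapElement( child );
		}
	}
}
```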
At the moment we cannot tell whether any future paste-from-something will have conflicting code, so I'd keep them separated.

ATM, if the conversation doesn't go in another direction, I would have two simple classes that import common filters if needed. The filters should probably have a similar (same) definition to make them reusable.
```js
import removeBold from './filters/googledocs/removebold.js';
import { saveRTFImages } from './filters/images.js';
import fixLists from './filters/fixlists.js';
import foo from './filters/foo.js';
import bar from './filters/bar.js';

/**
 * @implements module:paste-from-office/pastefromoffice~Normalizer
 */
class GoogleDocsNormalizer {
	activationTrigger( html ) {
		return html.includes( 'foo' );
	}

	execute( documentFragment, writer, dataTransfer ) {
		removeBold( documentFragment, writer );
		saveRTFImages( documentFragment, writer, dataTransfer );
		fixLists( documentFragment, writer );

		// Some filters might be so simple that you can call them in a loop (I think :) )
		// ... get nodes
		for ( const node of nodes ) {
			foo( node, writer );
			bar( node, writer );
		}
	}
}
```
So it seems that each filter might have the same code duplicated (e.g. the iteration part) - but I think that we're fine with this right now, as it is safer than the post-fixer approach that re-runs all post-fixers if some other post-fixer returns `true`. Here each filter would iterate over a document fragment once.
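A rough sketch of how those normalizers could then be picked and executed by PFO (assuming the `Clipboard` plugin's `inputTransformation` event and the class from the example above; `MsWordNormalizer` is an illustrative name):

```js
import UpcastWriter from '@ckeditor/ckeditor5-engine/src/view/upcastwriter';

const normalizers = [ new GoogleDocsNormalizer(), new MsWordNormalizer() ];

editor.plugins.get( 'Clipboard' ).on( 'inputTransformation', ( evt, data ) => {
	const html = data.dataTransfer.getData( 'text/html' );

	// Only the first normalizer whose trigger matches processes the content.
	const normalizer = normalizers.find( normalizer => normalizer.activationTrigger( html ) );

	if ( normalizer ) {
		// Depending on the engine version, the constructor may require the view document.
		const writer = new UpcastWriter();

		normalizer.execute( data.content, writer, data.dataTransfer );
	}
}, { priority: 'high' } );
```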
I agree with @pjasiun. If this tool doesn't do anything more than being a big `if ( check() ) exec()` runner, then it feels like unnecessary structure for something that will benefit from less structure. No structure means no limitations on how filters are applied. This means you can choose the best solution for the job.
Therefore, a set of (somehow standardised?) helper functions executed one by one, just like @jodator showed in the last comment, sounds like a better plan (at least for now).
The only reason that I see for developing such a filter executor would be performance. Having every helper function do the whole tree traversal is far from optimal. Here, a filter executor could help by optimizing the places where it applies its filters. However, @msamsel, you identified the biggest problem with such a tool:
> For example, if there is some kind of unwrap method, which will move an element's children to the element's parent, then elements added before it won't be processed again, as those will be located in the already processed structure. That is why a safer approach could be to apply each filter separately to the entire input data recursively.
That's exactly the issue with CKEditor 4's HTML processor. It comes with unclear limitations on what you can do in its filters, and when you cross those limitations, suddenly some of your filters are not applied.
The other problem was indeed related to priorities. When you have 20+ filters, as in the old PFW implementation, this starts to play a role. At some point, limited by the filter executor's framework, you may start creating relations between those filters. If one filter expects some other filter to be executed beforehand, you start prioritising them and it makes the whole thing even more complex.
There has been no activity on this issue for the past year. We've marked it as stale and will close it in 30 days. We understand it may be relevant, so if you're interested in the solution, leave a comment or reaction under this issue.
We've closed your issue due to inactivity over the last year. We understand that the issue may still be relevant. If so, feel free to open a new one (and link this issue to it).
Description
During work on paste from Google Docs, an idea of `ContentNormalizers` emerged. All further concepts are related to `engine#view` and input data from the clipboard. As input data you should understand the data obtained from the `inputTransformation` event.

The Content Normalizer is a class which provides an object instance. This instance is dedicated to transforming input data coming from a specific type of source: for example MS Word, Google Docs, Online Office, etc. The transformation is defined with filters, which are registered for each normalizer separately. One filter might be used in multiple normalizers. The Content Normalizer contains logic which decides if its filters should be applied to given input data.
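A minimal sketch of what such a class could boil down to (member names follow this description; the marker-based trigger and the per-node filter signature from the examples below are assumptions):

```js
class ContentNormalizer {
	constructor() {
		// Filters registered for this particular normalizer.
		this._filters = [];
	}

	// Decides whether this normalizer's filters apply to the given input data.
	activationTrigger( html ) {
		return /some-source-specific-marker/.test( html );
	}

	// Registers a filter; one filter might be shared between normalizers.
	addFilter( filter ) {
		this._filters.push( filter );
	}

	// Applies every registered filter to every node of the input data
	// (a view document fragment), using an UpcastWriter to modify it.
	exec( documentFragment, writer ) {
		for ( const { item } of writer.createRangeIn( documentFragment ) ) {
			for ( const filter of this._filters ) {
				filter( writer, item );
			}
		}
	}
}
```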
Examples of filters' purpose:

- changing `font-weight:700;` to `font-weight:bold;`,
- unwrapping a `<b>` element (moves `node.children` to `node.parent` and removes `node`).

Content Normalizers transform the view on its own layer and might be similar to the concept of post-fixers. The idea is to have a mechanism, which is easy to maintain and extend, to process pasted data from different sources. Its role is to transform quirks from different office-like applications to a standard shape, which later will be nicely upcast to the model.
Examples
Below you can find a few examples of how the filter's definition could look. First, `activationTrigger` is defined, which determines for what type of content the normalizer should be applied. Then some filters are registered for the given normalizer. In further steps inside PFO, when the `inputTransformation` event is processed by PFO, the input data is passed to each normalizer. If the input data fulfils the condition of the activation trigger, then the normalizer is activated and the data is transformed with the filters.

A filter might be an object containing 2 properties, `check` and `exec`:
```js
normalizer.addFilter( {
	check: 'span',
	exec: ( writer, node ) => {
		if ( node.getStyle( 'font-weight' ) === '700' ) {
			writer.setStyle( 'font-weight', 'bold', node );
		}
	}
} );

normalizer.addFilter( {
	check: node => node.is( 'element' ) && node.hasClass( 'foo' ),
	exec: ( writer, node ) => {
		writer.removeClass( 'foo', node );
		writer.setAttribute( 'foo', 'bar', node );
	}
} );
```
Or a filter might be just a callback:

```js
normalizer.addFilter( ( writer, node ) => {
	if ( node.is( 'element:span' ) && node.getStyle( 'font-weight' ) === '700' ) {
		writer.setStyle( 'font-weight', 'bold', node );
	}
} );

normalizer.addFilter( ( writer, node ) => {
	if ( !node.is( 'element' ) || !node.hasClass( 'foo' ) ) {
		return;
	}

	writer.removeClass( 'foo', node );
	writer.setAttribute( 'foo', 'bar', node );
} );
```
There might also be a filter which operates on the entire input data, registered with `addFullContentFilter`:

```js
normalizer.addFullContentFilter( data => { // <--- here is passed the entire input data object
	const html = data.dataTransfer.getData( 'text/html' );

	// ...
} );

return normalizer;
```
All thoughts about those Content Normalizers are welcome.