Text transformation feature – dashes, "smart" quotes, etc.

ile commented 5 years ago

Would this be the correct plugin to handle things like:

Turn double dash (--) into – (endash)
Turn triple dash (---) into — (emdash)
In practice this would mean to turn double/triple dash + space into a wider dash.
https://en.wikipedia.org/wiki/Dash

Or

Turn "..." into “...”
Turn '...' into ‘...’ (or more "beautiful" quotation marks)

Or

If a text is selected (as in "painted") and then a quotation mark is typed, the selected text is wrapped in quotes.

If this would be the correct plugin, let's treat this issue as a feature request? Thank you.

scofalik commented 5 years ago

This topic was briefly discussed here: https://github.com/ckeditor/ckeditor5/issues/1366. The short answer is: it probably is possible to implement those transformations using what is already available in autoformat feature but those kinds of transformations weren't on our mind when we created InlineAutoformatEditing so I can't guarantee you that it will work flawlessly or that the integration will look good.

ile commented 5 years ago

Ok.

It would be nice to have a ckeditor-typography package, which would do something like this:

https://github.com/pnevyk/tipograph https://github.com/sapegin/richtypo.js

I know I could perhaps create it myself, and maybe I will, but if you did it, it would probably be correctly done from the beginning...

Reinmar commented 5 years ago

Agree. We actually planned this package and there might've even been a ticket for it (can't find it though). The autoformatting feature is about formatting, so it's a bit different topic. Text transformations would be a separate thing.

I'm moving this ticket to the main repo and confirming.

mlewand commented 5 years ago

We were unable to fit this feature in the current iteration, however we added mentions (#998) which included some APIs (most importantly text watcher) that will be helpful while implementing this feature. We're looking to implement it during next iteration.

jodator commented 5 years ago

@mlewand the text watcher will probably be moved to other place. Also it needs some work to make it more usable - right now it is focused on the Mention feature needs. This should be properly exposed in next iteration.

jodator commented 5 years ago

@mlewand Some insights about the feature:

What to ~change~ transform as we can go crazy here:
- dashes to endash/emdash
- ... to …
- quotes (language rules?)
  - "smart quotes of selected text"
  - example rules
  - should change according to (text's) language (follow up feature?)
- math related?
  - 2 x 3 to 2 × 3 (and similar for number)
  - <= to ≤
  - etc...
  - numbers thousands separator (kinda meh)
- spaces?
- other ligatures like
  - ?? !?, etc.
  - (c) to ©, (tm) to ™
Name of the feature (thus repo name) Text transformations? Typography?
How/what to configure:

Probably we should go with a set of default, most requested options: dashes, quotes, maybe some ligatures for some common things.

Other notes:
- it probably should only work for typing not selection change (ie TextWatcher in mentions acts on selection changes also) - so maybe some sort of flag or scoped events will be needed
- the feature should be easily extensible - in some simplest ie 'NY' -> 'New York' but it should also supports RegExps
Others implementations (also feature names):

The OpenOffice Writer has an AutoCorrect feature which does this and additional things (out of the scope of this feature): Selection_006

The Google docs have it under: Selection_007

Edit: MS Word have those features split to AutoCorrect and Math AutoCorrect : Screenshot 2019-05-10 13 52 16 Screenshot 2019-05-10 13 52 41

References:

jodator commented 5 years ago

While implementing a feature some questions camoe out:

Feature configuration namespace option A:
```
editor.config.set( 'textTransformation.transformations', [
    { from: '\\.\\.\\.', to: '…' },
] );
```
option B:
```
editor.config.set( 'typing.textTransformations', [
    { from: '\\.\\.\\.', to: '…' },
] );
```
I'm divided between those two: option B is when we're sure that the configuration will only consist one key while option A is more future-proof.
How to extend/mix/remove default configuration.

I'm pretty sure that whatever default configuration we figure out it not gonna suite everyone :) We can go with simple replace the config as other features. So defining new config will overwrite default. We could ease developers by exposing parts of default configuration as:

2a. Stings of groups (no import needed, some docs reading required)
```
ClassicEditor.create( foo, {
    textTransformation: {
        transformations: [
            'symbols',
            'dashes',
            'quotes', // or `quotes_en_us`, 'quotes_pl`, etc
            { from: 'CKS', to: 'CKSource' }, // custom
        ]
    }
}
```
2b. (may be additional to a) Name each default transformation and allow it as in 2a, ie, trademark, quotes_en_us_primary, emdash, etc.

2c. Export constants (meh I wouldn't go there - requires import)

2d. Getter / method on TextTransformation class (would require someone to write a plugin to extend a configuration so it also look like something weird).
Add support for to: option to be a callback? The RegExp alone is doing fine but one might like to have some special rules. (also might be follow of a base feature).
As we should be able to configure a feature by either String or RegExp then I think it is better to enforce appending end of string token ($) to strings from config.
```
const transformations = [
    { from: '\\.\\.\\.', to: '…' },
    { from: ' -- ', to: ' – ' },
    { from: ' --- ', to: ' — ' },
    // etc
];
```
looks better then:
```
const transformations = [
    { from: '\\.\\.\\.$', to: '…' },
    { from: ' -- $', to: ' – ' },
    { from: ' --- $', to: ' — ' },
    // etc
];
```
Rationale behind this is that we want to change typed text (thus at the end of string - before the caret) and not all text in a paragraph. Hope this makes sense. Can be overridden with RegExp.
Finally - what set of default transformations we would like to define?

It might be a follow-up as the feature should not rely on set of transformations. (ps.: to be tested with large set of text watchers - might be separate ticket also for TextWatcher alone).

mlewand commented 5 years ago

I like the idea of making it a part of the typing as it fits it very well. I think that even option B can be extensible while maintaining backward compatibility, so I wouldn't be worried about that.

Also note that typing pretty much implies text, we don't have to call it textTransformations but we can simplify it down to text.transformations.

Future format (if needed) might look like:

editor.config.set( 'typing.transformation', {
    transformations: [
        { from: '\\.\\.\\.', to: '…' },
    ],
    getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
} );

The only thing I don't like about the above is two subsequent transformation in the config chain.

Alternatively I'm thinking about API like that:

editor.config.set( 'typing.transformation', {
    // Override default transformations set.
    include: [
        'dashes',
        'quotes'
    ],
    getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
} );

editor.config.set( 'typing.transformation', {
    // Use the default set +trademark but without symbols.
    exclude: [
        'symbols'
    ],
    extra: [
        'trademark'
    ],
    getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
} );

WDYT?

Names for both groups and individual transformation looks very convenient. So option 2b sounds like a way to go. We'll need to pay attention to well document these.
I don't see a need to implement this at this point. If this feature will be requested we'll add it in future releases.
Not sure about this one.

Can be overridden with RegExp.

Could you elaborate more on that?

I understand this as if someone wants to match stuff beyond the caret he or she can skip the $ using raw regexp. E.g.
```
const transformations = [
    { from: /\.\.\./, to: '…' }, // match beyond caret
    { from: ' -- $', to: ' – ' }
];
```
I'll address this later.

jodator commented 5 years ago

@mlewand:

Ad 4: Instead of pattern you can provide RegExp instance. But matching text inside paragraph will not work as we assume that change is done at the caret so the $ is somewhat obligatory.

The stuff with RegExp would work but only if we allow custom callback for to option.

{ from /abc/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }

but instead of /abc/g there would be a more complex regex.

Ad1. The include/exclude/extra is also an option to go. The only thing I don't get is the getActive() callback.

mlewand commented 5 years ago

I'm still not sure about that, I don't feel comfortable with doing magic stuff to provided regexp, in a sense that dev provides:
```
{ from /abc/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }
```
And we convert it into:
```
{ from /abc$/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }
```
As long as our documentation shows this in listing I think we're good with requiring tailing $ character in regexps.

For sure we can't input regular expressions as a string, because it will prevent us from adding simple string (literal) transformations, like the one you provided:
```
{ from: 'CKS', to: 'CKSource' }
```

regarding 1. Yea, I added this getActive() option just as a dummy example how can it evolve later on 🙂

jodator commented 5 years ago

I'm still not sure about that, I don't feel comfortable with doing magic stuff to provided regexp, in a sense that dev provides:

@mlewand no I don't wan't to convert RegExp objects - only the Strings which are regex patterns.

If someone would like to define:

{ from: 'CKS', to: 'CKSource' }

I'm convinced that we need to create a RegExp like this:

new RegExp( 'CKS$' );

so the text watcher (I mean the transformation) will trasnform:

Some CKS here and CKS there is CKS[]

to

Some CKS here and CKS there is CKSource[]

and not to:

Some CKSource here and CKSource there is CKSource[]

This makes sense because one can undo text transformation and thus do not need to worry of the erlier undone transformation to be transformed again.

jodator commented 5 years ago

So after F2F talk with @mlewand it indeed make no sense to mess with config as I described. The reason behind this was to simplify configuration but at the same time simple strings transformation such as cks -> CKSource & (tm) -> ™ should be done on callbakcks internally and should be simple string.endsWith() not full-blown RegExps. We should only mention this in the docs.

Reinmar commented 5 years ago

Could you sum up, guys, what's the plan for this feature? How will the configuration look like? And how it will be triggered?

jodator commented 5 years ago

Sure. So if you'd like to fiddle with current state of this there's a PR which implements (use ckeditor5 branch for mgit co as this covers multiple repos):

the text transformation is triggered by typing changes - before a caret
the typed transformation can be undone with undo (and redone with redo)
the data or the text that is not just before caret is not transformed
there's a minimal set of transformation (it yet to be decided which to add)
the transformation are not triggered on selection change (that'd be strange)
we'll start with English quotes but we could go with quotes rules for other languages

transformations are configured as:

typing: {
    transformation: {
        include: [
            // configured by feature - set of defaults. 
        ],
        exclude: [
            // remove transformations from defaults (see below to decide/implement)
        ]
        extra: [
            // user defined additional transformations
            { from: 'CKE', to: 'CKEditor' }, // string
            { from: /([a-z]+)@(example.com)$/, to: '$1.at.$2' } // regexp
        ]
    }
}

Still to decide/implement:

sane set of default transformation
named configurations (to be used in config's exclude and include)

For named configurations as see it as group name (ie. 'math' or 'quotes') and individual (like 'quotes_primary_en' or 'trademark'.

For group stubs checkout currently implemented (very limited just to showcase groups):

const DEFAULT_TRANSFORMATIONS = [
    // Common symbols:
    { from: '(c)', to: '©' },
    { from: '(r)', to: '®' },
    { from: '(tm)', to: '™' },

    // Mathematical:
    { from: '1/2', to: '½' },
    { from: '<=', to: '≤' },

    // Typography:
    { from: '...', to: '…' },
    { from: ' -- ', to: ' – ' },
    { from: ' --- ', to: ' — ' },

    // Quotations:
    // English primary.
    { from: buildQuotesRegExp( '"' ), to: '$1“$2”' },
    // English secondary.
    { from: buildQuotesRegExp( '\'' ), to: '$1‘$2’' }
];

As you can see there are some rules:

if it is stirng we're using ' ' in some transformations that should happen only after user type space (mostly dashes).
RegExp must include $ character to denote caret placement (to be described in the docs).

jodator commented 5 years ago

@Reinmar and a showcase:

Peek 2019-05-30 12-58

mlewand commented 5 years ago

This feature recently was merged to the master branch 🎉

For the usage API see the docs (or in case they're not deployed yet: the discussion above 🙂).

Sample screencast: Screencast showing parts of text being transformed as the end user types into the editor

ckeditor / ckeditor5

Text transformation feature – dashes, "smart" quotes, etc. #1490