automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.75k stars 466 forks source link

Polymorphic insertAt for Text #350

Open lightsprint09 opened 3 years ago

lightsprint09 commented 3 years ago

This could resolve #326.

It introduces a the possibility to pass normal strings into insertAt.

Automerge.change(s1, doc => doc.text.insertAt(0, 'string'))

If one passes a string it gets split up into characters inside insertAt and processed as as list of single characters

nemanja-tosic commented 3 years ago

What is the expected behavior for text.insertAt(0, 'abc')?

One test specifying "should support inserting a single element" is asserting that the text equals abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331

This test is in contradiction with the PR and should fail, and is only passing because there is an assertion on the stringified value of text, which will be 'abc' in either case - single element or multiple chars.

Another test is asserting that the text object contains an with the value of abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L355-L361

As a corollary to this, what should happen if i do:

text.insertAt(0, 'abc', 'abc', 'abc')

The PR does not cover this case and the outcome is

Text = ['a', 'b', 'c']

where the latter two strings are rejected without an error.

lightsprint09 commented 3 years ago

As a corollary to this, what should happen if i do:

text.insertAt(0, 'abc', 'abc', 'abc')

The PR does not cover this case and the outcome is

Text = ['a', 'b', 'c']

where the latter two strings are rejected without an error.

What do you expect to happen. Should we throw an error or support it it with a result of: Text = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']

lightsprint09 commented 3 years ago

What is the expected behavior for text.insertAt(0, 'abc')?

One test specifying "should support inserting a single element" is asserting that the text equals abc: https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331

This test is in contradiction with the PR and should fail, and is only passing because there is an assertion on the stringified value of text, which will be 'abc' in either case - single element or multiple chars.

I am not sure about that. I don't know how smart TypeScript converts types

nemanja-tosic commented 3 years ago

I'm not a maintainer so i wouldn't know.

I'm pointing out how the tests prior to the PR were handling the insertion of strings and it appears to me that the tests weren't asserting the same thing across the board. For example, https://github.com/automerge/automerge/blob/performance/test/typescript_test.ts#L327-L331, is specifying that inserting a string leads to an insertion of a single element, however the assertion doesn't check for that. This PR inserts a string one character at a time. So, it feels that there is a conflict between the previous specs and the behavior this PR introduces, or it's me who is misunderstanding the specs.

Given that, whether or not we throw an error is dependent upon the question "What is the expected behavior for text.insertAt(0, 'abc')?" :)

josharian commented 3 years ago

The discussion in #326 seems to have landed at:

we have a single insertAt method as now, but have it accept an array of characters or a string (which gets split) as second argument

That doesn't seem to be what this PR implements. That plan should also make these calls unambiguous, because the sole arg (it's not variadic any more) is either a string or an array.

lightsprint09 commented 3 years ago

@josharian

So the call side would look like the following?

text.insertAt(0, ["a", "b", "c"])

or

text.insertAt(0, "abc")
josharian commented 3 years ago

That's how I read it, yes.

lightsprint09 commented 3 years ago

I close this for now

pvh commented 3 years ago

Thanks for a good discussion of the situation and especially @nemanja-tosic and @josharian for review. We've definitely already agreed to do something like this so I hope you'll find a way to clean up this patch so we can merge it, @lightsprint09. If you need help, I have been meaning to work on this so just let me know.

ept commented 3 years ago

Thanks for taking the initiative for this change! I've updated the PR to allow to forms of Text.insertAt():

// means Automerge should split the string into individual characters
text.insertAt(index, 'hello')

// takes a string that has already been split;
// Automerge does not do any further splitting.
text.insertAt(index, ['h', 'e', 'l', 'l', 'o'])

Passing an array rather than using varargs avoids @josharian's problem of hitting the maximum number of arguments. Also, because the array form does not perform any further splitting, it is suitable for app developers who want to use custom splitting logic (e.g. splitting by Unicode grapheme cluster).

The API also needs to support inserting non-string values (such as marker objects for rich text). For now I've set it up so that the following two are equivalent:

text.insertAt(index, {bold: true})
text.insertAt(index, [{bold: true}])

It seems weird that those two calls do the same thing, but I'm not sure what a better alternative would be. One option would be to only allow the latter (where the control object appears inside an array), but that's a bit clumsy to use, and would be inconsistent with the API for inserting items into lists. What do folks think about this? Happy to have a little bit of bikeshedding…

pvh commented 3 years ago

I think I lean towards only supporting the latter: it either takes a string or an array of items to insert. All else is incorrect usage. This would prevent problems caused by accidentally passing other unrelated objects in.

ept commented 3 years ago

@pvh If we require any non-string values to be wrapped in an array, should we also change the API for lists to be consistent? That API currently works like this:

let x = Automerge.change(Automerge.init(), doc => {
  doc.list = [1, 2, 4]
})
x = Automerge.change(x, doc => {
  doc.list.insertAt(2, 3) // insert the number 3 at index 2
})
// x is now { list: [ 1, 2, 3, 4 ] }

To be consistent with the Automerge.Text API we should then also change the list API so that you have to call doc.list.insertAt(2, [3]) if you want to insert the number 3 into a list — but that seems a bit weird to me. Moreover, it raises the question of what happens when you want to insert a nested list as a list element — would you then have to write doc.list.insertAt(2, [[...]])? This doesn't seem very nice. But having the Text and list APIs inconsistent with each other is not nice either.

crn420 commented 3 years ago

polymorphic-insertAt

crn420 commented 3 years ago

polymorphic-insertAt

crn420 commented 3 years ago

@lightsprint09 `

crn420 commented 3 years ago

350

crn420 commented 3 years ago

Duplicate of #

LiraNuna commented 2 years ago

What is the expected behavior when splitting emojis? Given the code, I think it will split 👨‍👩‍👦‍👦 into '👨', ZWJ, '👩', ZWJ, '👦', ZWJ, '👦', is this desired?

ept commented 2 years ago

Yes, this code will split text into Unicode scalar values, which means some emoji will get broken up. I have previously argued to use grapheme cluster segmentation, which would keep such emoji together as a single string. However, we discussed this recently in automerge/automerge-rs#330 and came to the conclusion that it's okay to split into Unicode scalar values.