automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.76k stars 467 forks source link

Type issue in RLEEncoder #448

Open Brian-McBride opened 2 years ago

Brian-McBride commented 2 years ago

Looking through the code, there is this line (repeated a few times)

https://github.com/automerge/automerge/blob/08a456cc2ddc79d8c2f35953db0f1fd8ed418d0f/backend/encoding.js#L695

A little cleanup in the decoder reduces the return types to string | number | null filtering down to the line in question where firstValue can be of a number or string type.

anyString >>> anyNumber has a lot of side effects. It really just depends if the value can be cast to a number. If it can be cast, it will work as expected. If it can't it will return 0. Returning zero isn't probably correct as sum will not be incremented as a number would. Worse though, if sum += sumShift is falsey, then we end up with sum adding a string to 0. Now we aren't a number but 0some-text-string.

Am I wrong here? Or does the RLEEncoder basically incorrectly generate the sum of strings when using sumValues?

Brian-McBride commented 2 years ago

I don't know what sum is used for downstream (yet). But my bet would be that the length of the string is what should be used? That or all strings should be zero (never allowing for automatic casting risking NaN)

Maybe something like a method:

  calcSum(value) {
    if (typeof value === 'number') return value;
    if (value === null) return 0;
    return value.length;
  }

Used in the three places where this same calculation is being done

if (sumValues) sum += sumShift ? this.calcSum(firstValue) >>> sumShift : this.calcSum(firstValue);

Or whatever the value of a string should be in this case.

Brian-McBride commented 2 years ago

By the way, the reason there is a string at all comes from readRawValue()

https://github.com/automerge/automerge/blob/08a456cc2ddc79d8c2f35953db0f1fd8ed418d0f/backend/encoding.js#L889-L903 Where readPrefixedString() returns a string type.

As a sidenote, readRecord and readRawValues both say "do not call from outside the class", but it is called outside the class (inside the RLEEncoder class).

ept commented 2 years ago

Hi @Brian-McBride, I believe sumValues should only ever be true on encoders/decoders with a number type (int or uint). It shouldn't ever be adding strings. We could make it throw an exception if you try to pass sumValues && this.type !== 'int' && this.type !== 'uint'.