Closed calibr closed 4 years ago
Currently encoding a string with a BOM and then decoding it produces a string with different length, consider the example:
```js
import * as string from "lib0/string.js"

const stringWithBom = "\ufeffbom"
console.log(stringWithBom.length) // prints 4

const decodedString = string.decodeUtf8(string.encodeUtf8(stringWithBom))
console.log(decodedString.length) // prints 3
```
This happens because on the decoding stage the BOM gets removed.
This length reduction seems to cause ID skew on the decoding client; in practice I'm getting a structure that depends on itself:
```
ItemRef {
  _missing: [ ID { client: 3686199576, clock: 22 } ],
  id: ID { client: 3686199576, clock: 22 },
  left: ID { client: 3686199576, clock: 22 },
  right: null,
  parentYKey: null,
  parent: null,
  parentSub: null,
  content: ContentEmbed { embed: { linebreak: 's' } },
  length: 1
}
```
This can probably lead to missing content, since the dependency will never resolve.
So I added the `ignoreBOM: true` option to the `TextDecoder` constructor so that the BOM is not removed from decoded strings and their length is preserved.
Good catch @calibr ! Thanks for the PR.
I'll add some tests that verify that this issue is fixed in all encoding/decoding methods and then make a new release.