dmonad / lib0

Monorepo of isomorphic utility functions
MIT License

string.js: add ignoreBOM to TextDecoder #7

Closed calibr closed 4 years ago

calibr commented 4 years ago

Currently, encoding a string that starts with a BOM and then decoding it produces a string of a different length. Consider this example:

import * as string from "lib0/string.js"

const stringWithBom = "\uFEFFbom" // leading BOM (U+FEFF) written as an explicit escape
console.log(stringWithBom.length) // prints 4

const decodedString = string.decodeUtf8(string.encodeUtf8(stringWithBom))
console.log(decodedString.length) // prints 3

This happens because the BOM is removed during decoding.

This length reduction seems to cause ID skew on the decoding client. In practice I end up with a struct that depends on itself (its own ID appears in both _missing and left):

ItemRef {
  _missing: [ ID { client: 3686199576, clock: 22 } ],
  id: ID { client: 3686199576, clock: 22 },
  left: ID { client: 3686199576, clock: 22 },
  right: null,
  parentYKey: null,
  parent: null,
  parentSub: null,
  content: ContentEmbed { embed: { linebreak: 's' } },
  length: 1
}

This can probably lead to missing content, because the dependency will never resolve.

So I added the ignoreBOM: true option to the TextDecoder constructor, so that the BOM is no longer stripped from decoded strings and their length is preserved.
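For reference, here is a minimal sketch of the difference, using only the standard TextEncoder/TextDecoder globals rather than lib0 itself. The helper names decodeStrippingBom and decodeKeepingBom are made up for illustration and are not lib0 exports. The actual change in string.js may be wired up differently.

```javascript
// Sketch only: decodeStrippingBom / decodeKeepingBom are hypothetical helpers,
// not part of lib0. TextEncoder and TextDecoder are the standard globals
// available in browsers and in Node.js.
const encoder = new TextEncoder()

// Default behavior: a leading BOM is consumed and dropped from the output.
const strippingDecoder = new TextDecoder("utf-8")

// With ignoreBOM: true the BOM is kept as a regular U+FEFF character.
const keepingDecoder = new TextDecoder("utf-8", { ignoreBOM: true })

const decodeStrippingBom = bytes => strippingDecoder.decode(bytes)
const decodeKeepingBom = bytes => keepingDecoder.decode(bytes)

const input = "\uFEFFbom"
const bytes = encoder.encode(input) // EF BB BF 62 6F 6D

console.log(bytes.length) // 6
console.log(decodeStrippingBom(bytes).length) // 3, BOM removed
console.log(decodeKeepingBom(bytes).length) // 4, BOM preserved
console.log(decodeKeepingBom(bytes) === input) // true
```

Despite its name, ignoreBOM: true does not discard the BOM. It tells the decoder not to treat it specially, which is why the round trip preserves the string length.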

dmonad commented 4 years ago

Good catch @calibr ! Thanks for the PR.

I'll add some tests that verify that this issue is fixed in all encoding/decoding methods and then make a new release.