From discussion in PR #139. Several of our text reading functions are made more complicated by attempting to serve both the string and clob cases. This minimizes duplicate code but makes the contract of each function much more complicated.
Making a note for future cleanup: I think that this code path is an instance of issue #48. We're using a string builder as a byte buffer; this works but is pretty confusing for readers. The end result isn't required to be a valid string.
In this case, we write the bytes of current rune.
This is a bit muddy to me; when isClob is true, the current rune, c, is guaranteed to be a single byte of an unknown encoding.
Clob cannot hold non 7-bit ASCII characters and to avoid truncating a rune beyond 127, we store the bytes in the string builder.
The text clob we're reading from can only hold 7-bit ASCII characters, but at this point in the code that's no longer relevant. We've already read the next byte (escaped or otherwise) from that clob; someone reading this function's documentation will be concerned with the contents of c, not its original format.
// If `isClob` is false, we write the UTF-8 encoding of `c` into the string builder. When
// `writeCharToStringBuilder` returns, `sb` will contain a valid string.
// If `isClob` is true, `c` is a single byte of an unknown encoding. We will write that byte to
// the string builder as-is. When `writeCharToStringBuilder` returns, the contents of `sb`
// must be treated as a byte array of unknown encoding.
Now that I've spent some more time with this, I do think we should refactor it. The cognitive burden is pretty high. It doesn't need to block this merge, though. I'll open a ticket to track it.
From discussion in PR #139. Several of our text reading functions are made more complicated by attempting to serve both the
string
andclob
cases. This minimizes duplicate code but makes the contract of each function much more complicated.Making a note for future cleanup: I think that this code path is an instance of issue #48. We're using a string builder as a byte buffer; this works but is pretty confusing for readers. The end result isn't required to be a valid string.
This is a bit muddy to me; when
isClob
istrue
, the current rune,c
, is guaranteed to be a single byte of an unknown encoding.The text clob we're reading from can only hold 7-bit ASCII characters, but at this point in the code that's no longer relevant. We've already read the next byte (escaped or otherwise) from that clob; someone reading this function's documentation will be concerned with the contents of
c
, not its original format.Now that I've spent some more time with this, I do think we should refactor it. The cognitive burden is pretty high. It doesn't need to block this merge, though. I'll open a ticket to track it.
Originally posted by @zslayton in https://github.com/amzn/ion-go/pull/139#r473151019