Open SPC-code opened 1 year ago
@SPC-code would something like these work for you?
fun String.encodeToAsciiByteString(): ByteString {
    val bstr = this.encodeToByteString()
    if (bstr.size != length) throw IllegalArgumentException("String is not an ASCII string: $this")
    return bstr
}
fun String.encodeToAsciiByteString(): ByteString {
    return buildByteString(length) {
        this@encodeToAsciiByteString.forEach {
            // Char.code is never negative, so only the upper bound (0x7F) needs checking.
            if (it.code > Byte.MAX_VALUE) {
                throw IllegalArgumentException("Character could not be encoded using ASCII: $it")
            }
            append(it.code.toByte())
        }
    }
}
I've done it differently: https://github.com/SciProgCentre/dataforge-core/blob/2aba1b48dce011906231ba5ab67353f9901cadfa/dataforge-io/src/commonMain/kotlin/space/kscience/dataforge/io/ioMisc.kt#L12-L19
But the important thing is to have this API. The implementation could change in the future.
Plus an option for extended ASCII would be good to have.
@lppedd could you please elaborate on what you mean by "extended ASCII"?
@fzhinkin I meant the standard ASCII + the other 128 code points.
But I forgot that the extended part (the additional 128) is not standard, although maybe the general consensus is on the Windows-1252 or ISO 8859-1 charsets.
I believe that such scenarios require an explicit encoding routine that uses Windows-1252 or some other 8-bit encoding.
Silently falling back to some default charset encoding is not a great option, as it allows potentially incorrect data to be encoded without anyone noticing the problem.
And at the moment there are no particular plans to support charset encodings other than UTF-8.
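To make the "explicit routine" idea concrete, here is a minimal sketch of a strict ISO 8859-1 encoder. It is not part of kotlinx-io: the name encodeToLatin1ByteArray is hypothetical, and it returns a plain ByteArray so that it only depends on the Kotlin standard library. It relies on the fact that ISO 8859-1 maps U+0000..U+00FF directly to the byte values 0x00..0xFF. Windows-1252 differs in the 0x80..0x9F range and would need a lookup table instead.

```kotlin
// Hypothetical helper, not an existing kotlinx-io API.
// ISO 8859-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
fun String.encodeToLatin1ByteArray(): ByteArray {
    val out = ByteArray(length)
    for (i in indices) {
        val code = this[i].code
        // Fail loudly instead of silently substituting a replacement byte.
        require(code <= 0xFF) {
            "Character at index $i (U+${code.toString(16).uppercase().padStart(4, '0')}) " +
                "is not representable in ISO 8859-1"
        }
        out[i] = code.toByte()
    }
    return out
}

fun main() {
    // "caf\u00E9" is "café" with a precomposed e-acute.
    val bytes = "caf\u00E9".encodeToLatin1ByteArray()
    println(bytes.joinToString(" ") { (it.toInt() and 0xFF).toString(16) }) // 63 61 66 e9

    // The euro sign (U+20AC) exists in Windows-1252 but not in ISO 8859-1, so this throws.
    val failure = runCatching { "\u20AC".encodeToLatin1ByteArray() }.exceptionOrNull()
    println(failure?.message)
}
```

The caller has to pick the encoding by name, so a mismatch shows up as an exception at the call site rather than as corrupted bytes on the wire.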
In protocol parsing/writing we frequently need to operate on one-byte-encoded strings that are expected to consist only of ASCII characters. Please add the ability to convert a string literal to a ByteString using a character-to-byte transformation with a check for non-ASCII characters.
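For illustration only, the requested behaviour could look roughly like the sketch below when used to write a protocol line. The function name encodeToAsciiByteArray is hypothetical, and a plain ByteArray stands in for ByteString so the snippet runs with the standard library alone. The protocol line is just an example payload.

```kotlin
// Hypothetical stand-in for the requested API; returns ByteArray instead of ByteString.
fun String.encodeToAsciiByteArray(): ByteArray {
    val out = ByteArray(length)
    for (i in indices) {
        val code = this[i].code
        // ASCII covers U+0000..U+007F; anything above is rejected with its position.
        require(code <= 0x7F) { "Non-ASCII character at index $i in \"$this\"" }
        out[i] = code.toByte()
    }
    return out
}

fun main() {
    // A typical one-byte-per-character protocol line.
    val line = "HELO example.org\r\n".encodeToAsciiByteArray()
    println(line.size) // 18

    // A non-ASCII character is reported instead of being encoded.
    val failure = runCatching { "HELO ex\u00E4mple.org\r\n".encodeToAsciiByteArray() }
    println(failure.exceptionOrNull()?.message)
}
```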