Open CC007 opened 9 months ago
To my knowledge, if you know that the decoded binary is a whole number of bytes, you can also skip the = padding characters at the end of the Base64 string.
So you would get the byte values 83, 90, 89, 67, 48, 103 (SZYC0g)
instead of 83, 90, 89, 67, 48, 103, 61, 61 (SZYC0g==).
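As a minimal sketch of dropping and restoring the padding (the helper names are made up for illustration, and the big-endian int32 layout of 1234567890 is an assumption that happens to reproduce the byte values above):

```python
import base64


def b64encode_unpadded(data: bytes) -> bytes:
    """Base64-encode and strip the trailing '=' padding."""
    return base64.b64encode(data).rstrip(b"=")


def b64decode_unpadded(text: bytes) -> bytes:
    """Restore the padding up to a multiple of 4 characters, then decode."""
    return base64.b64decode(text + b"=" * (-len(text) % 4))


# Assumption: the value is a 32-bit integer stored big-endian.
value = (1234567890).to_bytes(4, "big")
encoded = b64encode_unpadded(value)
print(list(encoded))                # [83, 90, 89, 67, 48, 103]
print(b64decode_unpadded(encoded))  # the original 4 bytes
```

The padding can be recomputed because a Base64 string is always padded to a multiple of 4 characters, so the decoder only needs the length of the unpadded string.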
Base64 is a 6-bit encoding scheme, but since only F8-FF are reserved, you could get away with a 7-bit encoding, like ASCII (with the input padded to a multiple of 7 bits, just as Base64 pads its input for the 6-bit encoding).
The only drawback would be that you can't cleanly view the characters (many 7-bit values are control characters), which also makes the data harder to copy. I don't know if that's an important consideration though.
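A rough sketch of what such a 7-bit packing could look like. This is not an existing standard: pack7 and unpack7 are hypothetical names, and the bit order (most significant bit first, zero-padded at the end) is an assumption.

```python
def pack7(data: bytes) -> bytes:
    """Split the input bits into 7-bit symbols (each 0x00-0x7F)."""
    bits = int.from_bytes(data, "big")
    nbits = len(data) * 8
    pad = (-nbits) % 7          # zero bits appended to reach a multiple of 7
    bits <<= pad
    nbits += pad
    return bytes((bits >> shift) & 0x7F for shift in range(nbits - 7, -1, -7))


def unpack7(symbols: bytes) -> bytes:
    """Reassemble whole bytes from 7-bit symbols, dropping the padding bits."""
    bits = 0
    for s in symbols:
        bits = (bits << 7) | s
    nbits = len(symbols) * 7
    nbytes = nbits // 8         # fewer than 8 padding bits are ever added
    bits >>= nbits - nbytes * 8
    return bits.to_bytes(nbytes, "big")


raw = (1234567890).to_bytes(4, "big")
packed = pack7(raw)
print(len(packed))              # 5 symbols instead of 8 Base64 characters
print(unpack7(packed) == raw)   # True
```

Every symbol stays below 0x80, so none can collide with F8-FF, but symbols below 0x20 are control characters, which is exactly the viewing and copying problem mentioned above.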
Idea
From a comment on your YouTube video by Rik Schaaf (me) (https://www.youtube.com/watch?v=tb_70o6ohMA&lc=Ugzsfj_OUAK4s_IYaNZ4AaABAg):
Example
So:
Would translate to:
So in essence, without the prefix you get UTF-8-encoded data, and with the \FB prefix you get Base64-encoded data (ASCII and UTF-8 compatible, to my knowledge).
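To illustrate the rule, here is a minimal sketch of how a reader might decode a single value. The function name decode_value is hypothetical, and the sketch assumes the value has already been split off from its row, without any terminator byte.

```python
import base64

BINARY_PREFIX = b"\xfb"  # the proposed prefix byte from this issue


def decode_value(raw: bytes):
    """Return bytes for a \\FB-prefixed Base64 value, otherwise a str."""
    if raw.startswith(BINARY_PREFIX):
        # Everything after the prefix is plain Base64 text.
        return base64.b64decode(raw[len(BINARY_PREFIX):], validate=True)
    return raw.decode("utf-8")


print(decode_value("héllo".encode("utf-8")))  # héllo
print(decode_value(b"\xfbSZYC0g=="))          # 4 raw bytes
```

Since FB never occurs in valid UTF-8, a value starting with it cannot be confused with an ordinary string.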
What is this addition trying to do
The advantage of this encoding addition is that non-Unicode data can also be represented without risk of collisions, including bytes that match the RSV special characters themselves.
Another advantage is that some data types can be stored more efficiently, like numbers and dates.
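A quick size comparison for one number, as a sketch only. It assumes the number is stored as a big-endian int32 behind the \FB prefix, which this issue does not prescribe.

```python
import base64

number = 1234567890

as_text = str(number).encode("utf-8")
# Assumption: 4-byte big-endian integer, Base64-encoded after the prefix.
as_binary = b"\xfb" + base64.b64encode(number.to_bytes(4, "big"))
as_binary_unpadded = as_binary.rstrip(b"=")

print(len(as_text), len(as_binary), len(as_binary_unpadded))  # 10 9 7
```

The saving depends on the value: a short number such as 42 is still smaller as text, so the gain mainly shows up for large or fixed-width values.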
What is this addition NOT trying to do (but what could be added in a separate issue)
This is not a change to add the data types themselves to RSV. This additional special character only signifies the encoding, not the data type, so you wouldn't know whether the data represents an integer, timestamp, float, etc., just as you wouldn't know this with the current implementation. This is still left to the program that uses the RSV file.
If the data type had to be derived from this binary data, the Base64 value could be prefixed (after the \FB) by a string surrounded by non-Base64 characters to signify the data type, like (i32) for 32-bit integers. Example:
...which would represent a single integer (int32) value that equals 1234567890.
Or you could use a simpler but more restrictive typing system that uses a single non-Base64 character to define the type, followed by a single character for the size.
...where # defines an integer and 4 defines a size of 4 bytes (32 bit):
1234567890
...where ~ defines a floating point value and 4 defines a size of 4 bytes (32 bit):
3.141592...
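Purely as an illustration of the single-character scheme, a parser could look roughly like this. The tag table, the function names, and the choice of signed big-endian values are all assumptions; the issue only defines # for integers, ~ for floats, and a size character.

```python
import base64
import struct

# Hypothetical table: (type char, size char) -> struct format.
# Big-endian and signed integers are assumptions, not part of the proposal.
FORMATS = {
    ("#", "4"): ">i",
    ("#", "8"): ">q",
    ("~", "4"): ">f",
    ("~", "8"): ">d",
}


def encode_typed(kind: str, size: str, value) -> bytes:
    """Build \\FB + type char + size char + Base64 payload."""
    payload = struct.pack(FORMATS[(kind, size)], value)
    return b"\xfb" + (kind + size).encode("ascii") + base64.b64encode(payload)


def decode_typed(raw: bytes):
    """Parse a typed binary value back into a Python int or float."""
    if not raw.startswith(b"\xfb"):
        raise ValueError("not a binary value")
    kind, size = chr(raw[1]), chr(raw[2])
    payload = base64.b64decode(raw[3:])
    (value,) = struct.unpack(FORMATS[(kind, size)], payload)
    return value


print(encode_typed("#", "4", 1234567890))                # b'\xfb#4SZYC0g=='
print(decode_typed(encode_typed("#", "4", 1234567890)))  # 1234567890
print(decode_typed(encode_typed("~", "4", 3.141592)))    # roughly 3.141592 (float32 precision)
```

The type character has to come from outside the Base64 alphabet (as # and ~ do), while the size character can be an ordinary digit because its position is fixed.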
This is out of scope for this issue though.
Considerations
With this addition, the name isn't really accurate anymore, so would this be RBSV (Rows of Binary or String Values)?