Kevin-Jin / mmap

Forked from https://r-forge.r-project.org/scm/?group_id=648

Do not explicitly pad strings with spaces #2

Closed Kevin-Jin closed 7 years ago

Kevin-Jin commented 7 years ago

The only way to remove the added spaces is trimws(x, "r"). However, in some data sets trailing whitespace is meaningful, and trimming cannot tell it apart from the padding.
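To make the problem concrete, here is a minimal sketch in base R. It does not call mmap itself; the padding step below is only an assumed stand-in for what a space-padded fixed-width field does.

```r
# Two of these strings differ only in meaningful trailing whitespace.
x <- c("ab", "ab  ", "abcd")

# Stand-in for fixed-width storage: pad every element with spaces
# up to the width of the longest element.
w <- max(nchar(x))
padded <- paste0(x, strrep(" ", w - nchar(x)))

# The only way back is to trim on the right, which also strips
# the spaces that were part of the original data.
restored <- trimws(padded, "r")

identical(restored, x)  # FALSE: "ab  " came back as "ab"
```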

Since R doesn't allow NUL characters in strings, I propose defining a new class "padded.character" on top of the raw type and modifying make.fixedwidth() accordingly. Each fixed-width raw cell should be max(max(nchar(x)), 1) + 1 bytes wide to leave room for the terminating NUL. Because every field is then at least 2 bytes long, we can encode NA by placing a non-NUL byte after the first NUL (i.e. '\0'), which otherwise signals the end of the string.
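A rough sketch of the proposed encoding, under a few assumptions that are mine rather than part of mmap: the helper name encode_padded() is hypothetical, the cell width is kept in a "width" attribute, lengths are measured in bytes, and 0x01 is picked as the non-NUL marker byte for NA.

```r
# Hypothetical helper (not part of mmap): pack a character vector into
# NUL-terminated, fixed-width raw cells.
encode_padded <- function(x) {
  n_bytes <- nchar(x, type = "bytes")
  n_bytes[is.na(x)] <- 0L              # nchar(NA, "bytes") is 2, so reset it
  width <- max(n_bytes, 1L) + 1L       # same as max(max(nchar(x)), 1) + 1
  cells <- matrix(as.raw(0), nrow = width, ncol = length(x))
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      # NA: first byte stays NUL, second byte is non-NUL (0x01 is an assumption)
      cells[2L, i] <- as.raw(1L)
    } else if (n_bytes[i] > 0L) {
      cells[seq_len(n_bytes[i]), i] <- charToRaw(x[i])
    }
  }
  # Flatten column by column so each string occupies `width` consecutive bytes.
  structure(as.vector(cells), width = width, class = "padded.character")
}

enc <- encode_padded(c("ab", "ab  ", NA, ""))
attr(enc, "width")   # 5: longest string is 4 bytes, plus the terminator
```

With this layout the empty string (all NUL) and NA (NUL followed by 0x01) stay distinguishable, and trailing spaces in "ab  " are stored as data rather than padding.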

S3 methods as.character.padded.character() and print.padded.character() should be implemented to recognize the special NA sequence and to strip the NUL padding.
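One possible shape for the two methods, again only a sketch. It assumes the cell width is stored in a "width" attribute on the raw vector and that NA is marked by any non-NUL byte in the second position; neither detail is specified in the proposal.

```r
as.character.padded.character <- function(x, ...) {
  width <- attr(x, "width")            # assumed attribute holding the cell width
  cells <- matrix(as.vector(unclass(x)), nrow = width)
  vapply(seq_len(ncol(cells)), function(i) {
    cell <- cells[, i]
    end <- which(cell == as.raw(0))[1L] - 1L   # number of bytes before the first NUL
    if (is.na(end)) end <- width               # malformed cell without a terminator
    if (end == 0L && width >= 2L && cell[2L] != as.raw(0)) {
      NA_character_                            # NUL followed by a non-NUL byte means NA
    } else {
      rawToChar(cell[seq_len(end)])            # drops the terminator and NUL padding
    }
  }, character(1))
}

print.padded.character <- function(x, ...) {
  print(as.character(x), ...)
  invisible(x)
}

# Hand-built example with 4-byte cells: "ab " (trailing space kept), NA, and "".
x <- structure(
  as.raw(c(0x61, 0x62, 0x20, 0x00,
           0x00, 0x01, 0x00, 0x00,
           0x00, 0x00, 0x00, 0x00)),
  width = 4L, class = "padded.character"
)
decoded <- as.character(x)
```

Here decoded keeps the trailing space in the first element and separates NA from the empty string, which is the distinction space padding cannot preserve.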