[BUGZILLA #16046] Unicode nuls are allowed in strings

MichaelChirico commented 4 years ago

For a long time, embedding a nul character in a string using \0 has thrown an error.

"\0" ## Error: embedded nul in string: '\0'

However, it is still possible to enter a nul character using Unicode syntax, in which case the string is silently truncated at the nul:

"abc\u0000def" ## [1] "abc"

R's behaviour should be consistent between the two ways of specifying a nul. That is, attempting to create a string containing "\u0000" should also throw an error.
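The snippet below is a minimal sketch for checking how a given R build treats the two spellings. `check_nul` is a hypothetical helper, not part of base R. It uses only `parse()` and `tryCatch()`, so each literal is parsed but never evaluated. The exact error text varies between R versions, so no particular message is assumed here.

```r
# Hypothetical helper: parse an R source snippet and report either success
# or the message of the error the parser signalled.
check_nul <- function(src) {
  tryCatch({
    parse(text = src)
    "parsed without error"
  }, error = function(e) conditionMessage(e))
}

# Backslashes are doubled so that the parser, not this script, sees the
# escape sequences \0 and \u0000 inside each string literal.
for (src in c('"\\0"', '"abc\\u0000def"')) {
  cat(src, "->", check_nul(src), "\n")
}
```

With the fix described in this report applied, both lines should report a parser error rather than a truncated string.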

METADATA

Bug author - Richard Cotton
Creation time - 2014-10-27 11:14:25 UTC
Status - CLOSED FIXED
Alias - None
Component - Low-level
Version - R-devel (trunk)
Hardware - All All
Importance - P5 minor
Assignee - R-core
URL -
Modification time - 2014-10-27 15:26 UTC

MichaelChirico commented 4 years ago

I agree that these two cases should be handled similarly. The difference arises because both are currently detected in the string-building code, which takes different paths for byte-sized chars and wide chars; the detection should probably happen earlier.

METADATA

Comment author - Duncan Murdoch
Timestamp - 2014-10-27 14:16:21 UTC

MichaelChirico commented 4 years ago

Fixed in R-devel; will port to R-patched after 3.1.2 is released.

METADATA

Comment author - Duncan Murdoch
Timestamp - 2014-10-27 15:26:52 UTC
