client9 / stringencoders

Fast c-string transformations
MIT License
135 stars 51 forks source link

copying versions of C++ functions tolower(), toupper(), toprint() modify original string #18

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The code for these functions look like

    inline std::string tolower(const std::string& str)
    {
        std::string s(str);
        modp_tolower_copy(const_cast<char*>(s.c_str()), s.data(), s.size());
        return s;
    }

This as I understand supposed to copy original string into another variable and 
do all modifications in the copy of the original string. Unfortunately at least 
in some STL implementations this actually changes the original string (for 
example in STL included into gcc-4.4). This is because the library uses copy on 
write for STL strings and s.c_str() actually points to exactly same memory 
space as str.c_str() what means that any direct operations on s.c_str() change 
the original string. This is a bug in stringencoders because strictly speaking 
the standard doesn't allow for modifications  of the storage returned by 
c_str(). See for example 
http://www.cplusplus.com/reference/string/string/c_str/. Quote: "The returned 
array points to an internal location with the required storage space for this 
sequence of characters plus its terminating null-character, but the values in 
this array should not be modified in the program ...".

The below implementation still violates the standard but at least it doesn't 
break on gcc-4.4:

    inline std::string tolower(const std::string& str)
    {
        std::string s(str.size() + 1, '\0');
        modp_tolower_copy(const_cast<char*>(s.c_str()), str.data(), str.size());
        return s;
    }

Original issue reported on code.google.com by ilya.m...@gmail.com on 12 Jul 2011 at 11:11

GoogleCodeExporter commented 9 years ago
wow.  thanks for this.

I'll work on adding this to a test case then accept your patch.

Original comment by nickg@client9.com on 24 Feb 2012 at 5:24

GoogleCodeExporter commented 9 years ago
, in cxx_test.cc, the only *not* tested are the ascii copying functions.  Damn!

Original comment by nickg@client9.com on 24 Feb 2012 at 5:44

GoogleCodeExporter commented 9 years ago
ok tests added... I'm able to dup the problem

Original comment by nickg@client9.com on 24 Feb 2012 at 5:57

GoogleCodeExporter commented 9 years ago

fixed.  Minor bug in your proposed patch.

 std::string s(str.size() + 1, '\0'); will end up making a c++ string that thinks it has a null byte as part of the string (,i.e. wrong size() is returned)

I did the following:  seems to work and valgrind is happy.   What do you think?

    inline std::string toupper(const std::string& str)
    {
        std::string s(str.size(), '\0');
        modp_toupper_copy(const_cast<char*>(s.data()), str.data(), str.size());
        return s;
    }

Thanks again for the report.  Hope you are staying warm!

best,

nickg

Original comment by ngalbre...@etsy.com on 26 Feb 2012 at 4:56

GoogleCodeExporter commented 9 years ago

Original comment by nickg@client9.com on 26 Feb 2012 at 4:57