Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Overeager setting of eof() #9696

Closed Quuxplusone closed 13 years ago

Quuxplusone commented 13 years ago
Bugzilla Link PR9335
Status RESOLVED FIXED
Importance P normal
Reported by Christopher Jefferson (chris@bubblescope.net)
Reported on 2011-02-26 05:35:41 -0800
Last modified on 2011-05-13 13:44:24 -0700
Version unspecified
Hardware PC All
CC llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
I believe the following code shows that libc++ is over-eager setting eofbit
when reading chars. Certainly the behavior disagrees with libstdc++, and my
very quick reading of the stream parts of the standard.

#include <sstream>
#include <cassert>

using namespace std;

int main(void)
{
    istringstream ss("1");
    char c;
    ss >> c;
    assert(!ss.eof());
}
Quuxplusone commented 13 years ago

[istream]/p3 says in summarizing all of the istream extractors that if rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_- base::failure (27.5.4.3), before returning.

In your example the gptr() must be bumped by one after extracting '1', and that increment hits the eof. I believe not setting eofbit at this point would be in contradiction to [istream]/p3.

Is this behavior causing problems? If so, perhaps we should open an issue.

Quuxplusone commented 13 years ago

This seems to just be a problem with extracting a single char.

I don't see (but I am not a stream expert!) why it is necessary to have sgetc() ever return traits::eof(). Reading a single character from the stream will get the character, then operator>> should return? In this way char is different to int, etc., as it isn't necessary to look ahead to check you have reached the end of whatever you want to parse.

Quuxplusone commented 13 years ago

Doing only sgetc() isn't sufficient because then the next "ss >> c" would read the same character again. For every character extracted the "get pointer" must be incremented past that character. This is where sbumpc comes in.

Indeed, if this example went down to the streambuf level and called sgetc(), it would return '1' over and over and never set eofbit.

Quuxplusone commented 13 years ago
Looking at libstdc++ (I'm not positive this is right), it uses 'gbumpc' to grab
the character. If gbumpc returns a character then that's fine, else if it
returns eof then eofbit is set.

The difference as I understand it is that libc++ proactively goes out of it's
way to set eofbit, consider the following code:

void f1(void)
{
    istringstream ss("123");
    char c;
    for(int i = 0; i < 4; ++i)
    {
        char c;
        ss.get(c);
      cout << c << ":";
      cout << ss.eof() << ",";
    }
    cout << "\n";
}

void f2(void)
{
    istringstream ss("123");
    char c;
    for(int i = 0; i < 4; ++i)
    {
      cout << ss.rdbuf()->sbumpc() << ":";
      cout << ss.eof() << ",";
    }
    cout << "\n";
}

int main(void)
{ f1(); f2(); }

g++ prints:

1:0,2:0,3:0,3:1,
49:0,50:0,51:0,-1:0,

So, it doesn't set eof until it a call to sbumpc returns eof.

libc++ prints:

1:0,2:0,3:1,3:1,
49:0,50:0,51:0,-1:0,

So it outputs eof one step earlier.

Either way, I believe (but we should check) that all compilers agree with g++
(else they would have lots more failures in boost), so that's probably in
practice what should be implemented, and fixed in the standard. Unless of
course there is some workaround buried deep in boost I haven't found yet.
Quuxplusone commented 13 years ago
I've reread the appropriate parts of the standard.  I'm just not seeing where
char should be treated differently than any other type in this regard.  All
extraction is done using istreambuf_iterator, whether it is int, double, bool,
or char.  libc++ isn't going out of its way to treat char differently, or to
set eofbit.

For every type, every time the istreambuf_iterator is incremented it is checked
against an eof istreambuf_iterator and if true, eofbit is set.  Here's the code
for char:

template<class _CharT, class _Traits>
basic_istream<_CharT, _Traits>&
operator>>(basic_istream<_CharT, _Traits>& __is, _CharT& __c)
{
#ifndef _LIBCPP_NO_EXCEPTIONS
    try
    {
#endif  // _LIBCPP_NO_EXCEPTIONS
        typename basic_istream<_CharT, _Traits>::sentry __sen(__is);
        if (__sen)
        {
            typedef istreambuf_iterator<_CharT, _Traits> _I;
            _I __i(__is);
            _I __eof;
            if (__i != __eof)
            {
                __c = *__i;
                if (++__i == __eof)
                    __is.setstate(ios_base::eofbit);
            }
            else
                __is.setstate(ios_base::eofbit | ios_base::failbit);
        }
#ifndef _LIBCPP_NO_EXCEPTIONS
    }
    catch (...)
    {
        __is.__set_badbit_and_consider_rethrow();
    }
#endif  // _LIBCPP_NO_EXCEPTIONS
    return __is;
}

Here's the code for long:

template <class _CharT, class _InputIterator>
_InputIterator
num_get<_CharT, _InputIterator>::do_get(iter_type __b, iter_type __e,
                                        ios_base& __iob,
                                        ios_base::iostate& __err,
                                        long& __v) const
{
    // Stage 1
    int __base = this->__get_base(__iob);
    // Stage 2
    char_type __atoms[26];
    char_type __thousands_sep;
    string __grouping = this->__stage2_int_prep(__iob, __atoms, __thousands_sep);
    char __a[__num_get_base::__num_get_buf_sz] = {0};
    char* __a_end = __a;
    unsigned __g[__num_get_base::__num_get_buf_sz];
    unsigned* __g_end = __g;
    unsigned __dc = 0;
    for (; __b != __e; ++__b)
        if (this->__stage2_int_loop(*__b, __base, __a, __a_end, __dc,
                                    __thousands_sep, __grouping, __g, __g_end,
                                    __atoms))
            break;
    if (__grouping.size() != 0 && __g_end-__g < __num_get_base::__num_get_buf_sz)
        *__g_end++ = __dc;
    // Stage 3
    __v = __num_get_signed_integral<long>(__a, __a_end, __err, __base);
    // Digit grouping checked
    __check_grouping(__grouping, __g, __g_end, __err);
    // EOF checked
    if (__b == __e)
        __err |= ios_base::eofbit;
    return __b;
}

With respect to setting eofbit, they are identical.  Here is test code
demonstrating this generic behavior:

#include <sstream>
#include <cassert>

template <class T>
void
test1()
{
    std::istringstream ss("1");
    T t;
    ss >> t;
    assert(ss.eof());
};

template <class T>
void
test2()
{
    std::istringstream ss("1 ");
    T t;
    ss >> t;
    assert(!ss.eof());
};

int main()
{
    test1<bool>();
    test1<int>();
    test1<double>();
    test1<std::string>();
    test1<char>();

    test2<bool>();
    test2<int>();
    test2<double>();
    test2<std::string>();
    test2<char>();
}

Could you point me towards the boost code that is having problems?

Thanks.
Quuxplusone commented 13 years ago
There are two places I believe are having problems.

lib/test/uuid/test_io.cpp has:

 {
        uuid u;

        std::stringstream ss;
        ss << "00000000-0000-0000-0000-000000000000";
        ss >> u;
        BOOST_TEST_EQ(u, u1);

        ss << "12345678-90ab-cdef-1234-567890abcdef";
        ss >> u;
        BOOST_TEST_EQ(u, u3);
    }

Which fails (I believe) because eof() is getting set after the first parse.

There are also problems in spirit. For example in
libs/spirit/test/qi/match_manip2.cpp. The problem arises in match_manip.hpp
(same directory):

{
     std::istringstream istrm(toparse);
     istrm.unsetf(std::ios::skipws);
     istrm >> mm;
     return istrm.good() || istrm.eof();
}

The problem is that when we try to parse "abc" as 'char('a') >> char('b') >>
char('z')', then the querying the last letter causes 'eof' to be activated. It
was in spirit that I noticed the inconsistency, I'm not 100% sure this is
causing it, trying to trace through spirit code is not for the faint of heart!

Looking again at 27.7.1.1 [istream] p3, I wonder about the code:

__c = *__i;
if (++__i == __eof)
  __is.setstate(ios_base::eofbit);

If this is changed to:

__c = *__i;
++__i;

Then I get the same behaviour as g++.

I believe the implementation for char should just be

try
{
  int_type c = this->rdbuf()->sbumpc();
  if(c == eof) set eofbit;
  return c;
}
catch(...)
{ set failbit; }

The difference between char and everything else is that you have to look one
character past the end of the int we are reading to make sure we have reached
the end of it.