boostorg / wave

Boost.org wave module
http://boost.org/libs/wave

Segmentation fault when "#pragma\n" is encountered #188

Closed: mfep closed this issue 8 months ago

mfep commented 11 months ago

System info

Actual behaviour

Using Wave to preprocess C++ source code that contains a #pragma directive immediately followed by a newline (\n) crashes the program with a segmentation fault.

Expected behaviour

The preprocessor should pass the #pragma directive through to the output as unrecognized, or throw a recoverable error.

Minimal reproducer

Minimal reproducer based on the "Quick start" section in the documentation:

#include <boost/wave.hpp>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>    // token class
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp> // lexer class

#include <iostream>
#include <string>

int main()
{
    // The "#pragma" line ends immediately, with no tokens after the directive name.
    std::string input =
        R"(int main()
{
#pragma
    return 0;
}
)";

    // Token and lexer types, as in the "Quick start" example.
    typedef boost::wave::cpplexer::lex_iterator<
        boost::wave::cpplexer::lex_token<>>
        lex_iterator_type;
    typedef boost::wave::context<
        std::string::iterator, lex_iterator_type>
        context_type;

    context_type ctx(input.begin(), input.end());

    context_type::iterator_type first = ctx.begin();
    context_type::iterator_type last = ctx.end();

    try
    {
        // Iterating the preprocessed token stream crashes when the bare #pragma is reached.
        while (first != last)
        {
            std::cout << (*first).get_value();
            ++first;
        }
    }
    catch (const boost::wave::preprocess_exception &ex)
    {
        std::cout << "ERROR: " << ex.description() << std::endl;
        return -1;
    }
}
hkaiser commented 11 months ago

Thanks for reporting, I'll try to have a look.

jefftrull commented 11 months ago

Thanks for the very clear bug report.

It looks like the code that implements pragmas assumes there is always at least one token (the space) after #pragma, and when that token is not present we can dereference invalid memory. You can see this here, where we unconditionally increment the iterator representing the start of the token sequence and then dereference it.

It looks like there's a simple fix: test for an empty token sequence and return early. It may not be the best fix, but it should be a usable workaround. Add this at the start of on_pragma:

    if (begin == end)
        return true;
hkaiser commented 11 months ago

I think another possible fix would be to change this: https://github.com/boostorg/wave/blob/61abf8b6b98d8937515a4eddf6d0d7274921dcda/include/boost/wave/grammars/cpp_grammar.hpp#L541-L550 to use + (one or more) instead of * (zero or more). This change would cause it to report an ill-formed preprocessor directive for the code above.

Also, I should go back to read the standard to see what it has to say about #pragma preprocessor directives.

hkaiser commented 11 months ago

> Also, I should go back to read the standard to see what it has to say about #pragma preprocessor directives.

Well, here we go: https://eel.is/c++draft/cpp.pragma. My suggested fix is incorrect, then: a line with just a #pragma directive is perfectly fine. @jefftrull, you suggested the correct fix above. Thanks!

jefftrull commented 8 months ago

A fix for this has been merged into master and will be released as part of Boost 1.84.

mfep commented 8 months ago

Thank you, @jefftrull!