Closed cangfengzhs closed 2 years ago
Hi, Copy the definition before including antlr-common.h and put it back afterwards?
I think EOF should be replaced by _EOF and delete undef? After all, EOF is defined by the standard library. Although this problem can be avoided to some extent by modifying the include order of header file, it will be adjusted when using cpplent to format code automatically.
EOF is a reserved identifier (by virtue of being defined by the C standard library). User programs cannot redefine it and Antlr should not generate code which does. And it certainly shouldn't be undefining it in a common header.
I'm fine with such a change if EOF reserved is C++ (not just C). @mike-lischke @jcking have an opinion?
How much will it break if we added EOF to the keywords list so that is generated EOF_
?
@jcking points out:
Basically anywhere EOF is mentioned outside comments, you have to add this before:
#pragma push_macro("EOF")
#undef EOF
And this after:
#pragma pop_macro("EOF")
And then remove the #undef EOF
in the header.
It's reserved in C++ as well:
https://eel.is/c++draft/macro.names https://eel.is/c++draft/cstdio.syn
Tbh. I don't see a need to change that. It has been this way since 2016 and the vast majority of the C++ ANTLR4 user have not had any problem with it. We cannot please everyone.
The issue here, as is often the case with undefined behavior, is fragility. UB can happen-to-work for quite a while until it doesn't and then you're hit with a variety of problems. At least here, it's a compilation failure rather than silent corruption at runtime, but it can happen for a number of reasons including that a transitive dependency changed the order of #include
files. When this happens, there isn't much recourse for users short of omitting mention of EOF
from their grammar (against the "best practices" documentation), not updating a dependency (potentially leaving them open to security vulnerabilities) or hackily adjusting the generated code with sed
or equivalent.
While it's true that you "cannot please everyone" it's also the case that the current situation is a ticking time bomb for your currently-pleased users.
At least with one example we have found that removing a reference to EOF
from the grammar (start rule) allows the generated code to build even if you include cstdio
. To get the same functionality, you can add a quick check after the parse to ensure that the token stream index is the length of the token stream. This is a hack but I'm not sure how to overcome the collision with EOF macro. Just leaving this note here for other travelers.
For the next major release I can see adding EOF to the reserved keywords list and having it convert to EOF_
Or, we can try the #undef
trick.
Thanks everyone!
@shahms I don't think this issue is a potential time bomb. It's not like it just happened to work. Quite the opposite: only very few user used the EOF C macro and knew how to work around the #undef in the ANTLR4 C++ runtime. Most C++ users didn't even see a problem, because it is related to a very narrow area (using C stuff in a C++ application).
@parrt A simpler solution is to change the include order of header files. That usually works well already. But @shahms is right here, it can happen that you cannot change that order. The "correct" solution would be that ANTLR4 doesn't use EOF
as it is a special word in C/C++ (not reserved, but still).
Ok, I've played around with this problem and thought about it. I just can't see breaking all previous ANTLR code that references EOF by renaming it to EOF_
. Yes, we should have avoided EOF at the beginning but I can't change reality; I have to balance correctness with backward compatibility.
The "solution" is to avoid referencing EOF in your actual grammar or to add #undef EOF
in a section to get it into the generate a code in the right spot. Something like this:
@parser::members {
#undef EOF
}
I also can't change the fact that C++ uses include files from the 1960s and that order matters. Oh and that it still has a preprocessor with macros that can completely change the syntax. C++ is over 40 years old now and still accreting features aggressively. Let's see if we can make it to 50! [snark]
Happy to reopen this if we get a better solution!
Removing EOF refs from the grammar seems to work but we still need to verify that the parser is at the end of input when it finishes. I had a thought: Rather than manually checking to see that we are at the end of input after the parser finishes, I wonder if we can simply make a call to parser->match(Token::EOF)
or whatever right after calling the starting parser rule. Assuming of course that we have done a manual #undef EOF
before that ref.
Hey all
Just bumped into this issue when trying to use ANTLR to parse to Boost multiprecision integers:
In file included from /usr/include/boost/multiprecision/cpp_int.hpp:12,
from /home/jod/workspace/exact-dev/src/manyworlds/ManyWorlds.hpp:23,
from /home/jod/workspace/exact-dev/src/manyworlds/Expression.hpp:71,
from /home/jod/workspace/exact-dev/src/manyworlds/parser/Parser.cpp:22:
/usr/include/boost/multiprecision/number.hpp: In function ‘std::istream& boost::operator>>(std::istream&, boost::rational<boost::multiprecision::number<Backend, ExpressionTemplates> >&)’:
/usr/include/boost/multiprecision/number.hpp:2137:12: error: ‘EOF’ was not declared in this scope
2137 | while ((EOF != (c = static_cast<char>(is.peek()))) && (c == 'x' || c == 'X' || c == '-' || c == '+' || (c >= '0' && c <= '9') || (have_hex && (c >= 'a' && c <= 'f')) || (have_hex && (c >= 'A' && c <= 'F'))))
| ^~~
In file included from /usr/include/boost/multiprecision/cpp_int.hpp:12,
from /home/jod/workspace/exact-dev/src/manyworlds/ManyWorlds.hpp:23,
from /home/jod/workspace/exact-dev/src/manyworlds/Expression.hpp:71,
from /home/jod/workspace/exact-dev/src/manyworlds/parser/Parser.cpp:22:
/usr/include/boost/multiprecision/number.hpp:33:1: note: ‘EOF’ is defined in header ‘<cstdio>’; did you forget to ‘#include <cstdio>’?
32 | #include <cctype> // isspace
+++ |+#include <cstdio>
33 | #ifndef BOOST_NO_CXX17_HDR_STRING_VIEW
The grammar did not mention any EOF, so to me the proposed workaround did not work.
For now, I try to ensure that the Antlr includes are processed after the multiprecision includes, which is quite a pain, as the autoformatter likes to reorder includes, and as I do not have a good idea how exactly the compiler orders the includes in C++.
So to me, this is actually an issue that makes me think about not using Antlr. Are there any other workarounds?
This is also a key reason why I don't consider using Antlr
@JoD interesting that EOF was automatically added even if it's not in the grammar... Can you post the grammar? Often people get around things by the order of includes.
@cangfengzhs I totally understand. We are in a difficult situation because they see plus plus model was created long before we started thinking carefully about the cross platform reserved words collision problem. There is so much C++ code out there that I hate to break all of that to tweak the antlr runtime for EOF.
I just hit this problem just by including boost::iostreams
on a brand new project. Whilst I can see that you don't want to break existing code, this is also preventing writing new one. What about guarding your #undef
with some #ifdef ANTLR4CPP_LEGACY_EOF
and old users can just -DANTLR4CPP_LEGACY_EOF
in their build scripts? or maybe some options { EOF=EOF_; }
in the grammar so that new code can "rename away from EOF"?
I also encountered the same problem. In this modification on my forked version, I tried to remove #undef EOF
from the Cpp-runtime, but I don't know how to modify the demo's g4 file https://github.com/antlr/antlr4/blob/dev/runtime/Cpp/demo/TParser.g4 to generate a correct parser.
The error message is as follows:
Consolidate compiler generated dependencies of target antlr4_shared
[ 47%] Built target antlr4_shared
Consolidate compiler generated dependencies of target antlr4_static
[ 93%] Built target antlr4_static
[ 94%] Built target gtest
[ 95%] Built target gtest_main
[ 96%] Built target antlr4_tests
[ 96%] Built target gmock
[ 97%] Built target gmock_main
[ 97%] Built target GenerateParser
Consolidate compiler generated dependencies of target antlr4-demo
[ 98%] Building CXX object demo/CMakeFiles/antlr4-demo.dir/Linux/main.cpp.o
In file included from /usr/include/c++/11/cstdio:42,
from /usr/include/c++/11/ext/string_conversions.h:43,
from /usr/include/c++/11/bits/basic_string.h:6608,
from /usr/include/c++/11/string:55,
from /usr/include/c++/11/bits/locale_classes.h:40,
from /usr/include/c++/11/bits/ios_base.h:41,
from /usr/include/c++/11/ios:42,
from /usr/include/c++/11/ostream:38,
from /usr/include/c++/11/iostream:39,
from /home/wt/antlr4/runtime/Cpp/demo/Linux/main.cpp:13:
/home/wt/antlr4/runtime/Cpp/demo/generated/TParser.h:82:33: error: expected unqualified-id before ‘-’ token
82 | antlr4::tree::TerminalNode *EOF();
| ^~~
/home/wt/antlr4/runtime/Cpp/demo/generated/TParser.h:82:33: error: expected ‘)’ before ‘-’ token
82 | antlr4::tree::TerminalNode *EOF();
| ^~~
I need some guidance.
@JoD interesting that EOF was automatically added even if it's not in the grammar... Can you post the grammar? Often people get around things by the order of includes.
The grammar is here: https://gitlab.com/nonfiction-software/manyworlds/-/blob/main/src/antlr/manyworlds.g4?ref_type=heads Having a look, it does use EOF, so I might be mistaken there.
Indeed, manually fixing the order of includes was a feasible workaround. Feels brittle and had to figure out how to forbid the code formatter from touching these includes, but works for now.
I use another lib(brpc) which use macro EOF, but anltr(4.8) undef it in antlr-common.h. In order to use them at the same time, I need to strictly control the order of the include, otherwise the compilation will fail. How to solve this problem?