Open mahrud opened 3 years ago
I think I figured some version of this out, although I have a slightly different crash. For some reason there's a definition of the class poly
in both e/f4/f4-types.hpp
and e/schreyer-resolution/res-poly-ring.hpp
. It seems c++ doesn't allow this in general, but in some cases we get lucky and the linker finds the right functions to call.
good catch!
As a remark, I think we should start putting stuff in namespaces. Obviously that's not a quick fix, but if f4-types.hpp and res-poly-ring.hpp were in different namespaces, we wouldn't have seen this issue. Also, just grepping around there are a few other instances of duplicate class names. We have two instances of poly
, and one instance of POLY
, two instances of poly_struct
, two instances of spair_sorter
as well as class names like tableau
and tableau2
.
For things that are local to a file, we should really be putting them in unnamed namespaces https://en.cppreference.com/w/cpp/language/namespace#Unnamed_namespaces
I think I figured some version of this out, although I have a slightly different crash. For some reason there's a definition of the class
poly
in bothe/f4/f4-types.hpp
ande/schreyer-resolution/res-poly-ring.hpp
. It seems c++ doesn't allow this in general, but in some cases we get lucky and the linker finds the right functions to call.
I don't see how this can be a problem, without the linker giving a duplicate symbol definition error.
I think I figured some version of this out, although I have a slightly different crash. For some reason there's a definition of the class
poly
in bothe/f4/f4-types.hpp
ande/schreyer-resolution/res-poly-ring.hpp
. It seems c++ doesn't allow this in general, but in some cases we get lucky and the linker finds the right functions to call.I don't see how this can be a problem, without the linker giving a duplicate symbol definition error.
You're allowed to define something twice in c++ (think inline member functions in headers) so long as the definitions are exactly identical. In theory the code would work if the two instances of poly
were exactly identical. The linker isn't going to go through the effort to figure out if your things are identical.
I think the linker checks the length, at least, which is almost certainly going to differ. Have you actually found any duplicate symbols? If so, which ones?
I found some:
gallium$ nm -o f4/res-f4.o schreyer-resolution/res-f4.o | grep ' T .*poly'
f4/res-f4.o: 0000000000002db0 T __ZN16poly_constructor12pushBackTermEPi
f4/res-f4.o: 0000000000002c80 T __ZN16poly_constructor15appendMonicTermEPi
f4/res-f4.o: 0000000000002ed0 T __ZN16poly_constructor7setPolyER4poly
schreyer-resolution/res-f4.o: 0000000000002eb0 T __ZN16poly_constructor12pushBackTermEPi
schreyer-resolution/res-f4.o: 0000000000002d80 T __ZN16poly_constructor15appendMonicTermEPi
schreyer-resolution/res-f4.o: 0000000000002fd0 T __ZN16poly_constructor7setPolyER4poly
Do their contents differ?
So the symptoms I got was that I was debugging in schreyer-resolution/res-f4-computation.cpp
, and stepping into the constructor for poly, I ended up in f4/f4-types.hpp
instead of schreyer-resolution/res-poly-ring.hpp
.
In this case, the two classes have different contents, and thus different constructors, leading to chaos.
I should be the one to fix this.
I'm surprised we haven't run into this problem in the past. This is a serious bug, I think, so much so, that it should never have run. Perhaps in optimized mode it is being inlined, so it is not an issue in that case? But I run all the tests in debug mode too, and haven't found an issue either...
I think for a quick fix I will rename some of these classes. @DanGrayson What branch should I make these changes off of? This is not for 1.18, right? So development...?
@jkyang92 I agree that name spaces should be used throughout and consistently. Suggestions? Right now there is very little namespace use, and what is being used isn't so awesome. If you or @mahrud think about this, I want also to be part of the discussion about what is the best way to do this.
I'm not an expert on this stuff, but I'd put everything at least into a M2 (or M2Engine) namespace. Then subdirectories should probably get a sub-namespace. The major exception is interface, where it seems the headers are C compatible, so we can't really namespace that. And then any file local classes in .cpp files should probably go in a unnamed namespace.
I think for a quick fix I will rename some of these classes. @DanGrayson What branch should I make these changes off of? This is not for 1.18, right? So development...?
Yes, release-1.18-branch
is just for building 1.18.
@jkyang92 @mahrud I am working on fixing these naming issues.
I found some:
gallium$ nm -o f4/res-f4.o schreyer-resolution/res-f4.o | grep ' T .*poly' f4/res-f4.o: 0000000000002db0 T __ZN16poly_constructor12pushBackTermEPi f4/res-f4.o: 0000000000002c80 T __ZN16poly_constructor15appendMonicTermEPi f4/res-f4.o: 0000000000002ed0 T __ZN16poly_constructor7setPolyER4poly schreyer-resolution/res-f4.o: 0000000000002eb0 T __ZN16poly_constructor12pushBackTermEPi schreyer-resolution/res-f4.o: 0000000000002d80 T __ZN16poly_constructor15appendMonicTermEPi schreyer-resolution/res-f4.o: 0000000000002fd0 T __ZN16poly_constructor7setPolyER4poly
Do their contents differ?
@DanGrayson I think you have stale .o files around: those files were moved from f4/ to schreyer-resolution/
R = ZZ/3[x]; minimalBetti ideal x
@mahrud Which linux and compiler did this crash occur on? I can't recreate it, but I am on a mission to track down all of these spurious crashes, like this one, and the monomial ideal one!
Fedora 32, kernel 5.10.17 if that matters, clang 11.1, compiled with cmake on a Debug build.
It also happened here, on Ubuntu 20.04.2 with brew's GNU 11.1.0, compiled with cmake on a Release build.
Unfortunately, it's still crashing for me. The stack trace is slightly different though:
```m2
i1 : R = ZZ/3[x]; minimalBetti ideal x
-- SIGSEGV
-* stack trace, pid: 2374755
0# stack_trace(std::ostream&, bool) at ../../Macaulay2/bin/main.cpp:124
1# segv_handler at ../../Macaulay2/bin/main.cpp:240
2# 0x00007FBBD1DA4860 in /lib64/libc.so.6
3# void __gnu_cxx::new_allocator<void const*>::construct<void const*, void const*>(void const**, void const*&&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/ext/new_allocator.h:150
4# void std::allocator_traits<std::allocator<void const*> >::construct<void const*, void const*>(std::allocator<void const*>&, void const**, void const*&&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/alloc_traits.h:516
5# void const*& std::vector<void const*, std::allocator<void const*> >::emplace_back<void const*>(void const*&&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/vector.tcc:115
6# std::vector<void const*, std::allocator<void const*> >::push_back(void const*&&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/stl_vector.h:1204
7# memt::Arena::alloc(unsigned long) at /home/linuxbrew/.linuxbrew/opt/memtailor/include/memtailor/Arena.h:299
8# std::pair<int*, int*> memt::Arena::allocArrayNoCon<int>(unsigned long) at /home/linuxbrew/.linuxbrew/opt/memtailor/include/memtailor/Arena.h:338
9# ResMonomialSorter::setMonoms() at ../../Macaulay2/e/schreyer-resolution/res-monomial-sorter.hpp:93
10# ResMonomialSorter::ordered() at ../../Macaulay2/e/schreyer-resolution/res-monomial-sorter.hpp:101
11# check_poly(ResPolyRing const&, ResPolynomial const&, ResSchreyerOrder const&) at ../../Macaulay2/e/schreyer-resolution/res-poly-ring.cpp:72
12# SchreyerFrame::insertLevelOne(int*, int, ResPolynomial&) at ../../Macaulay2/e/schreyer-resolution/res-schreyer-frame.cpp:601
13# createF4Res(Matrix const*, int, int) at ../../Macaulay2/e/schreyer-resolution/res-f4-computation.cpp:161
14# ResolutionComputation::choose_res(Matrix const*, char, int, char, int, int, int) at ../../Macaulay2/e/comp-res.cpp:136
15# IM2_res_make at ../../Macaulay2/e/interface/groebner.cpp:122
16# interface_rawResolution at /home/mahrud/Projects/M2/M2/M2/Macaulay2/d/interface.dd:3604
17# evaluate_evalraw at /home/mahrud/Projects/M2/M2/M2/Macaulay2/d/evaluate.d:1297
Two things:
check_poly
in schreyer-resolution/res-poly-ring.cpp
, the first one is commented. If I switch the algorithms, there's no crash.
i4 : R = ZZ/3[x]; print minimalBetti ideal x
../../Macaulay2/m2/expressions.m2:641:47:(1):[25]: error: input polynomials/vectors were computed in a non-compatible monomial order
../../Macaulay2/m2/expressions.m2:641:47:(1):[25]: --entering debugger (type help to see debugger commands)
../../Macaulay2/m2/expressions.m2:641:47-641:88: --source code:
expressionValue FunctionApplication := (m) -> (expressionValue m#0) (expressionValue m#1)
ii6 : end ../../Macaulay2/m2/res.m2:417:28:(1):[23]: --entering debugger (type help to see debugger commands) ../../Macaulay2/m2/res.m2:417:28-417:34: --source code: W.RawComputation = value log;
thanks @mahrud ! that sounds helpful, and might help me track down the problem. But still, it is computing a GB of ideal(x)....!
I think the error in (2) above is irrelevant, but kind of curious. What is ResMonomialSorter::ordered
supposed to do? I had added a cout
line with S.ordered()
and apparently doing that twice causes that error. Is it supposed to change something?
@mahrud I have some questions. I was able to reproduce this bug on min-betti-crash branch, in debug mode, on ubuntu 21.04.
The class for an "Arena" has extra debug fields in it, in debug mode. I am suspecting that these are not in the actual Arena class, as it was compiled with optimization. But if we are including the include file after setting MEMT_DEBUG, this would be a problem... We are not using Arena's in too many places yet, but minimalBetti is one of them.
Anyway, your answers will let me know if my hypothesis is wrong ...!
2021-06-02T18:39:35.6701356Z /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/linux/super/g++-11 -I/tmp/memtailor-20210602-6310-1rq0iy9/src -O3 -DNDEBUG -DPACKAGE_NAME=\"memtailor\" -DPACKAGE_TARNAME=\"memtailor\" -DPACKAGE_VERSION=\"1.0\" "-DPACKAGE_STRING=\"memtailor 1.0\"" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"memtailor\" -DVERSION=\"1.0\" -std=gnu++14 -MD -MT CMakeFiles/memtailor.dir/src/memtailor/MemoryBlocks.cpp.o -MF CMakeFiles/memtailor.dir/src/memtailor/MemoryBlocks.cpp.o.d -o CMakeFiles/memtailor.dir/src/memtailor/MemoryBlocks.cpp.o -c /tmp/memtailor-20210602-6310-1rq0iy9/src/memtailor/MemoryBlocks.cpp
@mahrud So if we are not building memtailor ourselves (in optimization mode), then setting MEMT_DEBUG in M2 before including files from memtailor, seems to be an error. Do you agree?
Oh, I see what you mean, I thought only memtailor's CMakeFile was setting MEMT_DEBUG, but it is being set by M2 as well: https://github.com/Macaulay2/M2/blob/b5b9b18ccb0dbf4daeeefc69290f6f18331134ef/M2/cmake/configure.cmake#L208-L211 I'll try without this ...
I'm actually not sure how we should handle this. Whether we set MEMT_DEBUG seems to depend on whether we are building memtailor in debug mode or not.
@mahrud @DanGrayson what do you think is a good way to handle this?
If I make the changes to refactor mathic, memtailor, mathicgb into e, these issues go away :). (I have a branch already to submit for this too).
Oh, I see what you mean, I thought only memtailor's CMakeFile was setting MEMT_DEBUG, but it is being set by M2 as well: https://github.com/Macaulay2/M2/blob/b5b9b18ccb0dbf4daeeefc69290f6f18331134ef/M2/cmake/configure.cmake#L208-L211
I'll try without this ...
The trouble is, in M2 we need to know whether MEMT_DEBUG was set or not when compiling memtailor (same will probably be true of the other 2 libraries as well).
The crash went away when I recompiled M2 without setting MEMT_DEBUG, but this doesn't explain the crash on github brew builds ... maybe that was really Jay's crash and not mine? I'll re-try build the bottles from your branch.
I'm actually not sure how we should handle this. Whether we set MEMT_DEBUG seems to depend on whether we are building memtailor in debug mode or not.
Perhaps we should force debug builds to build those three libraries. Alternatively, we could change how MEMT_DEBUG affects Arena in this way. In particular, are there things like this for Mathic and Mathicgb as well?
If I make the changes to refactor mathic, memtailor, mathicgb into e, these issues go away :). (I have a branch already to submit for this too).
Do you specifically want the directory structure to look like, for instance Macaulay2/e/memtailor/src/memtailor/Arena.cpp
? If so, then moving the location of those three submodules would make sense. CMake can also import other CMake projects as a subproject (rather than an external project, which it currently does), which would make handling this simple. I can make this change on your branch before or after you make a PR if you'd like.
Alternatively, you could add a relative symlink so for instance Macaulay2/e/memtailor/Arena.cpp
works.
We'll see how https://github.com/Macaulay2/homebrew-tap/pull/80 goes.
I'm actually not sure how we should handle this. Whether we set MEMT_DEBUG seems to depend on whether we are building memtailor in debug mode or not.
@mahrud @DanGrayson what do you think is a good way to handle this?
I'm not sure I understand the issue, but whether MEMT_DEBUG is defined should affect memtailor only, and only be done when memtailor is being built.
I'm not sure I understand the issue, but whether MEMT_DEBUG is defined should affect memtailor only, and only be done when memtailor is being built.
Why is that? The trouble is that in a header file (in memtailor), a struct changes size (it has an extra set of fields) if MEMT_DEBUG is set. If it is compiled with MEMT_DEBUG not set, but then we use the header file with it set, then that is a problem.
Because the information is flowing the wrong direction -- users of a library shouldn't need to know the options that were used when compiling the library. Instead, the library should record the needed info in the installed include files, for the convenience of the user. Add a new include file called memtailor_options.h
that contains one of the following two lines:
#define MEMT_DEBUG 1
#undef MEMT_DEBUG
The debug flag issue still needs to be fixed.
This crash is occurring on the github workflow that builds brew bottles and is blocking it. The crash itself is in the minimalBetti example but I could also reproduce it myself on Linux:
cc: @mikestillman seems to be related to the changes in
e/schreyer-resolution
.