gap-packages / anupq

The ANUPQ GAP package
https://gap-packages.github.io/anupq/
Artistic License 2.0

Crash in `tst/anupqeg.tst` on OS X #26

Closed fingolfin closed 2 years ago

fingolfin commented 7 years ago

The following code snippet causes the anupq executable to eventually crash on Mac OS X 10.11:

LoadPackage("anupq");
G := ElementaryAbelianGroup( 16 );
procId := PqStart( G );
PqDescendantsTreeCoclassOne( procId : TreeDepth := 15, CapableDescendants );

There is no such crash on Linux. The issue occurs with both 32-bit and 64-bit builds, and with both GCC 7.2.0 and clang (specifically: Apple LLVM version 8.0.0 (clang-800.0.42.1)).

UPDATE: this shorter variant also reproduces the crash and is much faster:

LoadPackage("anupq");
G := ElementaryAbelianGroup( 16 );
procId := PqStart( G );
PqDescendantsTreeCoclassOne( procId : TreeDepth := 5, CapableDescendants );

fingolfin commented 7 years ago

Debugging this with lldb is incredibly annoying: once I attach the debugger, I cannot resume pq, because it just dies with an error like "Insufficent data read in from or written to file" (at least for interactive pq; I'll try non-interactive next).

fingolfin commented 7 years ago

After fixing some C compiler warnings about variables being used uninitialized, here is what I get with SetInfoLevel(InfoANUPQ,4); right before the crash:

#I  5 is defined on [2, 1] = 2 1
#I  Class 3
#I  6 is defined on [5, 1] = 2 1 1
#I  7 is defined on [3, 1] = 3 1
#I  8 is defined on [3, 2] = 3 2
#I  9 is defined on [4, 1] = 4 1
#I  10 is defined on [4, 2] = 4 2
#I  11 is defined on [4, 3] = 4 3
#I  12 is defined on 1^2 = 1 1
#I  13 is defined on 2^2 = 2 2
#I  14 is defined on 3^2 = 3 3
#I  15 is defined on 4^2 = 4 4
#I  Non-trivial powers:
#I   .1^2 = .12
#I   .2^2 = .13
#I   .3^2 = .14
#I   .4^2 = .15
#I   .5^2 = .6
#I  Non-trivial commutators:
#I  [ .2, .1 ] = .5
#I  [ .3, .1 ] = .7
#I  [ .3, .2 ] = .8
#I  [ .4, .1 ] = .9
#I  [ .4, .2 ] = .10
#I  [ .4, .3 ] = .11
#I  [ .5, .1 ] = .6
#I  [ .5, .2 ] = .6
#I  ToPQ> 9  #to (Main) p-Group Generation Menu
#I  ToPQ> 0  #to (Main) p-Quotient Menu
#I  ToPQ> 5  #set output level
#I  ToPQ> 0  #output level
#I  ToPQ> 9  #to (Main) p-Group Generation Menu
#I  ToPQ> 2  #extend automorphisms
#I  ToPQ> 5  #construct descendants
#I  Cannot open (null)
#I  ToPQ> 3 #class bound
#I  ToPQ> 0  #do not construct all descendants
#I  ToPQ> 1  #step size
#I  ToPQ> 0  #do not compute pcgs gen. seq. for auts.
#I  ToPQ> 0  #do not use default algorithm
#I  ToPQ> 0  #rank of initial segment subgrp
#I  ToPQ> 0  #do not completely process terminal descendants
#I  ToPQ> 0  #exponent
#I  ToPQ> 0  #do not enforce metabelian law
#I  ToPQ> 1  #default output
Error, failed to find any more of line (iostream dead?)
 at /Users/mhorn/Projekte/GAP/gap.github/lib/streams.gi:171 called from
ReadAllLine( iostream, true, IS_ALL_PQ_LINE ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:389 called from
PQ_READ_NEXT_LINE( datarec.stream ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:469 called from
FILTER_PQ_STREAM_UNTIL_PROMPT( datarec ); at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:630 called from
ToPQ( datarec, [ 1 ], [ "  #default output" ] ); at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqi.gi:2978 called from
PQ_PG_CONSTRUCT_DESCENDANTS( datarec ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqi.gi:3028 called from
...  at *stdin*:9
you can 'quit;' to quit to outer loop, or
you can 'return;' to continue
brk>

Note the line "#I Cannot open (null)" -- this indicates that OpenFile was called, but the filename string used was a null pointer.
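
As an aside on where the text "(null)" comes from: passing a null pointer to a `%s` conversion is undefined behaviour in standard C, but glibc and Apple's libc both happen to print `(null)` instead of crashing, which is why the message looks harmless. A minimal sketch of a wrapper that rejects a missing name before it reaches `fopen` or `printf` could look like the following. The name `open_file_checked` and its messages are hypothetical; this is not pq's actual `OpenFile`.

```c
#include <stdio.h>

/* Hypothetical helper, NOT pq's OpenFile: refuses a NULL or empty file
 * name up front instead of handing it to fopen() or to a "%s" format. */
static FILE *open_file_checked(const char *name, const char *mode)
{
    if (name == NULL || name[0] == '\0') {
        fprintf(stderr, "open_file_checked: no file name given\n");
        return NULL;
    }

    FILE *fp = fopen(name, mode);
    if (fp == NULL) {
        /* name is known to be non-NULL here, so "%s" is well defined */
        fprintf(stderr, "Cannot open %s\n", name);
    }
    return fp;
}
```

With a check like this, the log would name the missing file name as the problem directly, rather than reporting a failed open of "(null)".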

fingolfin commented 7 years ago

With info level 7:

#I  ToPQ> 9  #to (Main) p-Group Generation Menu
#I
#I  Menu for p-Group Generation
#I  -----------------------------
#I  1. Read automorphism information for starting group
#I  2. Extend and display automorphisms
#I  3. Specify input file and group number
#I  4. List group presentation
#I  5. Construct descendants
#I  6. Advanced p-group generation menu
#I  7. Exit to basic menu
#I
#I  Select option:
#I  ToPQ> 2  #extend automorphisms
#I
#I  Select option:
#I  ToPQ> 5  #construct descendants
#I  Cannot open (null)
#I  Input class bound on descendants:
#I  ToPQ> 3 #class bound
#I  Construct all descendants?
#I  ToPQ> 0  #do not construct all descendants
#I  Input step size:
#I  ToPQ> 1  #step size
#I  PAG-generating sequence for automorphism group?
#I  ToPQ> 0  #do not compute pcgs gen. seq. for auts.
#I  Do you want default algorithm?
#I  ToPQ> 0  #do not use default algorithm
#I  Rank of the initial segment subgroup?
#I  ToPQ> 0  #rank of initial segment subgrp
#I  Completely process terminal descendants?
#I  ToPQ> 0  #do not completely process terminal descendants
#I  Input exponent law (0 if none):
#I  ToPQ> 0  #exponent
#I  Enforce metabelian law?
#I  ToPQ> 0  #do not enforce metabelian law
#I  Do you want default output?
#I  ToPQ> 1  #default output
Error, failed to find any more of line (iostream dead?)
 at /Users/mhorn/Projekte/GAP/gap.github/lib/streams.gi:171 called from
ReadAllLine( iostream, true, IS_ALL_PQ_LINE ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:389 called from
PQ_READ_NEXT_LINE( datarec.stream ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:469 called from
FILTER_PQ_STREAM_UNTIL_PROMPT( datarec ); at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqios.gi:630 called from
ToPQ( datarec, [ 1 ], [ "  #default output" ] ); at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqi.gi:2978 called from
PQ_PG_CONSTRUCT_DESCENDANTS( datarec ) at /Users/mhorn/Projekte/GAP/repos/pkg/anupq/lib/anupqi.gi:3028 called from
...  at *stdin*:5
you can 'quit;' to quit to outer loop, or

fingolfin commented 5 years ago

Problem persists in OS X 10.14. Crash reporter gives this:

0   libsystem_c.dylib               0x00007fff61619e5f flockfile + 18
1   libsystem_c.dylib               0x00007fff6161ba1f fread + 31
2   pq                              0x000000010581f107 restore_pcp + 55 (read.c:20)
3   pq                              0x0000000105803401 construct + 65 (construct.c:55)
4   pq                              0x0000000105810a89 iteration + 345 (iteration.c:63)
5   pq                              0x0000000105815d72 pgroup_generation + 1378 (pgroup.c:158)
6   pq                              0x00000001058140e6 options + 102 (options.c:145)
7   pq                              0x000000010580fa09 isom_options + 425
8   pq                              0x00000001058286b6 main + 150 (main.c:81)
9   libdyld.dylib                   0x00007fff61592ed9 start + 1

fingolfin commented 5 years ago

That CrashReporter output was actually quite helpful: pq crashes because the FILE * ifp that restore_pcp passes to fread is NULL. That in turn happens because StartName is also NULL in pgroup_generation, which leads to StartFile being set to NULL, and StartFile is ultimately used as the value of ifp inside restore_pcp.
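
For illustration, the crash itself (as opposed to its root cause) would turn into a reportable error if the reading routine checked its stream first. The sketch below is only an assumption about how such a guard might look; `read_ints_checked` is a hypothetical stand-in and does not reflect the real signature or body of `restore_pcp`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a restore routine such as restore_pcp.
 * Returns 0 on success and -1 if the stream is missing or too short,
 * instead of calling fread() on a NULL FILE pointer. */
static int read_ints_checked(FILE *ifp, int *buf, size_t count)
{
    if (ifp == NULL) {
        fprintf(stderr, "read_ints_checked: input file was never opened\n");
        return -1;
    }
    if (fread(buf, sizeof *buf, count, ifp) != count) {
        fprintf(stderr, "read_ints_checked: short read\n");
        return -1;
    }
    return 0;
}
```

This only makes the failure visible; it does not explain why StartName is NULL in the first place.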

fingolfin commented 2 years ago

Seems I had this pinned down and then stopped, huh... So the next thing to look at: why is StartName NULL here and not on Linux?

fingolfin commented 2 years ago

I've inserted fprintf(stderr, ...) statements at the start and end of pgroup_generation, and also whenever StartName is modified and used... and that caused the bug to trigger on Linux, too! Output looks like this (excerpt):

...
#I  Number of descendants of group #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 to class 13: 1
RESTORE_GROUP: StartName [grp]_class14
ITERATION: StartName [grp]_class14
ITERATION2: StartName [grp]_class14
#I  Number of descendants of group #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 to class 14: 1
RESTORE_GROUP: StartName [grp]_class15
ITERATION: StartName [grp]_class15
ITERATION2: StartName [grp]_class15
#I  Number of descendants of group #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 #1;1 to class 15: 1
RESTORE_GROUP: StartName [grp]_class2
pgroup_generation END
pgroup_generation START
pgroup_generation END
pgroup_generation START
ITERATION: StartName (null)
Cannot open (null)
Error, failed to find any more of line (iostream dead?)
...
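
A trace in the style of the excerpt above can be produced with a very small helper. The following is a sketch only; `trace_name` is a hypothetical name and not the code that was actually inserted into pq. It substitutes the text "(null)" explicitly, so that a null name is never passed to `%s`.

```c
#include <stdio.h>

/* Hypothetical tracing helper: prints "<tag>: StartName <name>" and
 * spells out "(null)" itself rather than relying on libc behaviour.
 * Returns the number of characters written, like fprintf(). */
static int trace_name(FILE *out, const char *tag, const char *name)
{
    return fprintf(out, "%s: StartName %s\n", tag,
                   name != NULL ? name : "(null)");
}
```

A call such as `trace_name(stderr, "ITERATION", StartName);` writes to stderr, which is unbuffered by default, so the line appears even if the process crashes immediately afterwards.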

So pgroup_generation is called and StartName is set; then the function exits, is called again, exits, is called a third time, and only in that last call is StartName accessed. My guess: this could only ever work while StartName was not initialized (as was the case before commit 6929d52c62570d9326cc226f589cc9acb132c345), because the old value of StartName may still be sitting on the stack, undisturbed (and we never free the storage for StartName either).
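
If that guess is right, the old code depended on reading an uninitialized local variable, which has an indeterminate value in C and works only by accident of stack layout. One possible way to make the name survive between calls is to keep it in an explicitly initialized object that outlives a single call. The sketch below is an assumption for illustration, not the fix applied in anupq; `struct pga_state`, `set_start_name` and `get_start_name` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state object that the caller keeps alive across calls. */
struct pga_state {
    char *start_name;   /* NULL until a name has been set */
};

/* Stores a private copy of name; returns 0 on success, -1 on failure. */
static int set_start_name(struct pga_state *st, const char *name)
{
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);

    free(st->start_name);   /* free(NULL) is a no-op */
    st->start_name = copy;
    return 0;
}

/* Returns the stored name, or NULL if none has been set yet. */
static const char *get_start_name(const struct pga_state *st)
{
    return st->start_name;
}
```

With the state initialized as `struct pga_state st = { NULL };`, a later call can tell "no name set yet" apart from a valid name, and the storage has a clear owner who frees it.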