dankamongmen / notcurses

blingful character graphics/TUI library. definitely not curses.
https://nick-black.com/dankwiki/index.php/Notcurses

All alpine autobuilders fail in tests using 2.0.11 #1197

Closed dankamongmen closed 3 years ago

dankamongmen commented 3 years ago

I cut 2.0.11 for Alpine Edge today, confident that we'd fixed the s390x problem there. The good news is that s390x no longer errors out differently from the others. The bad news is that all now fail :(. Gotta fix this before 2.1.0.

dankamongmen commented 3 years ago

It looks like 2.0.12-pre is also now failing on drone :(. Though this is since 2.0.11. 4935 worked; since 4936, we're dead in the water.

dankamongmen commented 3 years ago

I've been trying to reproduce this locally, and failing. Need to get a core file exfiltrated from the Docker container.

dankamongmen commented 3 years ago

Finally got it reproduced!

  RefreshSameSize
ncplane_resize_internal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
ncreel_redraw:672:Error drawing tablet
/root/notcurses/tests/notcurses.cpp:40: ERROR: CHECK( newx == x ) is NOT correct!
  values: CHECK( 1 == 80 )
get_tty_fd:920:File descriptor 1 was not a TTY
/root/notcurses/tests/notcurses.cpp:41: ERROR: CHECK( newy == y ) is NOT correct!
  values: CHECK( 1 == 24 )
ncplane_new_internal:419:Created new 24x80 plane "std" @ 0x0
===============================================================================
/root/notcurses/tests/piles.cpp:3:
TEST CASE:  Piles
Error opening /dev/tty (No such device or address)
  SmallerPileRender
Defaulting to 24x80 (output is not to a terminal)
/root/notcurses/tests/piles.cpp:32: FATAL ERROR: REQUIRE( nullptr != egc ) is NOT correct!
  values: REQUIRE( NULL != NULL )
ncplane_destroy:677:Won't destroy standard plane
===============================================================================
/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels
  ThreeCycleDown
ncplane_destroy:677:Won't destroy standard plane
/root/notcurses/tests/reel.cpp:335: ERROR: CHECK_LE( 0, order[n] ) is NOT correct!
  values: CHECK_LE( 0, -1 )
0x560389fbf850 is already registered for signals
/root/notcurses/tests/reel.cpp:120: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal
Defaulting to 24x80 (output is not to a terminal)
===============================================================================
/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels
Couldn't drop signals: 0x560389fbf850 != 0x560389fd9e60
DEEPEST SUBCASE STACK REACHED (DIFFERENT FROM THE CURRENT ONE):
  ThreeCycleDown
0x560389fbf850 is already registered for signals
===============================================================================
[doctest] test cases:     32 |     27 passed |      5 failed |     10 skipped
[doctest] assertions: 8409076 | 8409054 passed |     22 failed |
[doctest] Status: FAILURE!
[vps](139) $ 

dankamongmen commented 3 years ago

/root/notcurses/tests/fds.cpp:49:
DESCRIPTION: Fdplanes and subprocedures
TEST CASE:  FdsAndSubprocs
  SubprocDestroyCmdHung

/root/notcurses/tests/fds.cpp:156: WARNING: WARN( 0 != ncsubproc_destroy(ncsubp) ) is NOT correct!
  values: WARN( 0 != 0 )

/root/notcurses/tests/fills.cpp:5:
TEST CASE:  Fills
  Ncplane_Stain

/root/notcurses/tests/fills.cpp:272: FATAL ERROR: REQUIRE( 0 < ncplane_stain(n_, 7, 7, channels, channels, channels, channels) ) is NOT correct!
  values: REQUIRE( 0 <  -1 )

/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  PlaneAtCursorAttrs

/root/notcurses/tests/ncplane.cpp:638: ERROR: CHECK( newx == x ) is NOT correct!
  values: CHECK( 0 == 42 )

/root/notcurses/tests/ncplane.cpp:640: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y - 2, x - 1) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:642: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR1[strlen(STR1) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 110 )

/root/notcurses/tests/ncplane.cpp:643: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y - 1, x - 1) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:645: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR2[strlen(STR2) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 107 )

/root/notcurses/tests/ncplane.cpp:646: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, y, x - 1) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:648: ERROR: CHECK( testcell.gcluster == (__bswap_32 (__bswap_32 (STR3[strlen(STR3) - 1]))) ) is NOT correct!
  values: CHECK( 116 == 115 )

===============================================================================
/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  RightToLeft

/root/notcurses/tests/ncplane.cpp:733: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 3, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:734: ERROR: CHECK( 0 < ncplane_putstr(n_, "I can write English with מילים בעברית in the same sentence.") ) is NOT correct!
  values: CHECK( 0 <  -1 )

/root/notcurses/tests/ncplane.cpp:735: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 5, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:736: ERROR: CHECK( 0 < ncplane_putstr(n_, "|🔥|I have not yet ־ begun to hack|🔥|") ) is NOT correct!
  values: CHECK( 0 <  0 )

/root/notcurses/tests/ncplane.cpp:737: ERROR: CHECK( 0 == ncplane_cursor_move_yx(n_, 7, 10) ) is NOT correct!
  values: CHECK( 0 == -1 )

/root/notcurses/tests/ncplane.cpp:738: ERROR: CHECK( 0 < ncplane_putstr(n_, "㉀㉁㉂㉃㉄㉅㉆㉇㉈㉉㉊㉋㉌㉍㉎㉏㉐㉑㉒㉓㉔㉕㉖㉗㉘㉙㉚㉛㉜㉝㉞㉟") ) is NOT correct!
  values: CHECK( 0 <  0 )

===============================================================================
/root/notcurses/tests/ncplane.cpp:20:
TEST CASE:  NCPlane
  EGCStained

/root/notcurses/tests/ncplane.cpp:806: ERROR: CHECK( 1 == ncplane_putegc_stained(n_, "D", &sbytes) ) is NOT correct!
  values: CHECK( 1 == -1 )

/root/notcurses/tests/ncplane.cpp:813: ERROR: CHECK( 1 == ncplane_at_yx_cell(n_, 0, 1, &c) ) is NOT correct!
  values: CHECK( 1 == -1 )

/root/notcurses/tests/ncplane.cpp:815: ERROR: CHECK( (__bswap_32 (__bswap_32 ('D'))) == c.gcluster ) is NOT correct!
  values: CHECK( 68 == 67 )

/root/notcurses/tests/ncplane.cpp:817: ERROR: CHECK( channels == c.channels ) is NOT correct!
  values: CHECK( 4650116732956966912 == 4630901375692177408 )

/root/notcurses/tests/notcurses.cpp:7:
TEST CASE:  NotcursesBase
  RefreshSameSize

/root/notcurses/tests/notcurses.cpp:40: ERROR: CHECK( newx == x ) is NOT correct!
  values: CHECK( 1 == 80 )

/root/notcurses/tests/notcurses.cpp:41: ERROR: CHECK( newy == y ) is NOT correct!
  values: CHECK( 1 == 24 )

/root/notcurses/tests/piles.cpp:3:
TEST CASE:  Piles
  SmallerPileRender

/root/notcurses/tests/piles.cpp:32: FATAL ERROR: REQUIRE( nullptr != egc ) is NOT correct!
  values: REQUIRE( NULL != NULL )

/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels
  ThreeCycleDown

/root/notcurses/tests/reel.cpp:335: ERROR: CHECK_LE( 0, order[n] ) is NOT correct!
  values: CHECK_LE( 0, -1 )

/root/notcurses/tests/reel.cpp:120: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

===============================================================================
/root/notcurses/tests/reel.cpp:120:
TEST CASE:  Reels

DEEPEST SUBCASE STACK REACHED (DIFFERENT FROM THE CURRENT ONE):
  ThreeCycleDown

===============================================================================
[doctest] test cases:     32 |     27 passed |      5 failed |     10 skipped
[doctest] assertions: 8409076 | 8409054 passed |     22 failed |
[doctest] Status: FAILURE!
dankamongmen commented 3 years ago

OK, I can reproduce this locally now just by running in nohup and disconnecting the terminal. Yeaargh.

dankamongmen commented 3 years ago

I see what's happening, though I have no idea why:

ncplane_cursor_move_yx:490:Target y 26 >= height 24
CURRENT: 24/80 TERM: 1/1
ncplane_resize_internal:586:24x80 @ 0/0 → 1/1 @ 0/0 (keeping 1x1 from 0/0)
****************** 0/0
ncplane_cursor_move_yx:479:Target x 79 >= length 1
ncplane_cursor_move_yx:479:Target x 79 >= length 1
ncplane_cursor_move_yx:479:Target x 79 >= length 1

in PlaneAtCursorAttrs, we're somehow dropping the standard plane down to {1, 1} dimensions, at which point we can't emit our strings, and everything goes to hell. why would we be going to 1,1?
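
To illustrate why a 1x1 standard plane breaks these subtests, here is a minimal sketch in C of a bounds-checked cursor move. The `toy_plane` struct and both functions are hypothetical stand-ins written for this example, not the notcurses implementation; only the shape of the diagnostics is borrowed from the log above.

```c
#include <stdio.h>

/* Hypothetical stand-in for a plane; NOT the real notcurses struct. */
struct toy_plane {
  int leny, lenx; /* geometry: rows, columns */
  int y, x;       /* cursor position */
};

/* Move the cursor, rejecting targets outside the plane.
 * Returns 0 on success, -1 on failure. */
int toy_cursor_move_yx(struct toy_plane *p, int y, int x) {
  if (y < 0 || x < 0) {
    return -1;
  }
  if (y >= p->leny) {
    fprintf(stderr, "Target y %d >= height %d\n", y, p->leny);
    return -1;
  }
  if (x >= p->lenx) {
    fprintf(stderr, "Target x %d >= length %d\n", x, p->lenx);
    return -1;
  }
  p->y = y;
  p->x = x;
  return 0;
}

/* Resize the plane (assumes leny >= 1 and lenx >= 1), clamping the
 * cursor so that it stays inside the new geometry. */
void toy_resize(struct toy_plane *p, int leny, int lenx) {
  p->leny = leny;
  p->lenx = lenx;
  if (p->y >= leny) {
    p->y = leny - 1;
  }
  if (p->x >= lenx) {
    p->x = lenx - 1;
  }
}
```

With a 24x80 plane, a move to column 79 succeeds. After `toy_resize(&p, 1, 1)` the same move returns -1, which is the pattern behind the `CHECK( 0 == -1 )` failures in the test output.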

dankamongmen commented 3 years ago

PlaneAtCursorAttrs is resolved now, but i suspect others are broken in the same way. it looks like we possibly carry cursor information across subtests? but wouldn't that mean we carry all aspects of the standard plane across subtests? i don't think that that's going on.....hrmm....

dankamongmen commented 3 years ago

Hrmmm, I don't like that segfault, but we have this resolved and tests are now passing. I'd like to try further to reproduce and chase down that segfault, though.

dankamongmen commented 3 years ago

the segfault was a failure to check a result in the reels tests. resolved. we're done here!
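
The thread does not say which result went unchecked, but the log shows `CHECK_LE( 0, order[n] )` failing with -1 immediately before the SIGSEGV, and doctest's `CHECK` macros record a failure and keep going, whereas `REQUIRE` ends the test case. The general hazard can be sketched in C as follows; `find_index` and `bump_count` are hypothetical names invented for this example.

```c
#include <stddef.h>

/* Hypothetical lookup: index of `key` in `ids`, or -1 if absent. */
int find_index(const int *ids, size_t n, int key) {
  for (size_t i = 0; i < n; ++i) {
    if (ids[i] == key) {
      return (int)i;
    }
  }
  return -1;
}

/* Increment the counter for `key`. Returns 0 on success, -1 if the key
 * is unknown. The early return is the point of the example: without
 * it, counts[-1] would be written. */
int bump_count(const int *ids, int *counts, size_t n, int key) {
  int idx = find_index(ids, n, key);
  if (idx < 0) {
    return -1; /* fail fast instead of indexing with the sentinel */
  }
  counts[idx]++;
  return 0;
}
```

In a doctest test, the equivalent guard is to use `REQUIRE` rather than `CHECK` for any value that is later used as an index or dereferenced.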

dankamongmen commented 3 years ago

https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/15708 we look good! all alpine builds are now passing =].

kaniini commented 3 years ago

2.0.12 tests are crashing on x86 :(

dankamongmen commented 3 years ago

> 2.0.12 tests are crashing on x86 :(

so i saw. =[ do you have any insight as to why they would all be green in the pipeline attached to the PR, but then one would break somewhere further down the line? the build logs for that pipeline clearly show the tests being run on x86, and succeeding. =[

dankamongmen commented 3 years ago

> 2.0.12 tests are crashing on x86 :(

btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?

kaniini commented 3 years ago

the CI environment is a bit different than the actual buildservers. on the buildservers, we capture stdout and stderr file descriptors and redirect them to files. i do not believe we do this on CI.
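
A program can tell these two environments apart by inspecting its own descriptors. The following is a minimal sketch using POSIX `fstat()` and `isatty()`; the `fd_kind` enum and `classify_fd` function are names made up for this example and do not come from notcurses or the Alpine tooling.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of what a descriptor refers to. */
enum fd_kind { FD_TTY, FD_FILE, FD_PIPE, FD_OTHER, FD_INVALID };

/* Report whether fd is a terminal, a regular file (e.g. a log that
 * stdout was redirected into), a pipe/FIFO, or something else. */
enum fd_kind classify_fd(int fd) {
  struct stat st;
  if (fstat(fd, &st) != 0) {
    return FD_INVALID; /* closed or otherwise unusable descriptor */
  }
  if (isatty(fd)) {
    return FD_TTY;
  }
  if (S_ISREG(st.st_mode)) {
    return FD_FILE;
  }
  if (S_ISFIFO(st.st_mode)) {
    return FD_PIPE;
  }
  return FD_OTHER;
}
```

Run interactively, `classify_fd(STDOUT_FILENO)` would be expected to yield `FD_TTY`; under a builder that redirects output to a file it would yield `FD_FILE`, which is the case the test suite has to survive.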

kaniini commented 3 years ago

i had to block notcurses on x86 so that our x86 buildserver would move onto trying to build other packages, but would be happy to help debug this.

> btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?

yes, I work on a lot of security-related code inside and outside alpine, as well. seemed like a good fit.

kaniini commented 3 years ago

I have set up an x86 alpine install and built notcurses in it manually with abuild:

$ abuild deps clean unpack prepare build
[...]
[100%] Linking CXX executable notcurses-tester
make[2]: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
[100%] Built target notcurses-tester
make[1]: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
make: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'

I then ran abuild check to invoke the testsuite by hand:

$ abuild check
make: Entering directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
Running tests...
Test project /home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build
    Start 1: notcurses-tester
1/7 Test #1: notcurses-tester .................   Passed   23.87 sec
    Start 2: ncpp_build
2/7 Test #2: ncpp_build .......................   Passed    0.01 sec
    Start 3: ncpp_build_exceptions
3/7 Test #3: ncpp_build_exceptions ............   Passed    0.01 sec
    Start 4: sgr-full
4/7 Test #4: sgr-full .........................   Passed    0.01 sec
    Start 5: sgr-direct
5/7 Test #5: sgr-direct .......................   Passed    0.01 sec
    Start 6: rgb
6/7 Test #6: rgb ..............................   Passed    0.02 sec
    Start 7: rgbbg
7/7 Test #7: rgbbg ............................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 7

Total Test time (real) =  24.08 sec
make: Leaving directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'

I then ran the testsuite with stdout and stderr captured like the buildservers do:

$ (abuild check >check.log 2>&1) & tail -f check.log
make: Entering directory '/home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build'
Running tests...
Test project /home/kaniini/aports/community/notcurses/src/notcurses-2.0.12/build
    Start 1: notcurses-tester

In my test environment, it's just frozen, which is different behavior from the buildserver, even though the FDs are captured in exactly the same way.

kaniini commented 3 years ago

Running this a few more times, I cannot get it to crash and I cannot get it to segfault on a bare x86 VM.

kaniini commented 3 years ago

Running the tester program directly also does not crash. I'm honestly baffled as to why the buildserver is failing reliably, but the failure is not reproducible in a test environment.

kaniini commented 3 years ago

Hmm, when your code sees that stdout is not a TTY, it tries to open /dev/tty. This is probably related -- I doubt /dev/tty is connected to anything useful in an LXC container, but more importantly it probably shouldn't try to use /dev/tty anyway.

kaniini commented 3 years ago

Bam! If I replace /dev/tty with something bogus (say, a FIFO), we get an immediate segfault in notcurses-tester.
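
One defensive pattern for this situation is to verify that whatever was opened really is a terminal before using it, and to fall back to fixed dimensions otherwise. The sketch below is an assumption about how such a guard could look, using only POSIX `open()`/`isatty()` and the `TIOCGWINSZ` ioctl; it is not the change proposed in #1212, whose contents are not shown in this thread. `open_tty_checked`, `geom_or_default`, and `term_geom` are hypothetical names.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical geometry holder for this example. */
struct term_geom {
  int rows, cols;
};

/* Open `path` and return the descriptor only if it is a real TTY.
 * Anything else (FIFO, regular file, /dev/null, missing path) yields
 * -1 so that the caller can fall back to defaults. */
int open_tty_checked(const char *path) {
  int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
  if (fd < 0) {
    return -1;
  }
  if (!isatty(fd)) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Query the window size, defaulting to 24x80 when fd is invalid, the
 * ioctl fails, or the terminal reports a zero dimension. */
struct term_geom geom_or_default(int fd) {
  struct term_geom g = { 24, 80 };
  struct winsize ws;
  if (fd >= 0 && ioctl(fd, TIOCGWINSZ, &ws) == 0 &&
      ws.ws_row > 0 && ws.ws_col > 0) {
    g.rows = ws.ws_row;
    g.cols = ws.ws_col;
  }
  return g;
}
```

With this guard, a bogus `/dev/tty` behaves the same as a missing one: no descriptor is kept, and the 24x80 default mentioned in the earlier log output is used.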

kaniini commented 3 years ago

Opened #1212 with a likely fix.

dankamongmen commented 3 years ago

> i had to block notcurses on x86 so that our x86 buildserver would move onto trying to build other packages, but would be happy to help debug this.
>
> > btw is your "keeper of mazes" a reference to the Ariadne of mythology, she of the gold thread?
>
> yes, I work on a lot of security-related code inside and outside alpine, as well. seemed like a good fit.

awesome. i'm at work at the moment, but will be able to look at this again this evening. so distressing -- i figured out why we were breaking on s390x, only to start breaking on x86! i definitely intend to get this fixed, just didn't yet have the heart to do so the other day =].

dankamongmen commented 3 years ago

Marking this tentatively closed, in the hope that @kaniini's patch fixes us up. I might go ahead and package 2.1.0 for Alpine (they're on 2.0.12 currently) with @kaniini's patch in the APKBUILD, bringing them up to speed while also getting a test in prior to 2.1.1. I'd really love to stop breaking their autobuilder.

kaniini commented 3 years ago

Unfortunately on the latest try, it doesn't fix us up. With the help of @ikke, I was able to get the CTest log from the builder:

===============================================================================
/home/buildozer/aports/community/notcurses/src/notcurses-2.0.12/tests/piles.cpp:3:
TEST CASE:  Piles
  ShufflePile

/home/buildozer/aports/community/notcurses/src/notcurses-2.0.12/tests/piles.cpp:3: FATAL ERROR: test case CRASHED: SIGSEGV - Segmentation violation signal

Going to dig a little bit into this test.

dankamongmen commented 3 years ago

Hot damn, y'all are CHAMPIONS. I'm on it as well. The piles stuff is all new from 2.0.x, and I haven't yet written anything that really exercises it, so I'm not surprised to see potential problems there.

dankamongmen commented 3 years ago

tally-ho!

[schwarzgerat](0) $ cat e
==1138495== Memcheck, a memory error detector
==1138495== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1138495== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==1138495== Command: ./notcurses-tester -p ../data/ --tc=Piles
==1138495== 
==1138495== Invalid write of size 8
==1138495==    at 0x4A6FC6E: ncplane_destroy (notcurses.c:699)
==1138495==    by 0x4A6FC6E: ncplane_destroy (notcurses.c:680)
==1138495==    by 0x1C517A: _DOCTEST_ANON_FUNC_2() (piles.cpp:129)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495==  Address 0xf294aa0 is 96 bytes inside a block of size 176 free'd
==1138495==    at 0x48399AB: free (vg_replace_malloc.c:538)
==1138495==    by 0x4A6FCB3: ncplane_destroy (notcurses.c:712)
==1138495==    by 0x4A6FCB3: ncplane_destroy (notcurses.c:680)
==1138495==    by 0x1C5162: _DOCTEST_ANON_FUNC_2() (piles.cpp:127)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495==  Block was alloc'd at
==1138495==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==1138495==    by 0x4A6EDDF: ncplane_new_internal (notcurses.c:358)
==1138495==    by 0x1C3E95: _DOCTEST_ANON_FUNC_2() (piles.cpp:99)
==1138495==    by 0x190A95: doctest::Context::run() (doctest.h:6167)
==1138495==    by 0x1458DB: main (main.cpp:133)
==1138495== 
==1138495== 
==1138495== HEAP SUMMARY:
==1138495==     in use at exit: 58,182 bytes in 264 blocks
==1138495==   total heap usage: 3,175 allocs, 2,911 frees, 3,427,641 bytes allocated
==1138495== 
==1138495== LEAK SUMMARY:
==1138495==    definitely lost: 0 bytes in 0 blocks
==1138495==    indirectly lost: 0 bytes in 0 blocks
==1138495==      possibly lost: 1,352 bytes in 18 blocks
==1138495==    still reachable: 56,830 bytes in 246 blocks
==1138495==                       of which reachable via heuristic:
==1138495==                         newarray           : 1,536 bytes in 16 blocks
==1138495==         suppressed: 0 bytes in 0 blocks
==1138495== Rerun with --leak-check=full to see details of leaked memory
==1138495== 
==1138495== For lists of detected and suppressed errors, rerun with: -s
==1138495== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
[schwarzgerat](0) $ 
dankamongmen commented 3 years ago

for all those playing at home, it's likely that whatever's affecting us in ShufflePile is likewise going to trigger on ShufflePileFamilies, which is just not being hit due to the SIGSEGV. so if it's in the test rather than the library core, it'll need to be fixed in both places.

dankamongmen commented 3 years ago
[schwarzgerat](0) $ cat e
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   7 x:   7 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dabd40 new3
 bound 0x5572e3dd2710 ← 0x5572e3dd2780 → (nil) binds (nil)
0001 off y:   6 x:   6 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd25b0 new2
 bound 0x5572e3dd25b0 ← 0x5572e3da7640 → (nil) binds (nil)
0002 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds 0x5572e3dabd40
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   6 x:   6 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd25b0 new2
 bound 0x5572e3dd25b0 ← 0x5572e3da7640 → (nil) binds (nil)
0001 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds (nil)
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
  *************************   0x5572e3da7630 pile ****************************
0000 off y:   3 x:   3 geom y:  68 x:  78 curs y:   0 x:   0 0x5572e3dd2710 new1
 bound 0x5572e3dd2710 ← 0x5572e3dabda0 → (nil) binds (nil)
 WARNING: expected *->bprev 0x5572e3dd2710, got (nil)
 ______________________________________________________________________________
 -------------------------- notcurses debug state -----------------------------
  *************************   0x5572e3da35f0 pile ****************************
0000 off y:   0 x:   0 geom y:  70 x:  80 curs y:   0 x:   0 0x5572e3d68040 std
 bound 0x5572e3d68040 ← 0x5572e3da3600 → (nil) binds (nil)
 ______________________________________________________________________________
[schwarzgerat](0) $ 
kaniini commented 3 years ago

Ah, linked list corruption. That's what my guess was going to be based on looking at the code.

dankamongmen commented 3 years ago

> Ah, linked list corruption. That's what my guess was going to be based on looking at the code.

absolutely, i am the suck. i'll have it fixed in 10min, but you're welcome to race if you'd like =]. SHOULDA USED RUST.

dankamongmen commented 3 years ago

and by the way i can't thank you and @Ikke enough, nor the rest of Alpine. i've no idea why yours is the only config that caught this, but you've done me a tremendous service.

dankamongmen commented 3 years ago

ok yeah, in ncplane_reparent() we're improperly splicing the reparented plane's children into the pile's root list. stupid me!
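
For readers unfamiliar with this class of bug, here is a minimal sketch in C of a sibling list that uses a pointer-to-pointer back link, along with an invariant check in the spirit of the `expected *->bprev` warning in the debug dump. The field names echo that dump, but the `node` struct and every function here are hypothetical and written for illustration; this is not the notcurses code, and it does not reproduce the actual fix.

```c
#include <stddef.h>

/* Hypothetical node; NOT the notcurses plane struct. */
struct node {
  struct node *parent; /* node we are bound to, NULL if unbound */
  struct node **bprev; /* address of the pointer that points at us */
  struct node *bnext;  /* next sibling */
  struct node *blist;  /* head of our own child list */
};

/* Remove n from its sibling list, repairing both neighbours. */
void node_unlink(struct node *n) {
  if (n->bprev) {
    *n->bprev = n->bnext;
  }
  if (n->bnext) {
    n->bnext->bprev = n->bprev;
  }
  n->bprev = NULL;
  n->bnext = NULL;
  n->parent = NULL;
}

/* Push n onto the front of parent's child list. */
void node_link(struct node *parent, struct node *n) {
  n->parent = parent;
  n->bnext = parent->blist;
  if (n->bnext) {
    n->bnext->bprev = &n->bnext;
  }
  n->bprev = &parent->blist;
  parent->blist = n;
}

/* Move n under newparent; n's own child list is left untouched. */
void node_reparent(struct node *n, struct node *newparent) {
  node_unlink(n);
  node_link(newparent, n);
}

/* Return 1 if every child of parent is reachable through its own
 * bprev and points back at parent, 0 otherwise. */
int node_children_consistent(const struct node *parent) {
  struct node *const *link = &parent->blist;
  for (const struct node *c = parent->blist; c; c = c->bnext) {
    if (c->bprev != link || *c->bprev != c || c->parent != parent) {
      return 0;
    }
    link = &c->bnext;
  }
  return 1;
}
```

Forgetting either half of the neighbour repair in `node_unlink` leaves a sibling whose `bprev` refers to memory that may later be freed, so a subsequent unlink writes into a freed block. That is the kind of error the Valgrind trace above reports, and a check like `node_children_consistent` run after each mutation tends to catch it early.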

kaniini commented 3 years ago

musl's malloc-ng malloc implementation catches a lot of bugs like these. if you're interested, i could set up alpine-based CI using github actions.

dankamongmen commented 3 years ago

> musl's malloc-ng malloc implementation catches a lot of bugs like these. if you're interested, i could set up alpine-based CI using github actions.

i've got a ci server at https://drone.dsscaw.com:4443/ (or i did, anyway; it apparently has stopped), that i've been meaning to throw alpine onto. the musl observation is a compelling one. certainly don't let me stop you from setting up whatever you'd like, of course, but yeah throwing an alpine build into .drone.yml sounds like a great idea.

dankamongmen commented 3 years ago

with that said, if you're offering to do this because you're interested in getting involved with notcurses, i'd be delighted to have you aboard, and am happy to let you take over whatever you'd like.

dankamongmen commented 3 years ago

i've got a fix for this. valgrind now runs clear.

kaniini commented 3 years ago

I do have some interest in notcurses; for example I am interested in using it as a basis for a replacement Alpine installer, modelled after FreeBSD's bsdinstall.

dankamongmen commented 3 years ago

> I do have some interest in notcurses; for example I am interested in using it as a basis for a replacement Alpine installer, modelled after FreeBSD's bsdinstall.

well i've no idea as to what kind of time you want to put into it, but so long as you don't go committing into my C core without letting me know =], i'm happy to make you a collaborator. alternatively, you can just send PRs and know you're on the fast track for approval, heh =]. i'm honored to have people of your competence interested in my humble little project, and @joseluis can hopefully vouch for my willingness to explain my mysterious/inscrutable codes and comments via mail.

dankamongmen commented 3 years ago

also, i'm delighted to hear of potential use in an alpine installer. feel free to hit me at nickblack@linux.com with any questions you run into, and be liberal with the feature request button. 2.1.1 adds progress bars (already visible in the allgraph and uniblock demos from head), and if tree-based selectors would be useful, i can move up #1164 .

kaniini commented 3 years ago
>>> notcurses: Build complete at Wed, 16 Dec 2020 07:30:04 +0000 elapsed time 0h 5m 11s

Looks like we've got it this time.

dankamongmen commented 3 years ago
>>> notcurses: Build complete at Wed, 16 Dec 2020 07:30:04 +0000 elapsed time 0h 5m 11s

Looks like we've got it this time.

boom! definitely a team effort. thank you for restoring a bit of my faith in humanity and free software.