dk / Prima

SEGV from PDL::Graphics::Simple Prima tests with backtrace #75

Closed mohawk2 closed 1 year ago

mohawk2 commented 1 year ago

I installed CPAN-latest PDL::Drawing::Prima and PDL::Graphics::Prima. I built and installed git-latest Prima with debugging symbols (by adding -g to the OPTIMIZE = line in the generated Makefile), then ran this in a checkout of PDL::Graphics::Simple:

$ perl Makefile.PL; make
$ AUTOMATED_TESTING=1 gdb perl -ex 'run -Mblib t/simple.t'
[snip]
ok 62 - PDL::Graphics::Simple::Prima::check() ran OK

Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
___pthread_mutex_lock (mutex=0x0) at ./nptl/pthread_mutex_lock.c:80
80  ./nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  ___pthread_mutex_lock (mutex=0x0) at ./nptl/pthread_mutex_lock.c:80
#1  0x00007ffff6afc50f in XrmQGetResource ()
    at /lib/x86_64-linux-gnu/libX11.so.6
#2  0x00007ffff62d570e in apc_fetch_resource
    (className=<optimised out>, name=name@entry=0x5555598cbc00 "window1", resClass=resClass@entry=0x5555598796f0 "Foreground", res=res@entry=0x555559911db0 "color", owner=owner@entry=93825028420368, resType=resType@entry=1, result=0x7fffffffcf94) at unix/misc.c:165
#3  0x00007ffff622ce54 in Widget_fetch_resource
    (className=<optimised out>, name=<optimised out>, classRes=<optimised out>, res=<optimised out>, owner=owner@entry=93825028420368, resType=resType@entry=1)
    at class/Widget.c:476
#4  0x00007ffff622d0c0 in Widget_fetch_resource_FROMPERL (cv=<optimised out>)
    at include/generic/Widget.inc:397
#5  0x000055555565bfca in Perl_pp_entersub ()
#6  0x0000555555651df3 in Perl_runops_standard ()
#7  0x00005555555c1c5f in Perl_call_sv ()
#8  0x00007ffff61fba1c in clean_perl_call_pv
    (subname=subname@entry=0x7ffff62fa5de "Prima::Object::profile_add", flags=flags@entry=4) at api/perl.c:39
#9  0x00007ffff61e1ccd in template_imp_void_Handle_SVPtr
    (subName=0x7ffff62fa5de "Prima::Object::profile_add", self=93825060130016, profile=0x55555985b680) at include/generic/thunks.tinc:150
#10 0x00007ffff622042b in Object_create
    (className=<optimised out>, profile=profile@entry=0x555559866490)
    at class/Object.c:42
#11 0x00007ffff61d511b in create_from_Perl (cv=<optimised out>) at api/api.c:78
#12 0x000055555565bfca in Perl_pp_entersub ()
#13 0x0000555555651df3 in Perl_runops_standard ()
#14 0x00005555555c9c05 in perl_run ()
#15 0x00005555555a11ea in main ()
(gdb) frame 2
#2  0x00007ffff62d570e in apc_fetch_resource (className=<optimised out>, 
    name=name@entry=0x5555598cbc00 "window1", 
    resClass=resClass@entry=0x5555598796f0 "Foreground", 
    res=res@entry=0x555559911db0 "color", owner=owner@entry=93825028420368, 
    resType=resType@entry=1, result=0x7fffffffcf94) at unix/misc.c:165
165     if ( XrmQGetResource( guts.db,
(gdb) print guts.db
$8 = (XrmDatabase) 0x5555573f6730
(gdb) list
160             _debug( "%s ", XrmQuarkToString( classes[i]));
161         }
162         _debug( "\n");
163     }
164 
165     if ( XrmQGetResource( guts.db,
166                 instances,
167                 classes,
168                 &type, &value)) {
169         if ( type == guts.qString) {

The actual SEGV almost certainly comes from this call: https://github.com/freedesktop/xorg-lib-libX11/blob/master/src/Xrm.c#L2549 but it's not clear to me at all why &db->linfo would be 0x0 as gdb shows it. Perhaps some cleanup is being done assuming some initialisation was done, but the initialisation wasn't in fact done.

mohawk2 commented 1 year ago

Having recompiled and reinstalled Prima with the full debugging capability (by setting PRIMA_DEBUG env var to 1), then rerunning the above with --debug=x appended to the t/simple.t, I get this additional info which doesn't seem surprising or helpful, but given my lack of knowledge I might be wrong:

[snip]
ok 62 - PDL::Graphics::Simple::Prima::check() ran OK
misc: inst: prima window1 color 
misc: class: Prima Window Foreground 

An alternative, explored recently with @vikasnkumar, is to install Devel::TraceRun, then run this:

$ AUTOMATED_TESTING=1 perl -d -d:TraceRun -Mblib t/simple.t
[snip]
    PDL::Graphics::Simple::Prima::check()
    return(1)
    PDL::Graphics::Simple::_regularize_size(ARRAY,px)
    return(ARRAY)
    Prima::Object::create(Prima::Win,text,PDL/Prima ,size,ARRAY,onCreate,CODE,onDestroy,CODE)
      Prima::Object::CREATE(Prima::Win)
      return(Prima::Window)
      Prima::Object::profile_add(Prima::Window,HASH)
        Prima::Window::profile_default(Prima::Window)
          Prima::Widget::profile_default(Prima::Window)
            Prima::Drawable::profile_default(Prima::Window)
              Prima::Component::profile_default(Prima::Window)
                Prima::Object::profile_default(Prima::Window)
                return(HASH)
              return(HASH)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(16777215)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(2)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(1)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(2)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(2)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return()
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(0)
              Prima::Widget::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1202)()
              return(15)
            return(HASH)
            Prima::Widget::get_default_font(Prima::Window)
            return(HASH)
            Prima::Widget::get_default_popup_font(Prima::Window)
            return(HASH)
          return(HASH)
          Prima::Const::AUTOLOAD()
            bi::constant(All)
            return(15)
          return(15)
          Prima::Const::AUTOLOAD()
            bs::constant(Sizeable)
            return(1)
          return(1)
          Prima::Const::AUTOLOAD()
            gm::constant(DontCare)
            return(64)
          return(64)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435457)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435458)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435459)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435460)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435461)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435462)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435463)
          Prima::Object::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:291)()
          return(268435464)
          Prima::Window::get_default_menu_font(Prima::Window)
          return(HASH)
          Prima::Const::AUTOLOAD()
            mb::constant(Cancel)
            return(4)
          return(4)
          Prima::Const::AUTOLOAD()
            wc::constant(Window)
            return(983040)
          return(983040)
          Prima::Const::AUTOLOAD()
            ws::constant(Normal)
            return(0)
          return(0)
        return(HASH)
        Prima::Window::profile_check_in(Prima::Window,HASH,HASH)
          Prima::Widget::profile_check_in(Prima::Window,HASH,HASH)
            Prima::Drawable::profile_check_in(Prima::Window,HASH,HASH)
              Prima::Component::profile_check_in(Prima::Window,HASH,HASH)
                Prima::Object::profile_check_in(Prima::Window,HASH,HASH)
                return()
                Prima::Component::get_components(Prima::Application)
                return(Prima::Timer,Prima::HintWidget,Prima::Clipboard,Prima::Clipboard,Prima::Clipboard,Prima::Clipboard)
              return()
              Prima::Drawable::font_match(Prima::Dra,HASH,HASH)
              return(HASH)
            return()
            Prima::Widget::autoEnableChildren(Prima::Application)
            return(0)
            Prima::Window::([snip]/lib/perl5/x86_64-linux/Prima/Classes.pm:1895)()
            return(1)
            Prima::Widget::fetch_resource(Window,Window1,Foreground,color,Prima::Application,1)
Segmentation fault (core dumped)

It's quite possible that PDL::Graphics::Simple::Prima is doing something incorrect (if so, please say what and I can fix it), but it seems to me Prima should not allow itself to segfault under any circumstances.

dk commented 1 year ago

I can't get the test to run at all. First, it wanted PGPLOT:

dk@kraken src/PDL-Graphics-Simple> perl t/simple.t
Can't locate PGPLOT.pm in @INC (you may need to install the PGPLOT module) (@INC contains: /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1 /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1/x86_64-linux /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/5.26.1) at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 2261.
BEGIN failed--compilation aborted at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 2261.
Compilation failed in require at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/PDL/Graphics/Simple/PGPLOT.pm line 28.

1..0 # SKIP No plotting engines installed

Then, after I installed it, the test died with some unintelligible errors:

dk@kraken src/PDL-Graphics-Simple> perl t/simple.t
String found where operator expected at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 6457, near "PDL::thread_define '_tcircle(a();b();c();ind()), NOtherPars => 2'"
        (Do you need to predeclare PDL::thread_define?)
syntax error at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 6457, near "PDL::thread_define '_tcircle(a();b();c();ind()), NOtherPars => 2'"
Can't use global @_ in "my" at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 6459, near "=@_"
syntax error at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/x86_64-linux/PDL/Graphics/PGPLOT/Window.pm line 6461, near "}"
Compilation failed in require at /home/dk/perl5/perlbrew/perls/perl-5.26.1/lib/site_perl/5.26.1/PDL/Graphics/Simple/PGPLOT.pm line 28.
1..0 # SKIP No plotting engines installed

dk commented 1 year ago

All right, I had a clean reinstall and can run t/simple.t - but without a single hiccup. I also ran under valgrind, and all is just fine. Also both with and without OPTIMIZE=-g.

I'm getting PDL::Graphics::Simple::register: PDL::Graphics::Simple::Prima is out of date - winging it at lib/PDL/Graphics/Simple.pm line 1397. but that's probably not relevant...

I'm a bit stuck here because I don't know how to reproduce this - the code shouldn't even be reachable if either initialization wasn't done or deinitialization has finished (which I also doubt because guts, including guts.db, gets properly zeroed before and after).

So all in all it looks rather mysterious. In the worst case, I'd suggest running it with -d:Trace, which would create huge output. But if it were possible to reduce t/simple.t to a minimal piece of code, that would help...
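One way to automate such a reduction is to run each candidate script in a child process and check whether it exited normally or was killed by a signal. The following is a minimal sketch, not part of Prima or PDL::Graphics::Simple; the helper name `run_and_classify` is hypothetical, and the decoding of `$?` follows the rules documented for Perl's `system`.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in a child process and report how it ended.
# Returns a hash reference describing a normal exit, a death by signal,
# or a failure to start the command at all.
sub run_and_classify {
    my (@cmd) = @_;
    my $rc = system(@cmd);

    # system() returns -1 when the command could not be started.
    return { status => 'failed to start', detail => "$!" } if $rc == -1;

    # The low 7 bits of $? hold the signal number; bit 128 flags a core dump.
    my $signal = $? & 127;
    return { status => 'signal', signal => $signal, core => ( $? & 128 ? 1 : 0 ) }
        if $signal;

    # Otherwise the high byte holds the exit code.
    return { status => 'exit', code => $? >> 8 };
}
```

With `AUTOMATED_TESTING=1` set in the environment, calling `run_and_classify($^X, '-Mblib', 't/simple.t')` on progressively smaller copies of the test would show which reductions still end with status `signal`.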

dk commented 1 year ago

I wonder whether you would still see the error if you took one of the stock Docker images, for example from https://hub.docker.com/_/perl, with stock PDL and the latest GitHub snapshots of Prima and the PDL:: submodules?

vikasnkumar commented 1 year ago

Hi Dmitry,

This segfault was happening on Debian Bullseye with system Perl 5.32.1. I use cpanm and local::lib to install Prima, PDL, and PDL::Graphics::Simple. I installed PDL::Graphics::Prima from GitHub.

Thanks.

dk commented 1 year ago

Hi Vikas,

Thanks for the clarification! However, I still cannot reproduce it. But here's the thing: this is the Debian Bullseye VirtualBox setup I used, where I could not reproduce the SEGV:

http://karasik.eu.org/misc/debian/

The unpacked vdi and vbox files were originally in ~/virtualbox/debian. I wonder whether the SEGV can be reproduced on this machine? If so, you won't need to re-upload it; just reply with the commands you used so that I can replay them locally.

Also: Login: dk Password: dk

cd src/PDL-Graphics-Simple
perl t/simple.t

vikasnkumar commented 1 year ago

@mohawk2 @dk

I am able to reproduce only by doing the following:

I tested this with all combinations of all the engines since I also have PDL::Graphics::Gnuplot and PDL::Graphics::PGPLOT installed.

@mohawk2 I think the issue is that when one has PDL::Graphics::PLplot installed, there is a SEGV in the test. If the plplot engine is not in the list of engines to test, I get no SEGV, just as @dk says.

I think the issue may be with PLplot instead of Prima.

vikasnkumar commented 1 year ago

@dk I confirmed it on your VirtualBox VM too.

It looks like there is a bad interaction with PLplot, and @mohawk2, since P:G:S loads PLplot anyway if it is installed, this is causing the issue I think.

I am able to reproduce on @dk's VM with my @engines = qw/plplot/; now with the above steps.

I do not think the issue is Prima; it may just be PLplot, or some load/unload routines of the library.

dk commented 1 year ago

Closing this, as both the problem and the fix seem straightforward. Thank you, everyone! If you still experience a SEGV, kindly reopen.

mohawk2 commented 1 year ago

Please could you release the fixed code?

dk commented 1 year ago

I'd rather wait until the next release date, at the end of February, but if you need it yesterday, I've just released a dev 1.67_1 version (also for pre-release testing purposes).

mohawk2 commented 1 year ago

I didn't know you had a release schedule, which is fair enough. Sitting on bug-fixes is a surprising approach. They do say if making releases is difficult or painful, do it more often so that you'll automate it. I'd say for my distros, releasing takes around 1 minute and is painless (even though the only automated step is using cpan-upload).

vikasnkumar commented 1 year ago

I am in favor of doing releases in the format of major.minor.patch where the bug fixes can be released as the patch to the existing release so a 1.67.1 can be done for more pressing fixes, and the 1.68 can stay on the quarterly/existing schedule. This will also allow tracking of API changes and keeping the API changes in the major/minor releases.

dk commented 1 year ago

Ah, okay, got it. The 1.67_1 format is a special CPAN format that tells everyone this is a development release, which is not shown as the latest on the CPAN search page. I'm a bit afraid of releasing it as 1.67.1, as CPAN might think that I changed the version numbering and wouldn't accept 1.68 thereafter.

dk commented 1 year ago

@mohawk2 I might make an emergency release fixing that one bug, but I might as well introduce others. That has happened more than a couple of times, so I'd rather be on the cautious side.

mohawk2 commented 1 year ago

A scheme some use for version-numbering is (here) 1.067000, then 1.067001 for a point-release, 1.068000 for the next minor, etc. But all clients should understand 1.68 is later than 1.67.1, since that's the whole point of the multiple-dots scheme. 1.67_1 is very different, as it is only a developer release and requires special effort to even be installed (such as a --dev flag to opt in).
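The ordering of these candidate version strings can be checked locally with Perl's core `version` module. This is a small sketch only: the helper name `compare_versions` is hypothetical, and whether the CPAN indexer orders versions exactly as `version.pm` does is an assumption here. One caveat worth knowing is that `version.pm` reads the decimal 1.67 as 1.670, so the dotted 1.67.1 sorts before the already released 1.67 even though it sorts before 1.68 as well.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: return -1, 0 or 1, ordering two version strings
# the way core version.pm does.
sub compare_versions {
    my ($left, $right) = @_;
    return version->parse($left) <=> version->parse($right);
}

my @pairs = (
    [ '1.68',     '1.67.1'   ],    # next scheduled release vs dotted point release
    [ '1.67.1',   '1.67'     ],    # dotted point release vs the existing decimal release
    [ '1.67001',  '1.67'     ],    # decimal point release vs the existing release
    [ '1.68',     '1.67001'  ],    # next scheduled release vs decimal point release
    [ '1.067001', '1.067000' ],    # the six-digit scheme
);

for my $pair (@pairs) {
    my ($left, $right) = @$pair;
    my $order = compare_versions($left, $right);
    my $word  = $order > 0 ? 'is newer than'
              : $order < 0 ? 'is older than'
              :              'equals';
    print "$left $word $right\n";
}

# An underscore marks a developer release.
print "1.67_1 is flagged as a developer release\n"
    if version->parse('1.67_1')->is_alpha;
```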

I think a more serious issue is of routinely introducing bugs; that implies there exist ways to improve the testing strategy. If you rely on finding them only at rare intervals on releasing, that will make them harder to fix as one will forget how they happened. Still, I know that GUI code is hard to automatically test.

dk commented 1 year ago

Yes, I agree, the testing could've been better, and I wouldn't say no to anyone who would volunteer to help me with it. The project would benefit from a proper code coverage test, at the very least perl code, and ideally C/XS too. To get the idea how much testing is needed, consider the fresh https://github.com/dk/Prima/blob/master/t/Image/Tile.t that tests only newly implemented tiling fills, which I could swear covers all of it -- and yet there were more bugs found just after 1.67.

So yes, I get that a dev release is not a solution, but I'm going on vacation on 4 Feb for 2 weeks, and I don't particularly want to rush a potentially buggy release before that.

dk commented 1 year ago

PS: Do you guys have an example of a module with the major.minor.patch scheme? I'm okay with releasing 1.67.1, but I don't want to run into a situation where CPAN would treat it as a higher number than 1.68 and I wouldn't be able to release 1.68.

vikasnkumar commented 1 year ago

There are some on metacpan, but I cannot find any popular ones that use the major.minor.patch versioning anymore. A lot of them have switched to the %d.%06d format, as @mohawk2 has suggested, where you would release 1.670001 and 1.670002 for minor bug fixes to the existing 1.67, and then 1.680000 for a bigger release with API changes.
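The digit grouping behind these six-digit decimal versions can be sketched as follows. The helper names `triplet_to_decimal` and `decimal_to_triplet` are hypothetical; the three-digits-per-component convention is the one `version.pm` applies to decimal versions. It shows why 1.067001 and 1.670001 are different releases: the first corresponds to components 1, 67, 1 and the second to 1, 670, 1.

```perl
use strict;
use warnings;

# Hypothetical helper: pack (major, minor, patch) into a six-digit decimal
# version string, three digits per component after the decimal point.
sub triplet_to_decimal {
    my ($major, $minor, $patch) = @_;
    die "minor and patch must be in 0..999\n"
        if grep { $_ < 0 || $_ > 999 } $minor, $patch;
    return sprintf '%d.%03d%03d', $major, $minor, $patch;
}

# Hypothetical helper: split a decimal version string back into components.
# Short fractions are padded with zeros on the right, so 1.68 reads as 1.680000.
sub decimal_to_triplet {
    my ($decimal) = @_;
    my ($major, $frac) = $decimal =~ /\A(\d+)\.(\d{1,6})\z/
        or die "unsupported version string: $decimal\n";
    $frac .= '0' x ( 6 - length $frac );
    return ( $major + 0, substr( $frac, 0, 3 ) + 0, substr( $frac, 3, 3 ) + 0 );
}

for my $decimal (qw(1.67 1.670001 1.067001 1.68)) {
    print "$decimal -> ", join( '.', decimal_to_triplet($decimal) ), "\n";
}
```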

Some of INGY's older packages have the 3-dot and even 4-dot versioning.

RJBS on the other hand varies it by decimal place and the number of significant digits increases if there are going to be a high number of bugfixes on the module, such as on the Email::* modules vs some other ones where the bug fixes are going to be fewer over time.

Some even do a YYYYMMDD based versioning since that is always ascending.

I understand that making bigger releases needs to be slow, especially for breaking API changes. But, for bug fixes that cause applications to break, I prefer to have the latest and the greatest bugfix sent out so that users benefit early.

I think it is bad practice if developers have to use GitHub to get your bugfix, as it makes the prerequisites of downstream applications that depend on Prima difficult to install with cpanminus. This is why we have unit tests, and so releases should be easy to make.

dk commented 1 year ago

I think I shall upload a test module (later) to see how CPAN treats three or more digits. If all goes well, then sure, I'll just use 1.67001 or whatever it accepts.

mohawk2 commented 1 year ago

I think I shall upload a test module (later) to see how CPAN treats three or more digits. If all goes well, then sure, I'll just use 1.67001 or whatever it accepts.

As I mentioned on IRC and have now emailed you, making a global CPAN module called "Test::cpan::versioning" is a bad way to try this. A better thing might have been "Acme::cpan::versiontest", or even asking on #toolchain since this whole topic is very well understood.

dk commented 1 year ago

fixed in 1.67001