fontforge / fontforge

Free (libre) font editor for Windows, Mac OS X and GNU+Linux
http://fontforge.github.io/
Other
6.52k stars 704 forks source link

Crash when reloading quickly #4568

Open dscorbett opened 3 years ago

dscorbett commented 3 years ago

Reloading a font can crash FontForge. I am using FontForge 20201107 on macOS 10.14.6. To reproduce, open a font, then hold down the key combination for reloading (⌘⇧R on macOS) for a few seconds. Here is a crash report.

ctrlcctrlv commented 3 years ago

Any font?

dscorbett commented 3 years ago

Yes, this seems to happen with any font.

skef commented 3 years ago

I put a static boolean around the guts of the function to see if overlapping calls were possible. They're possible!

#0  0x00007ffff4ff0615 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff4fd9862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff4fd9747 in _nl_load_domain.cold () at /usr/lib/libc.so.6
#3  0x00007ffff4fe8bf6 in  () at /usr/lib/libc.so.6
#4  0x00007ffff66a60f3 in _FVRevert (fv=0x6130002cb8c0, tobackup=0) at ../fontforge/fontviewbase.c:1685
#5  0x00007ffff66a73ed in FVRevert (fv=0x6130002cb8c0) at ../fontforge/fontviewbase.c:1788
#6  0x0000555555a897e1 in FVTimer (fv=0x6130002cb8c0, event=0x7fffffffb3b0)
    at ../fontforgeexe/fontview.c:6675
#7  0x0000555555a8a159 in v_e_h (gw=0x610000010640, event=0x7fffffffb3b0)
    at ../fontforgeexe/fontview.c:6752
#8  0x0000555555df2c0f in _GWidget_Container_eh (gw=0x610000010640, event=0x7fffffffb3b0)
    at ../gdraw/gcontainer.c:405
#9  0x0000555555e29bb5 in _GGDKDraw_CallEHChecked (gw=0x610000010640, event=0x7fffffffb3b0, eh=
    0x555555df09ac <_GWidget_Container_eh>) at ../gdraw/ggdkdraw.c:346
#10 0x0000555555e2e548 in _GGDKDraw_ProcessTimerEvent (user_data=0x606000233840)
    at ../gdraw/ggdkdraw.c:862
#11 0x00007ffff51cf2c4 in  () at /usr/lib/libglib-2.0.so.0
#12 0x00007ffff51cea84 in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#13 0x00007ffff52229b1 in  () at /usr/lib/libglib-2.0.so.0
#14 0x00007ffff51cd2b1 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#15 0x0000555555e39d21 in GGDKDrawProcessPendingEvents (gdisp=0x615000007d80) at ../gdraw/ggdkdraw.c:2177
#16 0x0000555555e02dc2 in GDrawProcessPendingEvents (gdisp=0x615000007d80) at ../gdraw/gdraw.c:721
#17 0x0000555555e93cb8 in GProgressProcess (p=0x60b00009a930) at ../gdraw/gprogress.c:154
#18 0x0000555555e960ce in GProgressNext () at ../gdraw/gprogress.c:434
#19 0x00007ffff6b1b792 in SFD_GetFont
    (sfd=0x6150000cf600, cidmaster=0x0, tok=0x7fffffffbc00 "BeginChars:", fromdir=0, dirname=0x602000208790 "/tmp/l.sfd", sfdversion=3) at ../fontforge/sfd.c:8998
#20 0x00007ffff6b1c7aa in SFD_Read (filename=0x602000208790 "/tmp/l.sfd", sfd=0x6150000cf600, fromdir=0)
    at ../fontforge/sfd.c:9092
#21 0x00007ffff6b1ccf1 in _SFDRead (filename=0x602000208790 "/tmp/l.sfd", sfd=0x6150000cf600)
    at ../fontforge/sfd.c:9125
#22 0x00007ffff6b8e409 in _ReadSplineFont
    (file=0x6150000cf600, filename=0x6020001f5fb0 "/tmp/l.sfd", openflags=0)
    at ../fontforge/splinefont.c:1179
#23 0x00007ffff6b8fd15 in ReadSplineFont (filename=0x6020001f5fb0 "/tmp/l.sfd", openflags=0)
    at ../fontforge/splinefont.c:1322
#24 0x00007ffff66a65bd in _FVRevert (fv=0x6130002cb8c0, tobackup=0) at ../fontforge/fontviewbase.c:1734
#25 0x00007ffff66a73ed in FVRevert (fv=0x6130002cb8c0) at ../fontforge/fontviewbase.c:1788
--Type <RET> for more, q to quit, c to continue without paging--
#26 0x0000555555a897e1 in FVTimer (fv=0x6130002cb8c0, event=0x7fffffffcbc0)
    at ../fontforgeexe/fontview.c:6675
#27 0x0000555555a8a159 in v_e_h (gw=0x610000010640, event=0x7fffffffcbc0)
    at ../fontforgeexe/fontview.c:6752
#28 0x0000555555df2c0f in _GWidget_Container_eh (gw=0x610000010640, event=0x7fffffffcbc0)
    at ../gdraw/gcontainer.c:405
#29 0x0000555555e29bb5 in _GGDKDraw_CallEHChecked (gw=0x610000010640, event=0x7fffffffcbc0, eh=
    0x555555df09ac <_GWidget_Container_eh>) at ../gdraw/ggdkdraw.c:346
#30 0x0000555555e2e548 in _GGDKDraw_ProcessTimerEvent (user_data=0x6060002336c0)
    at ../gdraw/ggdkdraw.c:862
#31 0x00007ffff51cf2c4 in  () at /usr/lib/libglib-2.0.so.0
#32 0x00007ffff51cea84 in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#33 0x00007ffff52229b1 in  () at /usr/lib/libglib-2.0.so.0
#34 0x00007ffff51cd2b1 in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#35 0x0000555555e39eb4 in GGDKDrawEventLoop (gdisp=0x615000007d80) at ../gdraw/ggdkdraw.c:2205
#36 0x0000555555e02ed9 in GDrawEventLoop (gdisp=0x615000007d80) at ../gdraw/gdraw.c:731
#37 0x0000555555d60d9d in fontforge_main (argc=2, argv=0x7fffffffe6a8) at ../fontforgeexe/startui.c:1431
#38 0x00005555556b5d99 in main (argc=2, argv=0x7fffffffe6a8) at ../fontforgeexe/main.c:33

This smells like Bullshit Multithreading to me -- a high cost just for a progress bar.

@jtanx Seems like at a minimum we'd either need to be able to disable/override the progress callouts or split the events into flavors somehow to solve this.

jtanx commented 3 years ago

It's not a multithreading problem but a reentrancy one. It's more of a fundamental problem of how gdraw does dialogs/modals, where you pump the event loop from within blocking functions, so it's possible to end up with nested events.

skef commented 3 years ago

The explanation is a reentrantcy problem, but the cause is a bullshit model where someone said "since we're just on one thread it will be fine to go do some other stuff for a while!" without thinking through the premises of the programming model.

It's fine to do graphics stuff this way -- the graphics stuff is presumably written not to give up a part of its world with unfinished work. But one can't just let execution wander back into the font-substantive code or even the UI stuff in fontforgeexe. The progress bar stuff needs to record that its running and the event processing needs to defer events that call out while that record applies. Anything not equivalent to or more conservative than that is insane.

Or I guess if no one is willing to work on this we can add a preference to enable progress dialogs with a default of false.

jtanx commented 3 years ago

I'd say there's zero difference between the progress bar and the other dialogs and 'graphics stuff' that you claim is fine. Without looking into this any further than the backtrace you've posted, either the progress bar should be modal and not be forwarding anything to the parent, and/or the fontview timer (whatever that's used for) should be cancelled before doing said action.

skef commented 3 years ago

I've now had a number of hours of shaking my fists at the sky and obsessing about first principles and I think I have a bit of a better handle on this problem.

Sometimes, in event-loop driven architectures, you process some events above the main loop. One time is when displaying a modal dialog, maybe like this:

    GDrawSetVisible(sd.gw,true);
    while ( !sd.done )
        GDrawProcessOneEvent(NULL);

So is this "safe"? And if so, why specifically? If this is a dialog concerning some font metadata, can one be sure that the font will still be there during and immediately after the dialog is displayed? What makes a modal dialog modal?

I think the answer to the first question is "yes" and the answer to the last question is, broadly speaking, "the window's event handler". That is, the window itself has its own event-processing context and it is the scope of that context that provides the modality. Pressing a key while the window is raised (for example) won't call back into "substantive" code unless you write it that way.

This preserves the basic premise of single-threaded safety: When setting a local pointer variable (for example) and going on to call something else, the worry is whether that call and the calls under it might invalidate the pointer, and only those calls. You don't have to worry that any given function might execute.

So what about a progress bar, at least in the abstract? Basically, same answer; it's just a variety of modal dialog that happens to not have any input interfaces and where there's more "behind the scenes" code execution than normal. Type what you think of as a hotkey and it should be eaten by the dialog window's event handler, to no effect. Only graphics events are processed because that's the scope of the event handler.

Is that true of FontForge's progress bar? I haven't conclusively verified this but it looks that way. The event handler is progess_eh() which seems to have the expected limited scope.

Therefore having worked back through the fundamentals one can see the diagnosis: There is a subset of FontForge code with calls like:

    ff_progress_start_indicator(10,_("Saving..."),_("Saving Spline Font Database"),_("Saving Outlines"), realcnt,i+1);
    ff_progress_enable_stop(false);    if ( sf->mm != NULL )
    err = SFD_Dump(sfd,sf,map,normal,todir,dirname);
    ff_progress_end_indicator();

Then there's code in and/or under the equivalent of SFD_Dump() instrumented with calls like:

    ff_progress_next_stage();
    ff_progress_next();
    ff_progress_change_line2(_("Saving Bitmaps"));

And some of these execute the event loop.

Therefore the fundamental problem is, at present, calling any of that code is only safe when a progress dialog is raised.

The likely fix, then, is:

  1. Add a "progress instance count" static variable to gprogress.c which is incremented in ff_progress_start_indicator() and decremented in ff_progress_end_indicator();.
  2. Only do work in the "intermediate" calls when the instance count is greater than zero

@jtanx I was writing this up as you posted.

skef commented 3 years ago

OK, so gprogress.c is already attempting to do what I describe with the current/prev list, so the narrower problem is that there's some hole in that logic.

skef commented 3 years ago

OK -- the progress bar seems fine after all, so back to first principles, I guess.

You said:

and/or the fontview timer (whatever that's used for) should be cancelled before doing said action.

This is presumably a reference to:

static gboolean _GGDKDraw_ProcessTimerEvent(gpointer user_data) {
    GGDKTimer *timer = (GGDKTimer *)user_data;
    GEvent e = {0};
    bool ret = true;

    if (!timer->active || _GGDKDraw_WindowOrParentsDying((GGDKWindow)timer->owner)) {
        timer->active = false;
        timer->stopped = true;
        return false;
    }

    e.type = et_timer;
    e.w = timer->owner;
    e.native_window = timer->owner->native_window;
    e.u.timer.timer = (GTimer *)timer;
    e.u.timer.userdata = timer->userdata;

    GGDKDRAW_ADDREF(timer);
    _GGDKDraw_CallEHChecked((GGDKWindow)timer->owner, &e, timer->owner->eh);
    if (timer->active) {
        if (timer->repeat_time == 0) {
            timer->stopped = true; // Since we return false, this timer is no longer valid.
            GGDKDrawCancelTimer((GTimer *)timer);
            ret = false;
        } else if (timer->has_differing_repeat_time) {
            timer->has_differing_repeat_time = false;
            timer->glib_timeout_id = g_timeout_add(timer->repeat_time, _GGDKDraw_ProcessTimerEvent, timer);
            ret = false;
        }
    }

    GGDKDRAW_DECREF(timer, _GGDKDraw_OnTimerDestroyed);
    return ret;
}

and its gxdraw equivalent (I've confirmed the problem occurs with both back-ends.)

I look at this function and see a modality violation: the code digs out a window and calls the event handler regardless of whether the event focus is grabbed. If I understand you right, you see a responsibility to manually cancel the timer before going modal (which none of our dialogs seem to do, as far as I know). Can you say a bit more about that responsibility?

skef commented 3 years ago

(Clarifying side note: on my system a progress bar isn't typically displayed unless an operation takes a while, and the reload behind this bug doesn't take long enough. Hence my misdiagnosis of the problem lying in bed a few hours ago. It does seem that the state is already tracked correctly and in this case there is a dialog even if it doesn't get painted on the screen.)

jtanx commented 3 years ago

Yes, I mean you should cancel the timer yourself, because that's how gdraw works. It basically comes back to this point:

So is this "safe"?

And the answer is clearly no, you can contrive situations where it's not.

You can make the argument that timers to other windows shouldn't be allowed when a modal is active, but then what about all the other events that are allowed though (e.g. mouse move events; the tooltips in the fontview are a good example)? There's nothing that prescribes what you can or can't do in a timer, nor on any of the other events for that matter.

skef commented 3 years ago

Yes, I mean you should cancel the timer yourself, because that's how gdraw works.

And the answer is clearly no, you can contrive situations where it's not.

I'm open to the idea that this problem might need a fix above the gdraw level, so that gdraw isn't ultimately modified. However, I don't see a non-systemic solution to the problem, and gdraw isn't a bad place for systemic code.

For example: Suppose I'm writing code that raises a CharView dialog, like the Expand Stroke dialog. I have some pointers to a contour. Then this timer Char event comes into the FontView and _FVRevert() is called, invalidating those pointers. Things aren't very modal at that point. So is the CharView dialog code supposed to cancel the FontView timer? Wouldn't just about every dialog need to cancel it?

So maybe you build some state outside of GDraw where dialog with font-state calls a function at the start that represents "font-modal execution" and another at the end to clear the state. Then whatever executes the timed events can look at that state and not run when its set. That's a way to solve the problem. However:

  1. Is there really that much benefit to being able to execute these events in the few dialogs that are compatible with it?
  2. If not, why not just have a "modal-compatible" timer event flag, and either throw away or defer the event if there is a modal dialog up?
  3. And if you're doing that, why not build it into the gdraw level, given that gdraw is what processes timer events, the struct is called GTimer and defined in gdraw.h, etc. etc.