SWI-Prolog / swipl-devel

SWI-Prolog Main development repository
http://www.swi-prolog.org
Other
959 stars 175 forks source link

engine_next crashes http server #638

Open esad opened 4 years ago

esad commented 4 years ago

Here's a minimal sample where an engine is created, queried and destroyed within the http handler:

:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

:- http_handler('/', test, []).

test(_Request) :-
  engine_create(X, foo(X), E),
  engine_next(E, X),
  format("~d", [X]),
  engine_destroy(E).

foo(1).

:- http_server(http_dispatch, [port(5000)]).

When I request http://localhost:5000 now I get the following stacktrace:

C-stack trace labeled "crash":
  [0] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(save_backtrace+0x119) [0x10daf4309]
  [1] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(sigCrashHandler+0xe0) [0x10daf4910]
  [2] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(dispatch_signal+0x22f) [0x10da7407f]
  [3] /usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1a) [0x7fff7c45af5a]
  [5] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(PL_next_solution+0xe6c3) [0x10d9f64e3]
  [6] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(callCleanupHandler+0x364) [0x10da0e054]
  [7] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(PL_next_solution+0x15e4d) [0x10d9fdc6d]
  [8] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(callProlog+0x152) [0x10da52762]
  [9] /Users/esad/swipl-dev/lib/swipl/lib/x86_64-darwin/libswipl.8.dylib(start_thread+0x206) [0x10da8cf36]
  [10] /usr/lib/system/libsystem_pthread.dylib(_pthread_body+0x154) [0x7fff7c4646c1]
  [11] /usr/lib/system/libsystem_pthread.dylib(_pthread_body+0x0) [0x7fff7c46456d]
  [12] /usr/lib/system/libsystem_pthread.dylib(thread_start+0xd) [0x7fff7c463c5d]
Prolog stack:
  [17] time:remove_alarm_notrace/1 [PC=1 in supervisor]
  [16] system:$c_call_prolog/0 [PC=0 in top query clause]
  [15] system:setup_call_catcher_cleanup/4 <no clause>
  [13] time:call_with_time_limit/2 [PC=40 in clause 1]
  [12] http_dispatch:time_limit_action/3 [PC=53 in clause 1]
  [8] httpd_wrapper:call_handler/3 [PC=35 in clause 2]
  [7] system:catch/3 [PC=2 in clause 1]
  [6] httpd_wrapper:handler_with_output_to/5 [PC=24 in clause 1]
  [5] httpd_wrapper:handler_with_output_to/5 [PC=18 in clause 2]
  [4] httpd_wrapper:http_wrapper/5 [PC=155 in clause 1]
  [3] thread_httpd:http_process/4 [PC=64 in clause 1]
  [2] system:catch/3 [PC=2 in clause 1]
  [1] thread_httpd:http_worker/1 [PC=168 in clause 1]
  [0] system:$c_call_prolog/0 [PC=0 in top query clause]
Running on_halt hooks with status 139
Killing 72229 with default signal handlers
Segmentation fault: 11

LLDB dump:

?- time.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 72369 stopped
* thread #6, name = 'httpd@5000_4', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
    frame #0: 0x0000000100612f05 time.so`remove_alarm [inlined] unlinkEvent(ev=0x00000001028f02c0) at time.c:330 [opt]
   327      sched->first = ev->next;
   328  
   329    if ( ev->next )
-> 330      ev->next->previous = ev->previous;
   331  
   332    ev->next = ev->previous = NULL;   /* in case it's reused */
   333  }
Target 0: (swipl) stopped.
(lldb) bt
* thread #6, name = 'httpd@5000_4', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  * frame #0: 0x0000000100612f05 time.so`remove_alarm [inlined] unlinkEvent(ev=0x00000001028f02c0) at time.c:330 [opt]
    frame #1: 0x0000000100612ecc time.so`remove_alarm [inlined] freeEvent(ev=0x00000001028f02c0) at time.c:338 [opt]
    frame #2: 0x0000000100612ecc time.so`remove_alarm [inlined] removeEvent at time.c:701 [opt]
    frame #3: 0x0000000100612eb3 time.so`remove_alarm(alarm=<unavailable>) at time.c:927 [opt]
    frame #4: 0x00000001000bc4e3 libswipl.8.dylib`PL_next_solution(qid=<unavailable>) at pl-vmi.c:3834 [opt]
    frame #5: 0x00000001000d4054 libswipl.8.dylib`callCleanupHandler(fr=<unavailable>, reason=<unavailable>, __PL_ld=0x0000000101835c00) at pl-wam.c:0 [opt]
    frame #6: 0x00000001000b8427 libswipl.8.dylib`PL_next_solution [inlined] frameFinished(fr=<unavailable>) at pl-wam.c:751 [opt]
    frame #7: 0x00000001000b8408 libswipl.8.dylib`PL_next_solution(qid=<unavailable>) at pl-vmi.c:4256 [opt]
    frame #8: 0x0000000100118762 libswipl.8.dylib`callProlog(module=<unavailable>, goal=<unavailable>, flags=8, ex=0x000070000f107eb0) at pl-pro.c:384 [opt]
    frame #9: 0x0000000100152f36 libswipl.8.dylib`start_thread(closure=0x000000010055d120) at pl-thread.c:1804 [opt]
    frame #10: 0x00007fff7c4646c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #11: 0x00007fff7c46456d libsystem_pthread.dylib`_pthread_start + 377
    frame #12: 0x00007fff7c463c5d libsystem_pthread.dylib`thread_start + 13
esad commented 4 years ago

(I encountered this on 8.1.x, but now I upgraded to version 8.3.3-59-g5d6358244 and this still happens)

JanWielemaker commented 4 years ago

Pushed a fix for the clib submodule. The engine destruction destroyed the timers that belong to the original OS thread rather than the engine. That is fixed. An open issue is what should happen with the timeout? Now a possible timeout does not apply to the engine and gets effectuated as the engine_next/2 return control to the original thread. That is sometimes good, notably if you use engines to communicate between threads, and sometimes bad (when dealing with an engine only supporting the current thread and optionally doing slow computations).

esad commented 4 years ago

(Note that when engine_destroy/1 is removed, the process still crashes, I guess it's getting garbage collected?)

I'm not familiar with engines or http code enough (yet :-) so you may need to explain the question. Which timeouts are you referring to? The way I understand it is that engines have no timeouts (coroutines) but http handlers do. So when an engine is taking time and triggering a timeout, shouldn't it be destroyed from the outside, by some kind of http server watchdog thread?

Btw. I think this might be the same bug as reported in https://swi-prolog.discourse.group/t/http-engines-and-timeouts/1631

JanWielemaker commented 4 years ago

(Note that when engine_destroy/1 is removed, the process still crashes, I guess it's getting garbage collected?)

Doesn't reproduce.

shouldn't it be destroyed from the outside, by some kind of http server watchdog thread?

That is not entirely clear. If the engine solely works for the involved HTTP handler, probably that would be best. Engines can also be used to share information between threads though and in that case you probably do not want it to be killed.

JanWielemaker commented 4 years ago

This issue has been mentioned on SWI-Prolog. There might be relevant details there:

https://swi-prolog.discourse.group/t/http-engines-and-timeouts/1631/4