gap-system / gap

Main development repository for GAP - Groups, Algorithms, Programming, a System for Computational Discrete Algebra
https://www.gap-system.org
GNU General Public License v2.0
810 stars 160 forks source link

CallWithTimeLimit (was: Alarm timers) #315

Closed stevelinton closed 8 years ago

stevelinton commented 9 years ago

Prompted by Heiko Dietrich's recent forum question, I have been exploring (again) the idea of timeouts in GAP. This is not too hard, although as usual one has to decide which C library API to use (there are at least three).

An initial simple implementation is a GAP-callable function

Alarm(<which>, <seconds>, <useconds>)

which is 0, 1 or 2 specifying whether you are interested in real time, user CPU time or system+user CPU time.

Some time after seconds+10^-6*useconds of the appropriate kind of time has elapsed from this call, the system will experience an interrupt as if Ctrl-C had been pressed or ErrorMayQuit() called. This response can be made a little more sophisticated (for instance using JUMP_TO_CATCH to throw an "exception" rather than entering the break loop). Using a more modern API, per-thread alarms are possible in HPCGAP. It's also easy enough to add functions to probe and reset the timers, and so on.

So far so good. The bit I'm not sure about is what happens if, when or before the timer goes off we have entered a break loop, or finished the current command and returned to the outer loop (and possibly started executing a subsequent command).

Naively, one might expect:

gap> Alarm(0,1,0); while true do x := 1; od;

to stop after one second, but

gap> Alarm(0,3600,0); <something that might take more than an hour but doesn't>
gap>  <some other command entirely>

never to be interrupted.

Similarly if you have a long command with an alarm to limit it to an hour, or whatever, and you enter a break loop for a few seconds early on to check something, you would still expect the timeout to work, but if you are in a break loop after an hour you probably don't want to be interrupted. Indeed, ideally you might want to nest alarms -- so that running a command with a timeout inside the break loop doesn't mess up an outer timer.

I guess anything is possible with enough caching of state and so on, but I'm not sure what should happen. Ideas anyone?

--- updated to fix numerous typos and markdown errors -sorry

stevelinton commented 9 years ago

Another question -- if the alarm timer goes off, and crtl-c is pressed more or less simultaneously, what takes precedence? and what happens once the event that takes precedence has been dealt with? Eg suppose the alarm timer is caught and the code starts to try an alternative algorithm or something, the user interrupt should presumably still be honoured.

I'm not sure exactly what the interaction with store overrun is even now, if you press Ctrl-C during a garbage collection that overruns, for instance, or if a garbage collection overruns while the system is setting up the break loop to handle a Ctrl-C.

Does anyone know of a language that handles this kind of thing nicely?

stevelinton commented 9 years ago

Following some discussion with @ChrisJefferson one way forward might be to concentrate on the main use case and have the only documented function be something like

CallWithTimeLimit( <microseconds or options record>, <func>, <arg1>, .....)

That simplifies things a bit. The default behaviour might be something like:

call the function with the arguments, limiting thread cpu time. If the function completes in that time, return a list containing the result if any. If not, return fail. The timer is stopped on entering a break a loop and restarted on return from a break loop. Nesting is therefore possible.

By passing an options record instead of just an integer timeout more sophisticated behaviours can be added later, including entering a break loop on timeout and using alternative timers.

There might be some race conditions between Ctrl-C memory overrun and timeout but I'm not sure how critical that is.

markuspf commented 9 years ago

I am in favour of the CallWithTimeLimit approach.

I suggest to not use "magic numbers" like the suggested which in the original proposal, but the only nicer way I could think of was passing a string, which may be fine too.

olexandr-konovalov commented 9 years ago

@markuspf me too. Looks nice, almost like some option for call-with-catch.

frankluebeck commented 9 years ago

CallWithTimeLimit looks interesting to me, but I don't see how the Alarm function in the first post could be useful.

If the timer for CallWithTimeLimit used a signal different from SIGINT it would be easy to distinguish from Ctrl-C pressed by the user, or from a break loop. I guess that the timer should be deleted when a break loop is quitted while CallWithTimeLimit(...) is running.

stevelinton commented 9 years ago

CallWithTimeLimit is a replacement for the idea of Alarm.

The timer signal is (with the latest POSIX API) configurable, currently at compile time. I'm using SIGVTALRM. Distinguishing which event has happened is easy, but

a) it will take considerable care to make sure that if two or more of memory overrun, Ctrl-C and timeout happen at close to the same time, none of them gets lost

b) it's not entirely obvious what should happen in all the various scenarios, even once you have sorted out which events occurred.

CallWithTimeLimit as I'm currently implementing it cannot be nested. That could be fixed later.

The timer will be abandoned if you quit from a break loop.

stevelinton commented 9 years ago

Annoying. There's a POSIX 2001 timer API (timer_create and friends) that does just what I need for this and would allow nice things like nested timeouts eventually. However OS X (and probably all BSD systems @markuspf could you check?) don't implement it and on some Linux seems it seems to need strange compiler options). The older setitimer interface seems to be universal, but doesn't track CPU time per thread or allow nesting in any nice way. Looks like I'll have to implement two versions.

fingolfin commented 9 years ago

CallWithTimeLimit indeed sounds quite interesting.

stevelinton commented 8 years ago

See pull request at #331 for a first look at a solution.

stevelinton commented 8 years ago

The pull request is now ready for merge, except that it really needs testing on a range of systems. It seems to work on OS X and one linux system I tried which between them exercise the main branches of the code, but I'd welcome reports from ofher UNIX systems and especially cygwin.

olexandr-konovalov commented 8 years ago

@stevelinton thanks - I will try Cygwin on our VMs used by Jenkins.

markuspf commented 8 years ago

This works on my DragonFlyBSD machine, and on a FreeBSD machine that I tested on.

It's a bit icky that the result is returned as a list, but I see the need for the emulated Maybe type there.

stevelinton commented 8 years ago

Thanks @markuspf. Do you know if your BSD machines have timer-create and friends (POSIX per-process timers) or just setitimer?

markuspf commented 8 years ago

FreeBSD has POSIX per process timers, DragonFly does not (yet) have it, but I think it's fairly easy to implement so I could do it this weekend.

Upside is that both things should be tested this way.

markuspf commented 8 years ago

(Probably an irrelevant aside) Implementing the per process timer API in DragonFly is very simple indeed. In FreeBSD timer_create and setitimer internally use the same mechanisms (and I'll probably just import the code into DragonFly to save hassle).

ChrisJefferson commented 8 years ago

It turns out cygwin only supports CLOCK_REALTIME (was surprised, downloaded source just to check, and indeed that's what supported).

That's a rubbish clock (will move around when daylight savings happens), but I imagine for most users it will be 'good enough'.

olexandr-konovalov commented 8 years ago

@stevelinton may we close this issue now as implemented in #331 and #370?

olexandr-konovalov commented 8 years ago

Closed - implemented in #331 and #370.