SWI-Prolog / packages-http

The SWI-Prolog HTTP server and client libraries
23 stars 23 forks source link

--interactive switch unexpectedly affects dispatching #71

Open triska opened 7 years ago

triska commented 7 years ago

I am filing this as an HTTP issue because this is where I have found a case that is easy to reproduce, and it is also where this issue affects me most. This may of course be due to a more general underlying issue with file loading.

To reproduce, please download Proloxy, and make sure to run any (arbitrary) web server on port 3031, because that's the port used in the config.pl example configuration of Proloxy.

Then, please run:

$ swipl proloxy.pl config.pl --port=5050 --interactive

And test the dispatch with:

$ wget localhost:5050

This will indirectly (via Proloxy) fetch / from the web server on port 3031, and works completely as expected.

In contrast, if you omit the --interactive switch and use --no-fork instead, the proxy server again starts exactly as expected:

$ swipl proloxy.pl config.pl --port=5050 --no-fork
% Started server at http://localhost:5050/

But we now get:

$  wget localhost:5050
--2016-12-15 01:00:10--  http://localhost:5050/
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:5050... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:5050... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2016-12-15 01:00:10 ERROR 503: Service Unavailable.

If you telnet to Proloxy or access it differently, you see the exact reason:

request_prefix_target/3: No matching rule for ...

This error stems from Proloxy, apparently because the file config.pl that appears on the command line was not loaded in the same way as if we had used --interactive.

Note that if I turn around the file arguments, i.e.:

$ swipl config.pl proloxy.pl --port=5050 --no-fork

Then everything works as expected again. Intuitively, I would like the same outcome in all of these cases, like in the case of the --interactive switch. In particular, loading the config file before or after proloxy.pl should ideally not make any difference, at least not in such simple cases.

JanWielemaker commented 7 years ago

The problem is now clear. The server is started from :- initialization in proloxy.pl. If the server is interactive, the process is associated to a terminal (we hope) and wait/1 in http_unix_daemon.pl enables the development system and completes, after which loading proloxy.pl is complete and loading config.pl starts.

In the --no-fork option, wait/1 does not return. If it would, it would reach the Prolog toplevel and as there is no terminal attached it will read end-of-file and terminate Prolog.

A solution is to call

$ swipl config.pl proloxy.pl --port=5050 --no-fork

I don't really see anything else. Using command line options (-g goal) using a #! line is dubious as there is only portable support for #!<exe> <arg> (a single argument). The initialization directive is supposed to run after loading the file, but before the load command returns. We cannot change that as it might be that the initialization command is needed to be able to use the just loaded module (which I think how it was intended in the first place).

One option might be :- initialization(Goal, main). or something like that. There is already initialization/2, so -although non-standard- it is not a big step. The main option would imply the goal is started after loading all files and there can only be one such goal. Not sure. In that case it would also make sense to terminate the process if Goal completes, but that would break the ability to enter the interactive toplevel.

triska commented 7 years ago

Thank you for looking into this!

I would like to add the reason why I care about the order of files in this case: In the case of Proloxy, there are directives in proloxy.pl affecting properties of predicates that may be defined in user configurations. The most important one in this case is the :- multifile/1 declaration of request_prefix_target/3.

Of course, multifile/1 is quite versatile in SWI-Prolog in the sense that also files that are loaded later can declare this. For comparison, this is not the case in other Prolog systems, where the multifile/1 directive must occur in the first file where clauses of the predicate appear.

Still, it is natural in such cases to specify proloxy.pl as the first file on the command line, even if only for the reason that users should not have to repeat the multifile/1 declaration in their own configuration files. The exact same issue will also arise in the hooks you suggest in #77: Users will naturally place their HTTP daemon files before their custom configuration files that contain the hooks, and in some cases this may even be required to fully benefit from the predicate declarations that the daemon library itself contains.

So, for this reason, I would like to endorse a way to state "run this predicate after all files have been loaded". For example, what about:

:- main Goal.

If the --interactive switch is given, the daemon could itself escape to the toplevel by calling toplevel/0 or a similar predicate that initiates the toplevel interaction and continues with it. This way, you can also let the program halt as soon as Goal succeeds.

JanWielemaker commented 7 years ago

Of course, multifile/1 is quite versatile in SWI-Prolog in the sense that also files that are loaded later can declare this. For comparison, this is not the case in other Prolog systems, where the multifile/1 directive must occur in the first file where clauses of the predicate appear.

I'm don't know what ISO says about this. AFAIK though most systems demand the directive to be present in each file (before the clauses). SWI-Prolog is indeed more relaxed and only requires the declaration once before clauses from a second file are loaded.

JanWielemaker commented 7 years ago

With Kuniaki also falling into this trap I agree we need something. I'm not yet sure what though. main/1 is most likely in use by too many people. Internally this already exists (forgot the interface). I think Visual Prolog has something to declare the main goal. Wonder whether any other system has that?

An option would be to keep http_daemon as :- initialization, but make this register the alternative toplevel if the session is not interactive. Of course, the disadvantage is that the daemon is started before the other files are loaded :(

This needs some thought ...

triska commented 7 years ago

Of course, the disadvantage is that the daemon is started before the other files are loaded :(

In fact, I would regard this as an advantage in this concrete case. The reason I found this issue altogether was that I spawned a new thread upon initialization of my config.pl, in order to watch and report the worker threads of the daemon.

I was surprised to see 0 threads as the first message, and then realized that I had placed config.pl before proloxy.pl on the command line. So, I swapped the files, and then noticed that the watching thread would no longer start at all due to this issue.

Now, I have reversed the files again, but of course I have a (comparatively harmless) race condition where the watcher may report the number of threads before the Unix daemon has created all of them.

So, in fact, it would be great for me in this case if I could say:

$ swipl proloxy.pl config.pl

and config.pl is only loaded when the daemon is completely running.

It's true, there may be a few milliseconds where the daemon listens but the dispatching rules are not yet defined, but then again this may anyhow be the case.

I'm not saying that this behaviour is particularly desirable in all other cases too, but in the case of the Unix daemon it would make perfect sense to me, and registering an alternative toplevel sounds like a very clean solution.