Open luke21923 opened 2 years ago
Actually, Chez Scheme already supports this via process or open-process-ports. Internally this uses fork on Unix-like operating systems and CreateProcessW on Windows (see the code for s_process in the c subdirectory for reference).
The only big difference is that these functions provide I/O for stdin, stdout, and (in the case of open-process-ports) stderr, as well as the pid.
See https://cisco.github.io/ChezScheme/csug9.5/foreign.html#./foreign:s5 for more information.
You could write something like system* as:
(define (system* cmd . args)
  (apply (lambda (from-stdout to-stdin pid)
           (close-output-port to-stdin)
           (display (get-string-all from-stdout))
           pid)
         (process (format "~s~{ ~s~}" cmd args))))
This would display the output to stdout and return the process id (similar to my reading of the GNU Guile documentation you linked):
> (system* "ls" "-a" "-l")
total 177608
drwxr-xr-x@ 25 akeep admin 800 May 3 2021 .
drwxr-xr-x@ 15 akeep admin 480 May 3 2021 ..
-rwxr-xr-x@ 1 akeep admin 683 May 3 2021 drracket
-rwxr-xr-x@ 1 akeep admin 763 May 3 2021 gracket
-rwxr-xr-x@ 1 akeep admin 776 May 3 2021 gracket-text
-rwxr-xr-x@ 1 akeep admin 740 May 3 2021 mred
-rwxr-xr-x@ 1 akeep admin 776 May 3 2021 mred-text
-rwxr-xr-x@ 1 akeep admin 707 May 3 2021 mzc
-rwxr-xr-x@ 1 akeep admin 715 May 3 2021 mzpp
-rwxr-xr-x@ 1 akeep admin 45432096 May 3 2021 mzscheme
-rwxr-xr-x@ 1 akeep admin 717 May 3 2021 mztext
-rwxr-xr-x@ 1 akeep admin 720 May 3 2021 pdf-slatex
-rwxr-xr-x@ 1 akeep admin 685 May 3 2021 plt-games
-rwxr-xr-x@ 1 akeep admin 699 May 3 2021 plt-help
-rwxr-xr-x@ 1 akeep admin 702 May 3 2021 plt-r5rs
-rwxr-xr-x@ 1 akeep admin 702 May 3 2021 plt-r6rs
-rwxr-xr-x@ 1 akeep admin 709 May 3 2021 plt-web-server
-rwxr-xr-x@ 1 akeep admin 45415504 May 3 2021 racket
-rwxr-xr-x@ 1 akeep admin 707 May 3 2021 racket-documentation
-rwxr-xr-x@ 1 akeep admin 684 May 3 2021 raco
-rwxr-xr-x@ 1 akeep admin 706 May 3 2021 scribble
-rwxr-xr-x@ 1 akeep admin 704 May 3 2021 setup-plt
-rwxr-xr-x@ 1 akeep admin 716 May 3 2021 slatex
-rwxr-xr-x@ 1 akeep admin 685 May 3 2021 slideshow
-rwxr-xr-x@ 1 akeep admin 697 May 3 2021 swindle
51325
(Yes, I called system* in the Racket v8.1 bin directory---I happened to test out system* there to make sure I understood what it did.)
Also, note the use of format in the call to process to put the command and arguments together.
process still invokes a shell (/bin/sh, specifically), and the example quotes but does not escape string arguments.
Writing a function to properly escape the arguments is not exactly complicated, but it is extra work that wouldn't have to happen if a more direct interface to fork/exec were provided.
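For what it's worth, a minimal sketch of such an escaping helper for a POSIX /bin/sh might look like the following. The names shell-quote and shell-command are made up for illustration (they are not part of Chez Scheme), and this does nothing useful for cmd.exe on Windows, whose quoting rules are different.

```scheme
(import (chezscheme))

;; Hypothetical helper: wrap str in single quotes for /bin/sh.
;; Inside single quotes the shell treats everything literally, so the
;; only character needing care is ' itself, which is emitted as '\''
;; (close the quote, add an escaped quote, reopen the quote).
(define (shell-quote str)
  (call-with-string-output-port
    (lambda (out)
      (put-char out #\')
      (string-for-each
        (lambda (c)
          (if (char=? c #\')
              (put-string out "'\\''")
              (put-char out c)))
        str)
      (put-char out #\'))))

;; Hypothetical helper: build one command line from a program name and
;; its arguments, quoting each word separately.
(define (shell-command cmd . args)
  (format "~a~{ ~a~}" (shell-quote cmd) (map shell-quote args)))
```

With this, (process (shell-command "echo" "$$")) should hand the shell the string 'echo' '$$', so the $$ is no longer substituted. It still starts a shell, of course, so it only papers over the problem.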
Ugh. You are correct, my apologies, I missed the /bin/sh in the execl call. The quoting was initially accidental, but I realized I had done that, and it actually allows for arguments that have spaces in them, which I liked, so I left it that way :)
Thank you for trying. On Debian, when I try akeep's (system*) function, it works as expected for this call (there is no shell substitution):
> (system* "echo" "*")
*
2026
But unfortunately there is a shell substitution for this call:
> (system* "echo" "$$")
2029
2029
I think I could get by with the current (system) function, if I only invoke gcc on simple alphanumeric filenames. But it would not be a very robust design.
The idea was to use Chez Scheme as a cross-platform Shell. And the basic task of a Shell is to execute programs in subprocesses (create subprocess, pass arguments, set up stdin/stdout/stderr, wait for subprocess termination, collect exit value).
"This would display the output to stdout and return the process id (similar to my reading of the GNU Guile documentation you linked)"
Guile's function displays the output to stdout, and returns the process's exit value (typically zero when everything went well). It would be a good idea to return the subprocess id if we were executing it asynchronously. But we are executing it synchronously (we wait for its termination), so its exit value is the more relevant thing to return.
It would be more useful to store the output in a log file, though. The MIT Scheme implementation is better at that, because it provides optional arguments allowing us to redirect the standard output of the subprocess to a specific port.
A drawback of using MIT's interface is the length of the name (run-synchronous-subprocess is 26 characters long). However, this long name has the advantage of being crystal clear.
Concerning the handling of the $PATH environment variable, I think Racket has a neat solution: just ignore it, and provide another function to locate an executable with the help of the $PATH environment variable. Doing it this way allows us to bypass $PATH completely if we want to.
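To make the idea concrete, here is a rough sketch of what such a separate lookup function could look like in Chez Scheme (Racket's own is called find-executable-path). The names split-string and find-executable are hypothetical, and the sketch assumes a Unix-style $PATH with ":" separators and "/" as the directory separator. It only checks that a regular file exists, not that it is executable.

```scheme
(import (chezscheme))

;; Hypothetical helper: split str on the character sep.
(define (split-string str sep)
  (let loop ([i 0] [start 0] [acc '()])
    (cond
      [(= i (string-length str))
       (reverse (cons (substring str start i) acc))]
      [(char=? (string-ref str i) sep)
       (loop (+ i 1) (+ i 1) (cons (substring str start i) acc))]
      [else (loop (+ i 1) start acc)])))

;; Hypothetical helper: return the first "<dir>/<name>" under $PATH
;; that names a regular file, or #f if there is none.
(define (find-executable name)
  (let ([path (getenv "PATH")])
    (and path
         (let loop ([dirs (split-string path #\:)])
           (if (null? dirs)
               #f
               ;; an empty $PATH entry conventionally means the current directory
               (let* ([dir (if (string=? (car dirs) "") "." (car dirs))]
                      [candidate (string-append dir "/" name)])
                 (if (file-regular? candidate)
                     candidate
                     (loop (cdr dirs)))))))))
```

A caller that wants $PATH semantics would pass (find-executable "gcc") to the subprocess function, and a caller that does not would pass an absolute path directly.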
I wonder if scsh still works.
It would be nice if there was a way to execute processes without starting a shell. You don't want to send any untrusted data via the shell. Sending it as args to one program just requires that that program can't be subverted with garbage input. Say you have a web service to convert jpg to png, by running magick <filename.jpg> <filename>.png. If it shells out, a user could supply a file named ; rm -rf / .jpg and bad things would happen. If it doesn't shell out, imagemagick will just convert the file (provided imagemagick can't be subverted via odd file names of course).
There are indeed a bunch of security issues with the Bourne Shell.
I read the paper describing the design principles behind scsh. That project is interesting, but it turns out that scsh targets specifically Unix, so it is a non-starter for me (Windows support is not optional).
scsh has the ability to connect the standard output of one subprocess to the standard input of another subprocess (this is an unnamed pipe). I don't know if it is possible or desirable to achieve this with Scheme ports. But that is an issue regarding the port module, I guess it is independent of a potential inclusion of an MIT style run-synchronous-subprocess function into Chez Scheme.
Racket (there is no way to save the output to a log file):
(system* "/bin/ls" "-a" "-l")
FYI, Racket arranges for stdout and stderr to be attached to (current-output-port) and (current-error-port), respectively, e.g.:
#lang racket
(require rackunit)
(check-equal?
 (with-output-to-string
   (λ ()
     (system* "/usr/bin/echo" "$$")))
 "$$\n")
So you can use with-output-to-file or whatever to write to a file. Lower-level functions like process*/ports provide even more options.
I believe Guile supports something similar, though I'm less familiar with the details.
You might also be interested in William Hatch's Rash: The Reckless Racket Shell. (There's also a GPCE paper.)
A source of inspiration would be Perl's system (or exec) command:
"If there are no shell metacharacters in the argument [of system], it is split into words and passed directly to execvp, which is more efficient."
See perldoc -f system or https://perldoc.perl.org/functions/system.
Magically changing the behavior based on scanning the string and assuming we know what /bin/sh would do with it seems like an even worse idea.
Efficiency is not why I would like a shell-free alternative; it's because starting a shell makes it very hard to secure. A fallback to execv if a shell is not needed doesn't help that use case.
I did a proof of concept for this with plain scheme to see how it would work, and you can absolutely just do something like this (ignore the newline I forgot to add in the display, oops):
execvp.sls
;; -*- mode: scheme; coding: utf-8 -*-
;; Copyright (c) 2022
;; SPDX-License-Identifier: MIT
#!r6rs
(library (execvp)
  (export call
          call-output-to-file)
  (import (chezscheme))

  (define (dup fd)
    ((foreign-procedure "dup" (int) int) fd))

  (define (dup2 fd1 fd2)
    ((foreign-procedure "dup2" (int int) int) fd1 fd2))

  (define (execvp prog args)
    ((foreign-procedure "execvp" (string void*) int) prog args))

  (define (string->cstring str)
    (let* ([bv (string->bytevector str (native-transcoder))]
           [len (bytevector-length bv)]
           [buf (foreign-alloc (* (+ 1 len) (foreign-sizeof 'unsigned-8)))])
      (let loop ((idx 0))
        (cond
          ((>= idx len) '())
          (#t (begin
                (foreign-set!
                  'unsigned-8 buf
                  (* idx (foreign-sizeof 'unsigned-8))
                  (bytevector-u8-ref bv idx))
                (loop (+ idx 1))))))
      (foreign-set!
        'unsigned-8 buf
        (* len (foreign-sizeof 'unsigned-8))
        0)
      buf))

  (define (string*->arg* str*)
    (let* ([len (length str*)]
           [buf (foreign-alloc (* (+ 1 len) (foreign-sizeof 'void*)))])
      (let loop ([idx 0]
                 [str* str*])
        (cond
          ((null? str*) '())
          (#t (begin
                (foreign-set!
                  'void* buf
                  (* idx (foreign-sizeof 'void*))
                  (string->cstring (car str*)))
                (loop (+ idx 1) (cdr str*))))))
      (foreign-set!
        'void* buf
        (* len (foreign-sizeof 'void*))
        0)
      buf))

  (define (call prog str*)
    (if (= 0 ((foreign-procedure "fork" () int)))
        (execvp prog (string*->arg* (cons prog str*)))
        (begin
          ((foreign-procedure "wait" (void*) int) 0)
          (void))))

  (define (call-output-to-file file prog str*)
    (if (= 0 ((foreign-procedure "fork" () int)))
        (call-with-output-file file
          (lambda (port)
            (dup2 (port-file-descriptor port) 1)
            (dup2 (port-file-descriptor port) 2)
            (execvp prog (string*->arg* (cons prog str*)))))
        (begin
          ((foreign-procedure "wait" (void*) int) 0)
          (void))))

  (load-shared-object #f))
I wouldn't be surprised if there's a better way to write the foreign bit, but it's just a PoC.
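Since returning the exit value came up earlier in the thread, here is a rough, self-contained variant that does that. It is a sketch under stated assumptions, not a finished design. The names run, c-fork, c-waitpid, c-execvp, c-exit, string->cstring and strings->argv are made up for this example. The status decoding assumes the Linux wait-status layout (low 7 bits are the terminating signal, the next 8 bits are the exit code), which is what the WIFEXITED and WEXITSTATUS macros expand to there; other systems may differ.

```scheme
(import (chezscheme))
(load-shared-object #f)   ; make the process's own libc symbols visible

(define c-fork    (foreign-procedure "fork" () int))
(define c-waitpid (foreign-procedure "waitpid" (int void* int) int))
(define c-execvp  (foreign-procedure "execvp" (string void*) int))
(define c-exit    (foreign-procedure "_exit" (int) void))

;; Copy a Scheme string into freshly allocated, NUL-terminated memory.
(define (string->cstring str)
  (let* ([bv (string->utf8 str)]
         [len (bytevector-length bv)]
         [buf (foreign-alloc (+ len 1))])
    (do ([i 0 (+ i 1)]) ((= i len))
      (foreign-set! 'unsigned-8 buf i (bytevector-u8-ref bv i)))
    (foreign-set! 'unsigned-8 buf len 0)
    buf))

;; Build a NULL-terminated char* array from a list of strings.
(define (strings->argv strs)
  (let* ([n (length strs)]
         [w (foreign-sizeof 'void*)]
         [buf (foreign-alloc (* (+ n 1) w))])
    (do ([i 0 (+ i 1)] [s strs (cdr s)]) ((= i n))
      (foreign-set! 'void* buf (* i w) (string->cstring (car s))))
    (foreign-set! 'void* buf (* n w) 0)
    buf))

;; Run prog with args (no shell involved) and wait for it.
;; Returns the exit code, or #f if the child was killed by a signal.
(define (run prog args)
  (let ([pid (c-fork)])
    (cond
      [(< pid 0) (error 'run "fork failed")]
      [(= pid 0)
       ;; child: argv is only allocated here, so the parent leaks nothing
       (c-execvp prog (strings->argv (cons prog args)))
       (c-exit 127)]                     ; reached only if execvp failed
      [else
       (let ([buf (foreign-alloc (foreign-sizeof 'int))])
         (c-waitpid pid buf 0)
         (let ([status (foreign-ref 'int buf 0)])
           (foreign-free buf)
           (and (= 0 (bitwise-and status #x7f))    ; exited normally?
                (bitwise-and (bitwise-arithmetic-shift-right status 8)
                             #xff))))])))
```

On Linux I would expect (run "sh" '("-c" "exit 3")) to return 3, and (run "echo" '("$$")) to print a literal $$ and return 0, since no shell ever sees the argument. Using waitpid on the specific pid instead of wait also avoids reaping some unrelated child by accident.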
On Windows you would want to use _spawnvp over fork+execvp, I imagine, although I prototyped this on Linux and don't have a Windows machine in easy reach -- sorry! Looks like to replicate the behavior I have here you would just do a synchronous _spawnvp -- you would need to use _dup and _dup2 to cache and restore stdout and stderr, as opposed to just using dup2 to set it, though. Something like: (untested)
(define (dup fd)
  ((foreign-procedure "_dup" (int) int) fd))

(define (dup2 fd1 fd2)
  ((foreign-procedure "_dup2" (int int) int) fd1 fd2))

(define (spawnvp mode prog args)
  ((foreign-procedure "_spawnvp" (int string void*) int) mode prog args))

(define (call prog str*)
  ;; mode 0 is _P_WAIT: run the program synchronously
  (spawnvp 0 prog (string*->arg* (cons prog str*)))
  (void))

(define (call-output-to-file file prog str*)
  (call-with-output-file file
    (lambda (port)
      ;; save stdout/stderr, redirect them to the file, then restore
      (let ([stdout (dup 1)]
            [stderr (dup 2)])
        (dup2 (port-file-descriptor port) 1)
        (dup2 (port-file-descriptor port) 2)
        (spawnvp 0 prog (string*->arg* (cons prog str*)))
        (dup2 stdout 1)
        (dup2 stderr 2)
        (void)))))
fork+execvp should work on OS X just fine as well though, although I haven't tested it.
If you wanted to process the output of the command I'm not sure exactly what I'd recommend -- dup and dup2 only work with file descriptors and only file ports have those -- well, and stdin, stdout, stderr. You could perhaps create a pipe (linux, osx, windows) from the process's stdout to your stdin (making sure you've consumed all stdin first), then read it all on return (or if implementing async, which you should probably do, admittedly, read it while the process is running).
That said you can't really distinguish between stdout and stderr if doing that. memfd_create would work but it doesn't exist on Windows or MacOS, sadly, and anything more complex I would honestly want to work with from C and expose to Scheme, not work with entirely from Scheme.
I suspect outputting to a file and just reading it in is what you'd want to do anyway -- I expect most build systems do something similar so that the build logs are available.
Hello,
I have seen some software projects using Python as a scripting language for the build process (instead of the Bourne Shell). I like the idea, because it makes it easier to build on Windows.
I would like to use Chez Scheme this way, because it is already a dependency for my project. With Chez Scheme, I can call gcc with the system function, but it is not ideal, because it goes through the Shell, and substitutions might occur, depending on the default Shell interpreter (bash or ksh or csh or cmd.exe/PowerShell or ...).
If you think it is a good idea to have a function that executes a file without involving the Shell, and you consider adding it to Chez Scheme, you might want to take a look at how it is done in other Schemes first:
GNU Guile (there is no way to save the output to a log file):
Racket (there is no way to save the output to a log file):
MIT Scheme:
And Python 3 does it this way:
On Unix, this feature probably requires the use of fork() and execv() instead of system().
Thanks for this great Scheme implementation!