PuellaeMagicae / unix-in-lisp

Mount Unix system into Common Lisp image
100 stars 4 forks source link

-- toc-org-max-depth: 3; --

+TITLE: Unix in Lisp - mount Unix into Common Lisp image

Unix in Lisp is currently a usable Lisp (not POSIX!) shell for Unix. The distinctive feature of Unix in Lisp is that rather than creating a sub-language for Unix operations, Unix concepts are directly/shallowly embedded into Lisp (Unix commands become Lisp macros, Unix file become Lisp variables, Unix streams become lazy Lisp sequences, etc).

The fact that Unix in Lisp /is/ Lisp, rather than an interpreter written in Lisp, makes it possible to leverage existing tools for Lisp. One instance is the Unix in SLIME listener, which inherits completion, interactive debugger and multiple listeners etc from SLIME itself. Another instance is that existing CL library such as sequence helper functions from ~serapeum~ works out of the box on Unix in Lisp process streams.

It's recommended to load ~unix-in-slime.el~ for better SLIME integration. To load it, evaluate ~(require 'unix-in-slime "~/quicklisp/local-projects/unix-in-lisp/unix-in-slime")~ in emacs. You may want to add this line to your ~init.el~. Then ~M-x unix-in-sime~ to start a listener, and have fun!

~unix-in-slime~ installs hacks to the host Lisp environment by calling ~(unix-in-lisp:install)~ on startup. To undo hacks done to the host environment and unmount Unix FS packages, run ~(unix-in-lisp:uninstall)~.

** Examples (Some print-outs are omitted)

Counting number of files

+begin_src

/Users/kchan> (cd quicklisp/local-projects/unix-in-lisp) /Users/kchan/quicklisp/local-projects/unix-in-lisp> (pipe (ls) (wc -l)) 9

+end_src

But why not the Lisp way as well!

+begin_src

/Users/kchan/quicklisp/local-projects/unix-in-lisp> (length (package-exports ./)) 9

+end_src

For more examples, see [[file:TUTORIAL.org]]

The exported symbols of a Unix FS package should one-to-one correspond to files in the mapped directory. Exceptions to this one-to-one correspondence:

Each of these exported symbols has a global symbol macro binding, so that they can be read/write like Lisp variables. Access to the symbol gives the list of lines of the underlying file, and setting it with a list designator of lines cause them to be written to the file.

+begin_src

unix-user /Users/kchan/> .bashrc ("export CC=\"clang\"" "export PS1='$(hostname):$(pwd) $(whoami)\$ '" ...)

+end_src

Note that there are no corresponding symbol for a non-existent file. To write or create a file that you are not sure whether it already exists, it's recommended to use ~defile~ macro, which will ensure the file exists and creates the corresponding symbol.

+begin_src

unix-user /Users/kchan/> (defile iota.txt (iota 10)) /Users/kchan/iota.txt unix-user /Users/kchan/> iota.txt ("0" "1" "2" "3" "4" "5" "6" "7" "8" "9")

+end_src

In the above example, if ~iota.txt~ does not exist and I use ~setq~ instead of ~defile~, an internal symbol named ~IOTA.TXT~ will be created in ~UNIX-USER~ package instead and I will write to its value cell, rather than ~/Users/kchan/iota.txt~ on the file system.

** Command and Process Mapping Unix in Lisp manages jobs in the unit of /Effective processes/. Theses include regular Unix processes represented by ~simple-process~, and ~pipeline~'s which are consisted of any number of UNIX processes and Lisp function stages. *** Simple commands When Unix in Lisp maps a directory, files are checked for execution permission and executable ones are mapped as Common Lisp macros. These macros /implicitly quasiquotes/ their arguments. The arguments are converted to strings using ~literal-to-string~, then passed to the corresponding executable.

Examples of using macros mapped from Unix commands

+begin_src

/Users/kchan/some-documents> (cat ,@(ls)) ;; This cats together all files under current directory.

+end_src

You can also set up redirections (and maybe other process creation settings in the future) via supplying keyword arguments. These arguments /are not/ implicitly quasiquoted and /are/ evaluated.

+begin_src

/Users/kchan/some-documents> (ls :output terminal-io) ;; This outputs to terminal-io, which usually goes into inferior-lisp buffer.

+end_src

+begin_src

/Users/kchan/some-documents> (ls :error :output) ;; This redirect stderr of ls command to its stdout, like 2>&1 in posix shell

+end_src

Like you have discovered in ~(cat ,@(ls))~, effective processes can be used like Lisp sequences -- they designate the sequence of their output lines. *** Pipeline

Pipelines are created via the ~pipe~ macro:

+begin_src

/Users/kchan/quicklisp/local-projects/unix-in-lisp> (pipe (wc -l) (ls)) 9

+end_src

Under the hood, except the first stage, each stage of the pipeline is passed ~:input ~ as an additional argument. Alternatively, if there are arguments ~_~, they are substituted with the result of the previous stage. You can mix Lisp functions and values with Unix commands. Using Lisp value as the first input stage is easy enough:

+begin_src

/Users/kchan> (pipe (iota 10) (wc)) 10 10 20

+end_src

The ~_~ extension make it easy to add Lisp functions to the mix:

+begin_src

/Users/kchan> (pipe (ls) (filter (lambda (s) (> (length s) 10)) _) (wc -l)) 47

+end_src

The above counts the number of file with filename longer than 10 under my home directory. *** Interactive Use Inside a ~unix-in-slime~ listener, if the primary value of evaluation is an effective process and it has avaliable input/output streams, ~unix-in-slime~ automatically "connect" it to the listener, i.e. I/O of the listener is redirected to the process, similar to /foreground processes/ in POSIX shell:

+begin_src

/Users/kchan> (python3 -i) Python 3.8.9 (default, Apr 13 2022, 08:48:07) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin Type "help", "copyright", "credits" or "license" for more information.

print("Hello world!") Hello world! ; No values /Users/kchan>

+end_src

Attention: use ~C-u RET~ to signal EOF in ~unix-in-slime~, similar to ~Ctrl+D~ in POSIX shells. You can interrupt evaluation via ~C-c C-c~ like usual, after which you will be provided a few restarts:

  1. ~BACKGROUND~ puts the job in background (accessible via ~unix-in-lisp:jobs~)
  2. ~ABORT~ terminates the current job (via ~SIGTERM~ for Unix processes)

Attention: You have to use ~-i~ flag to start Python REPL, because Unix in Lisp currently talk to all processes using pipe rather than pseudo tty. Without ~-i~, Python will start itself into non-interactive mode. Other REPLs may need respective flags.

When using Unix in Lisp outside ~unix-in-slime~, use ~(unix-in-lisp:repl-connect )~ to achieve the same thing.

~unix-in-lisp:jobs~ keeps a list of running effective processes:

+begin_src

unix-in-lisp> jobs (#<simple-process python3 (running) {1005BFFCF3}>)

+end_src

Note that because ~unix-in-slime~ listener connects a job automatically if it is the primary value of evaluation, you can use e.g.

+begin_src

unix-in-lisp> (nth 0 jobs)

+end_src

to resume from a background job.

~unix-in-lisp:repl-connect~ connects a process exclusively in at most one listener. If a process is already connected in other listener, it will do nothing and the effective process object will be printed like normal. In fact, many Unix in Lisp operations (including ~repl-connect~ and ~pipe~) takes exclusive access of input/output stream of processes (by setting the respective slots to ~nil~ during their course of operation). *** More about job control, seq, && and || Different from POSIX shell, Unix in Lisp by default run all command asynchronously, or "in the background". This is a very important difference to keep in mind!

The macro ~(fg . forms)~ run each of the ~forms~ in foreground via ~repl-connect~, just like how Unix in SLIME listener would run them.

~seq,&&,||~ roughly correspond to ~;,&&,||~ in POSIX shell, except that they are all asynchronous. They return effective processes that aggregate the input/output of their sub-commands. When you use Lisp code inside these operators, be aware that they run in a /different/ thread rather than the caller thread! The rule of thumb is that if you want something like ~progn~ for both regular Lisp and Unix in Lisp commands, use ~fg~ instead of ~seq~. Use ~seq,&&,||~ for composing effective processes only. ** Environment Unix environment variables are mapped to special (dynamic-scope) Lisp variables.

+begin_src

/Users/kchan> $logname "kchan"

+end_src

You can set them or dynamically bind them

+begin_src

/Users/kchan> (setf $test "42") "42" /Users/kchan> (pipe '("echo $TEST") (bash)) 42 nil /Users/kchan> (let (($test "override")) (pipe '("echo $TEST") (bash))) override nil

+end_src

The above works with the help of a reader macro defined on ~$~, which registers the following symbol as an environment variable. If you want to use Unix in Lisp environment variables without our readtable, you need to use function ~unix-in-lisp:ensure-env-var~ to register the symbol first. Consult its docstring for more information.

Unix in Lisp keeps its own idea of a Unix environment, and pass to subprocesses created by it (e.g. via the macros it created from Unix commands). Other Lisp facilities (e.g. ~uiop:run-program~) does not know this, and usually inherit the "real" Unix environment of the Lisp process instead. To remedy this, Unix in Lisp provides function ~unix-in-lisp:synchronize-env-to-unix~ which copies the environment Unix in Lisp manages to the "real" Unix environment of the Lisp process. This is by default run in ~post-command-hook~, and you may want to call them before using other Lisp facilities that spawns Unix subprocesses. ** Scripting (blazingly fast start up) The recommended way to write scripts is to create executable files (say ~do-stuff.sh~) with contents like

+begin_src

!/usr/env/bin sbcl --script

(asdf:require-system "") (asdf:require-system "unix-in-lisp") (unix-in-lisp:setup)

#+end_src The benefit of the above approach is that it is blazingly fast when started from within Unix in Lisp (via e.g. ~(do-stuff.sh)~), because Unix in Lisp has a /Fast loading command/ mechanism, which can execute the script within Unix in Lisp image without starting subprocess if it detects a Lisp shebang. The essence of writing fast startup script is: 1. Use ~#!/usr/env/bin sbcl --script~ shebang. Currently it has to be an exact match. 2. Use ~asdf:require-system~. This avoids scanning the ASDF registry directory tree for modification, which wastes significant time! On my machine, a hello world using the above approach run in 0.5ms, while Python 3 uses 30ms! ** Unix in SLIME The above documentations have been assuming you are using the ~unix-in-slime~ listener. Here we document some additional aspects of ~unix-in-slime~. Unix in Lisp assumes a dedicated swank server for ~unix-in-slime~ listeners (and potentially other front-ends in the future). ~M-x unix-in-slime~ will start one on ~unix-in-slime-default-port~ (4010 by default) if none already exists in the Unix in Lisp image. The server handles multiple connections, so you can safely start multiple ~unix-in-slime~ listeners simultaneously, like how you must have lived with multiple terminal windows. *** Performance: fast! A quite unintended achievement is that ~unix-in-slime~ is a very fast shell for Emacs. In fact, a simple ~(pipe "time for i in {0..99999}; do echo line $i; done" (sh))~ benchmark takes 0.83s in ~unix-in-slime~, and takes 2.93s in ~vterm~. ~unix-in-slime~ is more than 3 times faster than one of the fastest Emacs terminal emulator (partly written in C)! Of course, this is not a head-to-head comparison because ~vterm~ is a terminal emulator while ~unix-in-slime~ is a shell, but I did frequent experience fast command outputs choking Emacs and it's good to know ~unix-in-slime~ is pretty good at handle these. I think the reason is that SLIME's swank server does some very Emacs-specific tuning, e.g. limiting network packet rate because it knows Emacs choke on a flood of them, which also benefits us when we use it as a shell. *** Completion If you have configured completion for SLIME, completion works out of the box for ~unix-in-slime~. Note that we automagically get "filename completion", because they are mapped as symbols, and we have symbol completion at home! Currently there's one quirk: filenames are always completed to their fully resolved path (with ~.. . ~~ components resolved), because that's what corresponds to symbols. I'd say it's either a bug or a feature depending on who you ask, I'm leaving it like that for now. ** Package system structure Unix in Lisp defines and populates a number of packages during ~unix-in-lisp:install~. First, ~unix-in-lisp:path~ is created according to ~$PATH~ environment variable. Then, ~unix-in-lisp.common~ is ensured to re-export ~unix-in-lisp.path~, and also export symbols corresponding to environment variables. Packages that wish to make use of Unix in Lisp functionalities should use ~unix-in-lisp.common~, and potentially shadowing import some of its symbols. Any other usage of packages created by Unix in Lisp is less safe, including using or importing symbols from the Unix FS packages, particularly because invoking ~unix-in-lisp:uninstall~ deletes them. The Unix in SLIME listener by default starts in ~unix-user~ package, which uses ~unix-in-lisp.common~ and other utility packages. This causes all listeners to share the same package by default, but you can also create new packages and switch listeners to it. Note that we /do not/ support current directory by /using/ its corresponding Unix FS package. Instead, a reader hook (to ~sb-impl::%intern~) is installed that replace symbols denoting relative path with a new "effective" uninterned symbol that merges bindings from the original symbol and the mounted symbols according to the relative path under current directory (~*default-pathname-defaults*~). Similar to Unix, our redirection never shadows existing global function bindings, to avoid unintentionally execute files under current directory.