Open arcanis opened 11 months ago
One strategy would be to tweak the core so that the state machine starts small and progressively expands as we find new tokens we don't know how to support.
Let's say we have the commands `set version <arg>`, `set version from sources`, and `install`. Let's imagine that, instead of the fully CLI-aware state machine we currently provide to `runMachine`, we provide an empty state machine. The `runMachine` function would accept a "failsafe callback"; this callback would take the current state machine and a stream of tokens, and return one of three values: `ABORT`, `FEED`, or another state machine.
We'll now parse the following CLI input:
["set", "version", "from", "sources"]
Things would go like this:
1. `set` is consumed. No possible states are found.
2. The failsafe callback is called with `["set"]` as token stream.
3. It looks for `<root>/commands/set/index.js`.
4. The folder (`<root>/commands/set`) would exist, but not the `index.js` file. `FEED` would be returned.
5. The failsafe callback is called again, this time with `["set", "version"]` as token stream.
6. It looks for `<root>/commands/set/version/index.js`.
7. This time it finds `index.js`. A new state machine would be returned, and `runMachine` would merge it with the existing one.
8. `set` is re-consumed. A matching state is found.
9. `version` is consumed. A matching state is found.
10. `from` is consumed. A matching state is found (the `<arg>` from `set version <arg>`), but since it's an argument the failsafe activates nonetheless, this time with `["set", "version", "from"]`.
11. It looks for `<root>/commands/set/version/from/index.js`.
12. The folder (`<root>/commands/set/version/from`) would exist, but not the `index.js` file. `FEED` would be returned. Had the folder not existed, `ABORT` would have been returned, and `runMachine` would have simply accepted `from` as being `<arg>` without more objection.
13. The failsafe callback is called with `["set", "version", "from", "sources"]` as token stream.
14. It looks for `<root>/commands/set/version/from/sources/index.js`.
15. It finds `index.js`. A new state machine would be returned, and `runMachine` would merge it with the existing one.
16. `from` is re-consumed. Two matching states are found.
17. `sources` is consumed. A matching state is found, and the `<arg>` alternative is abandoned.
This approach needs only four filesystem lookups. It does, however, have a couple of thorny aspects:
- For this to work, we need a way to lazily evaluate the command files (the `index.js` files). This isn't a problem in CJS-land, where we have `require()`. In ESM, however, we only have `import()`, which is asynchronous. That requires turning `runMachine` into an async function (a breaking change).
- Options can be specified before a path. For example, if `set version from sources` has a `--path` option, then the user may call it via `--path=foo set version from sources`. It means that leading options need to be skipped for the purpose of the failsafe function.
- Worse, it's also possible to write `--path foo set version from sources`. Since we don't have the state machine, we don't know that the `foo` token is the value associated with the `--path` option (for all we know, there could be a `foo set version from sources` command with a `--path` boolean option). Even worse, since options may take any number of arguments (tuples), it could be a `from sources` command with a `--path foo set version` option!
- I wonder how it would interact with #89 (command completion, cc @paul-soporan). In the worst case we can disable the laziness for the purpose of the command completion, but it'd be interesting to find a way to merge them together at some point.
To solve the leading-options problem, if we detect option tokens first, we need to follow an annoying dance. If we take `--path foo set version from sources`, then the engine will need to call the failsafe callback on each of `["foo"]` / `["set"]` / `["version"]` / `["from"]` / `["sources"]`, and extend the token stream for each alternative as long as the callback returns either `FEED` or a state machine (only `ABORT` should stop the alternative from being explored). Fortunately, `ABORT` will be the most common result, so in practice only a single alternative will be crawled.
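A rough sketch of that dance, to make the exploration concrete. Everything here is an assumption made for illustration: `exploreAlternatives`, the simplified `StateMachine` shape (just a list of command paths), and the in-memory `commandFiles` list standing in for the `<root>/commands` folder. It also only skips tokens that start with `-`, which is a simplification.

```typescript
// Hypothetical sketch: only the FEED / ABORT / state machine contract comes from the proposal.
enum FailsafeResult { Feed, Abort }
type StateMachine = { paths: string[][] };
type Failsafe = (tokens: string[]) => FailsafeResult | StateMachine;

// Simulated <root>/commands tree (a real callback would query the filesystem).
const commandFiles: string[] = [
  "set/version/index.js",
  "set/version/from/sources/index.js",
];

const failsafe: Failsafe = tokens => {
  const dir = tokens.join("/");
  // A command file exists for this exact path: return a (tiny) state machine.
  if (commandFiles.indexOf(`${dir}/index.js`) !== -1)
    return { paths: [tokens.slice()] };
  // Otherwise FEED if the folder exists (something may live deeper), else ABORT.
  const folderExists = commandFiles.some(file => file.indexOf(`${dir}/`) === 0);
  return folderExists ? FailsafeResult.Feed : FailsafeResult.Abort;
};

type Alternative = { start: number; paths: string[][] };

// Treat every non-option token as a possible start of the command path,
// and keep extending each alternative until the callback returns ABORT.
function exploreAlternatives(tokens: string[], callback: Failsafe): Alternative[] {
  const alternatives: Alternative[] = [];
  for (let start = 0; start < tokens.length; start++) {
    if (tokens[start].indexOf("-") === 0) continue; // option tokens can't start a path
    let paths: string[][] = [];
    for (let end = start + 1; end <= tokens.length; end++) {
      const result = callback(tokens.slice(start, end));
      if (typeof result === "object") paths = paths.concat(result.paths);
      else if (result === FailsafeResult.Abort) break;
    }
    if (paths.length > 0) alternatives.push({ start, paths });
  }
  return alternatives;
}

const found = exploreAlternatives(
  ["--path", "foo", "set", "version", "from", "sources"],
  failsafe,
);
console.log(found.map(alt => alt.start)); // [ 2 ]
```

With this simulated tree, the alternatives starting at `foo`, `version`, `from`, and `sources` each die on their first `ABORT`, so only the one starting at `set` is crawled in depth.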
In practice, doing this will require:
- Making `runMachine` asynchronous.
- Adding an `enum FailsafeResult { Feed, Abort }` enum.
- Adding the failsafe callback to the `runMachine` option bag (`(tokens: string[]) => FailsafeResult | StateMachine`).
- Adding a `mergeStateMachines` function (with tests). Perhaps `makeAnyOfMachine` is actually enough?
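To give an idea of how these pieces could fit together, here is a minimal sketch that replays the `set version from sources` walk. It is not the actual implementation: `StateMachine` is reduced to a list of known command paths, `mergeStateMachines` is a plain concatenation, `discover` is a hypothetical stand-in for the relevant part of `runMachine`, and the filesystem is simulated by the `commandFiles` array. It is kept synchronous to match the callback signature above; an ESM loader relying on `import()` would need an async variant.

```typescript
// Hypothetical sketch: only FailsafeResult and the callback signature come from the list above.
enum FailsafeResult { Feed, Abort }

// Stand-in for the real state machine: just the command paths it knows about.
type StateMachine = { paths: string[][] };

type Failsafe = (tokens: string[]) => FailsafeResult | StateMachine;

// Simulated content of <root>/commands.
const commandFiles: string[] = [
  "set/version/index.js",
  "set/version/from/sources/index.js",
  "install/index.js",
];

const failsafe: Failsafe = tokens => {
  const dir = tokens.join("/");
  // The command file exists: return a new (tiny) state machine for it.
  if (commandFiles.indexOf(`${dir}/index.js`) !== -1)
    return { paths: [tokens.slice()] };
  // Only the folder exists: ask for one more token. Nothing exists: give up.
  const folderExists = commandFiles.some(file => file.indexOf(`${dir}/`) === 0);
  return folderExists ? FailsafeResult.Feed : FailsafeResult.Abort;
};

// Simplified merge: the real one would have to combine states and transitions.
function mergeStateMachines(a: StateMachine, b: StateMachine): StateMachine {
  return { paths: a.paths.concat(b.paths) };
}

// Start from an empty machine and grow it one token at a time.
function discover(tokens: string[], callback: Failsafe): { machine: StateMachine; calls: number } {
  let machine: StateMachine = { paths: [] };
  let calls = 0;
  for (let n = 1; n <= tokens.length; n++) {
    calls++;
    const result = callback(tokens.slice(0, n));
    if (typeof result === "object") machine = mergeStateMachines(machine, result);
    else if (result === FailsafeResult.Abort) break;
  }
  return { machine, calls };
}

console.log(discover(["set", "version", "from", "sources"], failsafe).calls); // 4
```

The four calls match the four lookups from the walk-through, and `install` is never touched.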
When writing large CLI applications, we find ourselves in a pickle. Let's say we have a command, `MyCommand`, whose file imports `something` and `somethingElse` at the top level.
The `something` and `somethingElse` functions aren't needed until `MyCommand` is executed, but since they are in a top-level import the generated code will still import them before even evaluating the command file. At the scale of a large application, those imports start to slow down the startup by a significant factor. We can mitigate it a little by moving the imports into `execute` and loading them in parallel with dynamic `import()` calls.
But that's really verbose, and that's not even what people who do this actually write (they instead just call `import` multiple times in a row, like top-level imports, except that this prevents the runtime from fetching / parsing the modules in parallel, serializing work that could be parallelized).
A second problem is that even if the imports are moved into `execute`, just running files has a cost. They need to be read, parsed, and evaluated, all while contributing nothing to the command parsing. This problem is exacerbated when using transpilers, as the cost can easily reach hundreds of ms for larger CLIs.
The first point can be solved by the Deferring Module Evaluation proposal, but it's currently still at stage 2 (cc @nicolo-ribaudo in case you're interested in this thread / practical use case), and even with that we'd still have the problem of the files being executed at all (probably not as much of a problem if you don't use a transpiler).
Ideally, I'd like to find a way to solve both points.
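To make the difference between the two dynamic import styles mentioned above concrete, here is a small sketch. `fakeImport` is a hypothetical stand-in for `import()` (it resolves asynchronously and records when each load starts and ends), and `SequentialCommand` / `ParallelCommand` are illustrative classes, not Clipanion commands.

```typescript
// Records the order in which simulated module loads start and finish.
const events: string[] = [];

// Hypothetical stand-in for `import(name)`: starts immediately, resolves on a later tick.
function fakeImport<T>(name: string, exports: T): Promise<T> {
  events.push(`start ${name}`);
  return Promise.resolve().then(() => {
    events.push(`end ${name}`);
    return exports;
  });
}

const importSomething = () =>
  fakeImport("something", { something: () => "a" });
const importSomethingElse = () =>
  fakeImport("something-else", { somethingElse: () => "b" });

// What people tend to write: one await after another, so the second
// load can't start until the first one has finished.
class SequentialCommand {
  async execute(): Promise<string> {
    const { something } = await importSomething();
    const { somethingElse } = await importSomethingElse();
    return something() + somethingElse();
  }
}

// The more verbose variant: both loads start before either is awaited.
class ParallelCommand {
  async execute(): Promise<string> {
    const [{ something }, { somethingElse }] = await Promise.all([
      importSomething(),
      importSomethingElse(),
    ]);
    return something() + somethingElse();
  }
}
```

Running `SequentialCommand` records `start something`, `end something`, `start something-else`, `end something-else`, whereas `ParallelCommand` records both starts before either end. Neither variant avoids the cost of evaluating the command file itself.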