Closed saumyajyoti closed 3 months ago
auto_argmatcher.lua uses modules\help_parser.lua.
If one of the built-in parsers can read the program's help text, then you can use that parser.
If not, then you would need to create your own parser function.
See the comments in the source code for details.
Thanks for the quick reply. It seems I will need to add additional handling. I will try to explore.
You can create a parser, or you can just create an argmatcher the normal way (how everything except auto_argmatcher.lua does).
Yes, agreed. In the short term I can write a regular argmatcher. But the auto parser looks very powerful, as I have to do this for multiple programs I use.
@saumyajyoti Just to gauge interest, did you mean commands such as:
14:06:54|C:\Windows\system32>uv --help
An extremely fast Python package manager.
Usage: uv [OPTIONS] <COMMAND>
Commands:
run Run a command or script
init Create a new project
add Add dependencies to the project
remove Remove dependencies from the project
sync Update the project's environment
lock Update the project's lockfile
export Export the project's lockfile to an alternate format
tree Display the project's dependency tree
tool Run and install commands provided by Python packages
python Manage Python versions and installations
pip Manage Python packages with a pip-compatible interface
venv Create a virtual environment
build Build Python packages into source distributions and wheels
publish Upload distributions to an index
cache Manage uv's cache
self Manage the uv executable
version Display uv's version
help Display documentation for a command
....
which are different from the -- flags?
This is from uv.
I have a lot of use cases for such programs, so I might be interested in writing this parser.
@chrisant996 Is extending auto_argmatcher a good idea, or should I try and write a new script? (I have very little lua experience - only from scripting a small cyberpunk mod lol)
auto_argmatcher is meant for when many programs share the same help text format, so that a single parser can do a good job of accurately parsing the help text from many different programs. But even then you have to say which parser to use.
GNU programs have a high degree of shared format, so that's a good example -- but even GNU programs aren't completely uniform from one program to another.
If you're going to invest a lot of time and energy to make a parser that accurately understands a specific help text format that's shared by a collection of programs, then making a parser for auto_argmatcher could be interesting. But otherwise, it will be much simpler and take much less time and effort to just write a separate argmatcher.
Also, it's worth noting that one drawback of parsing help text is that updates to the program may change the help text in ways that break assumptions in the help text parser. Especially if the program's help text doesn't strictly/consistently follow well-defined formatting rules.
Hi @plutonium-239, thank you. Yes, I meant this kind of usage. cargo is another example of such args.
================================================================================= cargo help
Rust's package manager
Usage: cargo [+toolchain] [OPTIONS] [COMMAND]
cargo [+toolchain] [OPTIONS] -Zscript <MANIFEST_RS> [ARGS]...
Options:
-V, --version Print version info and exit
--list List installed commands
--explain <CODE> Provide a detailed explanation of a rustc error message
-v, --verbose... Use verbose output (-vv very verbose/build.rs output)
-q, --quiet Do not print cargo log messages
--color <WHEN> Coloring: auto, always, never
-C <DIRECTORY> Change to DIRECTORY before doing anything (nightly-only)
--locked Assert that `Cargo.lock` will remain unchanged
--offline Run without accessing the network
--frozen Equivalent to specifying both --locked and --offline
--config <KEY=VALUE> Override a configuration value
-Z <FLAG> Unstable (nightly-only) flags to Cargo, see 'cargo -Z help' for details
-h, --help Print help
Commands:
build, b Compile the current package
check, c Analyze the current package and report errors, but don't build object files
clean Remove the target directory
doc, d Build this package's and its dependencies' documentation
new Create a new cargo package
init Create a new cargo package in an existing directory
add Add dependencies to a manifest file
remove Remove dependencies from a manifest file
run, r Run a binary or example of the local package
test, t Run the tests
bench Run the benchmarks
update Update dependencies listed in Cargo.lock
search Search registry for crates
publish Package and upload this package to the registry
install Install a Rust binary
uninstall Uninstall a Rust binary
... See all commands with --list
See 'cargo help <command>' for more information on a specific command.
================================================================================= rustup help
rustup 1.27.1 (54dd3d00f 2024-04-24)
The Rust toolchain installer
Usage: rustup [OPTIONS] [+toolchain] [COMMAND]
Commands:
show Show the active and installed toolchains or profiles
update Update Rust toolchains and rustup
check Check for updates to Rust toolchains and rustup
default Set the default toolchain
toolchain Modify or query the installed toolchains
target Modify a toolchain's supported targets
component Modify a toolchain's installed components
override Modify toolchain overrides for directories
run Run a command with an environment configured for a given toolchain
which Display which binary will be run for a given command
doc Open the documentation for the current toolchain
self Modify the rustup installation
set Alter rustup settings
completions Generate tab-completion scripts for your shell
help Print this message or the help of the given subcommand(s)
Arguments:
[+toolchain] release channel (e.g. +stable) or custom toolchain to set override
Options:
-v, --verbose Enable verbose output
-q, --quiet Disable progress output
-h, --help Print help
-V, --version Print version
Discussion:
Rustup installs The Rust Programming Language from the official
release channels, enabling you to easily switch between stable,
beta, and nightly compilers and keep them updated. It makes
cross-compiling simpler with binary builds of the standard library
for common platforms.
If you are new to Rust consider running `rustup doc --book` to
learn Rust.
================================================================================= go help
Go is a tool for managing Go source code.
Usage:
go <command> [arguments]
The commands are:
bug start a bug report
build compile packages and dependencies
clean remove object files and cached files
doc show documentation for package or symbol
env print Go environment information
fix update packages to use new APIs
fmt gofmt (reformat) package sources
generate generate Go files by processing source
get add dependencies to current module and install them
install compile and install packages and dependencies
list list packages or modules
mod module maintenance
work workspace maintenance
run compile and run Go program
telemetry manage telemetry data and settings
test test packages
tool run specified go tool
version print Go version
vet report likely mistakes in packages
Use "go help <command>" for more information about a command.
Additional help topics:
buildconstraint build constraints
buildmode build modes
c calling between Go and C
cache build and test caching
environment environment variables
filetype file types
go.mod the go.mod file
gopath GOPATH environment variable
goproxy module proxy protocol
importpath import path syntax
modules modules, module versions, and more
module-auth module authentication using go.sum
packages package lists and patterns
private configuration for downloading non-public code
testflag testing flags
testfunc testing functions
vcs controlling version control with GOVCS
Use "go help <topic>" for more information about that topic.
================================================================================= uv help
An extremely fast Python package manager.
Usage: uv [OPTIONS] <COMMAND>
Commands:
run Run a command or script
init Create a new project
add Add dependencies to the project
remove Remove dependencies from the project
sync Update the project's environment
lock Update the project's lockfile
export Export the project's lockfile to an alternate format
tree Display the project's dependency tree
tool Run and install commands provided by Python packages
python Manage Python versions and installations
pip Manage Python packages with a pip-compatible interface
venv Create a virtual environment
build Build Python packages into source distributions and wheels
publish Upload distributions to an index
cache Manage uv's cache
self Manage the uv executable
version Display uv's version
generate-shell-completion Generate shell completion
help Display documentation for a command
Cache options:
-n, --no-cache Avoid reading from or writing to the cache, instead using a temporary directory for the duration of the operation [env: UV_NO_CACHE=]
--cache-dir <CACHE_DIR> Path to the cache directory [env: UV_CACHE_DIR=]
Python options:
--python-preference <PYTHON_PREFERENCE> Whether to prefer uv-managed or system Python installations [env: UV_PYTHON_PREFERENCE=] [possible values: only-managed, managed, system, only-system]
--no-python-downloads Disable automatic downloads of Python. [env: "UV_PYTHON_DOWNLOADS=never"]
Global options:
-q, --quiet Do not print any output
-v, --verbose... Use verbose output
--color <COLOR_CHOICE> Control colors in output [default: auto] [possible values: auto, always, never]
--native-tls Whether to load TLS certificates from the platform's native certificate store [env: UV_NATIVE_TLS=]
--offline Disable network access
--allow-insecure-host <ALLOW_INSECURE_HOST> Allow insecure connections to a host [env: UV_INSECURE_HOST=]
--no-progress Hide all progress outputs [env: UV_NO_PROGRESS=]
--directory <DIRECTORY> Change to the given directory prior to running the command
--project <PROJECT> Run the command within the given project directory
--config-file <CONFIG_FILE> The path to a `uv.toml` file to use for configuration [env: UV_CONFIG_FILE=]
--no-config Avoid discovering configuration files (`pyproject.toml`, `uv.toml`) [env: UV_NO_CONFIG=]
-h, --help Display the concise help for this command
-V, --version Display the uv version
Use `uv help <command>` for more information on a specific command.
================================================================================= wezterm help
Wez's Terminal Emulator
http://github.com/wez/wezterm
Usage: wezterm.exe [OPTIONS] [COMMAND]
Commands:
start Start the GUI, optionally running an alternative program [aliases: -e]
ssh Establish an ssh session
serial Open a serial port
connect Connect to wezterm multiplexer
ls-fonts Display information about fonts
show-keys Show key assignments
cli Interact with experimental mux server
imgcat Output an image to the terminal
set-working-directory Advise the terminal of the current working directory by emitting an OSC 7 escape sequence
record Record a terminal session as an asciicast
replay Replay an asciicast terminal session
shell-completion Generate shell completion information
help Print this message or the help of the given subcommand(s)
Options:
-n, --skip-config Skip loading wezterm.lua
--config-file <CONFIG_FILE> Specify the configuration file to use, overrides the normal configuration file resolution
--config <name=value> Override specific configuration values
-h, --help Print help
-V, --version Print version
=================================================================================
It seems like there is some level of consistency: 'Commands' with keyword commands and '(Additional) Options' with keyword arguments.
go doesn't have the same headings, but the format is the same. However, it also includes "Additional help topics", which might confuse the parser, so it might need to be handled separately, which is fine.
I don't know why rustup has a separate 'Arguments' section that only has one option and starts with a +. Seems like it should've been a normal --toolchain={} argument. But oh well, it's already been live for a while now.
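As a rough illustration of the heading-based idea above, here is a minimal sketch in Python (used only for illustration; a real parser for auto_argmatcher would be written in Lua). It groups indented lines under the preceding unindented line that ends in a colon, so the caller can choose which headings count as commands and skip ones like go's "Additional help topics". The function names and the indentation in the sample text are assumptions, not part of Clink, and choosing headings by their text remains English-specific.

```python
import re

# Trimmed sample modeled on the `go help` output quoted in this thread.
# The indentation is an assumption; the pasted output above lost its whitespace.
SAMPLE = """\
Go is a tool for managing Go source code.

Usage:

    go <command> [arguments]

The commands are:

    bug         start a bug report
    build       compile packages and dependencies

Use "go help <command>" for more information about a command.

Additional help topics:

    buildmode   build modes
    cache       build and test caching
"""


def split_sections(help_text):
    """Hypothetical helper: map each 'Heading:' line to the indented lines below it."""
    sections = {}
    current = None
    for line in help_text.splitlines():
        stripped = line.rstrip()
        if not stripped:
            continue  # blank lines never start or end a section
        if not line[0].isspace():
            if stripped.endswith(":"):
                # An unindented line ending in ':' starts a new section.
                current = stripped[:-1]
                sections[current] = []
            else:
                # Any other unindented line (summary, footer) closes the section.
                current = None
        elif current is not None:
            sections[current].append(stripped.strip())
    return sections


def entry_names(entries):
    """Hypothetical helper: keep the text before the first run of 2+ spaces,
    and split comma-separated aliases such as cargo's 'build, b'."""
    names = []
    for entry in entries:
        head = re.split(r"\s{2,}", entry, maxsplit=1)[0]
        names.extend(alias.strip() for alias in head.split(","))
    return names


sections = split_sections(SAMPLE)
print(list(sections))
print(entry_names(sections["The commands are"]))
```

With this sample, the caller sees three sections ("Usage", "The commands are", "Additional help topics") and can take command names only from the second one. Wrapped description lines are not handled here; they would be misread as entries.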
Another example for a non-rust/go program is 7z:
13:05:40|C:\Windows\system32>7z --help
7-Zip [64] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04
Usage: 7z <command> [<switches>...] <archive_name> [<file_names>...]
[<@listfiles...>]
<Commands>
a : Add files to archive
b : Benchmark
d : Delete files from archive
e : Extract files from archive (without using directory names)
h : Calculate hash values for files
i : Show information about supported formats
l : List contents of archive
rn : Rename files in archive
t : Test integrity of archive
u : Update files to archive
x : eXtract files with full paths
<Switches>
-- : Stop switches parsing
-ai[r[-|0]]{@listfile|!wildcard} : Include archives
-ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
-ao{a|s|t|u} : set Overwrite mode
-an : disable archive_name field
-bb[0-3] : set output log level
-bd : disable progress indicator
-bs{o|e|p}{0|1|2} : set output stream for output/error/progress line
-bt : show execution time statistics
-i[r[-|0]]{@listfile|!wildcard} : Include filenames
-m{Parameters} : set compression Method
-mmt[N] : set number of CPU threads
-o{Directory} : set Output directory
-p{Password} : set Password
-r[-|0] : Recurse subdirectories
-sa{a|e|s} : set Archive name mode
-scc{UTF-8|WIN|DOS} : set charset for for console input/output
-scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
-scrc[CRC32|CRC64|SHA1|SHA256|*] : set hash function for x, e, h commands
-sdel : delete files after compression
-seml[.] : send archive by email
-sfx[{name}] : Create SFX archive
-si[{name}] : read data from stdin
-slp : set Large Pages mode
-slt : show technical information for l (List) command
-snh : store hard links as links
-snl : store symbolic links as links
-sni : store NT security information
-sns[-] : store NTFS alternate streams
-so : write data to stdout
-spd : disable wildcard matching for file names
-spe : eliminate duplication of root folder for extract command
-spf : use fully qualified file paths
-ssc[-] : set sensitive case mode
-ssw : compress shared files
-stl : set archive timestamp from the most recently modified file
-stm{HexMask} : set CPU thread affinity mask (hexadecimal number)
-stx{Type} : exclude archive type
-t{Type} : Set type of archive
-u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
-v{Size}[b|k|m|g] : Create volumes
-w[{path}] : assign Work directory. Empty path means a temporary directory
-x[r[-|0]]{@listfile|!wildcard} : eXclude filenames
-y : assume Yes on all queries
It seems like there is some level of consistency: 'Commands' with keyword commands and '(Additional) Options' with keyword arguments.
It sounds like you plan to make a parser that only works for English. That can be fine for your own use, but is problematic if the parser would be shared publicly. It'll be less functional than a manually written argmatcher.
Whereas when an argmatcher is written directly (not parsed), it might be in only one language (maybe English), but it'll still technically work on a computer set to any language. But if the user doesn't know the argmatcher's language then they'll find it difficult to use.
I totally missed that, you're right.
I was just referring to the sections here though; the parser would probably rely on whether it can find blocks of text with "alignments" at 2 (or more) levels, or on another heuristic that we can come up with. It would probably take inspiration directly from the written argmatchers, since most of the time the only difference between commands and arguments is the leading --.
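A minimal sketch of that alignment idea, in Python purely for illustration (a real parser for auto_argmatcher would be written in Lua). It ignores heading text entirely: it looks for runs of consecutive indented lines whose descriptions start at the same column, then treats names starting with - or / as flags and everything else as commands. The function names, the `min_rows` threshold, and the sample's indentation are all assumptions, and the heuristic will misfire on wrapped descriptions or formats like 7z's "name : description".

```python
import re

# Trimmed sample modeled on the `cargo help` output quoted in this thread.
# Column positions are an assumption; the pasted output lost its whitespace.
SAMPLE = """\
Rust's package manager

Usage: cargo [+toolchain] [OPTIONS] [COMMAND]

Options:
  -V, --version       Print version info and exit
      --list          List installed commands
      --color <WHEN>  Coloring: auto, always, never
  -q, --quiet         Do not print cargo log messages

Commands:
    build, b    Compile the current package
    check, c    Analyze the current package and report errors
    clean       Remove the target directory
"""

# indent, name (may contain single spaces), 2+ spaces, description
ENTRY = re.compile(r"^(\s+)(\S.*?)(\s{2,})(\S.*)$")


def aligned_blocks(help_text, min_rows=2):
    """Hypothetical heuristic: collect runs of consecutive lines whose
    descriptions start at the same column; return the names in each run."""
    blocks, run, run_col = [], [], None
    for line in help_text.splitlines() + [""]:  # trailing "" flushes the last run
        m = ENTRY.match(line)
        col = m.start(4) if m else None
        if col is None or col != run_col:
            if len(run) >= min_rows:
                blocks.append(run)
            run = []
        if m:
            run.append(m.group(2))
        run_col = col
    return blocks


def classify(names):
    """Split aliases on commas; a leading '-' or '/' marks a flag."""
    commands, flags = [], []
    for name in names:
        for alias in name.split(","):
            parts = alias.split()
            if not parts:
                continue
            # Keep only the first word (drops '<WHEN>') and clap's '...' suffix.
            token = parts[0].rstrip(".")
            (flags if token.startswith(("-", "/")) else commands).append(token)
    return commands, flags


for block in aligned_blocks(SAMPLE):
    print(classify(block))
```

Because nothing here depends on words like "Commands" or "Options", the same code would apply to localized help text, at the cost of being easier to fool than a heading-based parser.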
Do the existing parsers also work on a similar principle and are language-agnostic?
I've written many different help text parsers in the past few years that I've been maintaining and extending Clink. Here's what I've found:
auto_argmatcher, I reached the conclusion that writing automatic parsers that work for more than 1 or 2 programs seems like a bit of a pipe dream.
Btw, the fish shell on Linux claims to parse man pages to automatically create completions. That's what started me down the path of exploring writing auto_argmatcher -- if fish can do it for man pages, maybe a script could do it for help text, or at least help text from certain families like the GNU family.
But Windows doesn't have anything like man or man pages. And even fish has lots of cases where its man page parser gets confused and produces wrong completions. And man pages have contextual clues that plain help text lacks (such as style tags), so man page parsing is significantly easier than trying to parse plain help text from arbitrary programs with very different help text formatting conventions.
In practice, my efforts for generalized help text parsing on Windows haven't gone well. Too much inconsistency between programs (and even between versions of the same program).
I'm not going to invest any more time into exploring the possibility of a generalized help text parser. I've found that it's an unrealistic/unreliable path. I've abandoned making any further "built in" functionality for auto_argmatcher. I now write argmatchers manually, or sometimes I write a custom parser for a specific program (if it has localized text, or if it has a lot of flags/values/commands AND has a highly structured and highly consistent help text format).
Of course anyone else is welcome to explore working on generalized parsers. Maybe someone else can come up with something semi-decent, with enough investment of time and effort and testing (and computing power and maybe some kind of specialized LLM). It depends on how many bugs and how much maintenance one is willing to invest in.
Yeah, that seems fair, it can get pretty pointless having an auto_argmatcher which doesn't auto work in most cases.
But there needs to be some sort of solution, manually writing parsers is ofc not feasible in the long term.
I had thought about the LLM possibility, and even tried to get multiple to generate a clink parser for tar with varying levels of success.
I would love to undertake the training of a specialized SLM for this use case but I don't think we would have enough data to train it on - although synthetic could be generated.
But there needs to be some sort of solution, manually writing parsers is ofc not feasible in the long term.
Completion scripts are normally written manually for all shells, even fish. But fish has a stopgap mechanism of parsing man pages to get some mostly/partially-functional completions for things that don't have manually written completion scripts but do have man pages.
I don't see how the "need" could be fulfilled, and none of the other shells have automatic solutions.
Zooming out, the real issue is that Clink/CMD is not widely used enough for application authors to provide (or maintain) completion scripts for Clink.
Which is why I shifted to exploring a fish complete.lua script to parse fish completion scripts, at least straightforward ones that don't try to run shell scripts. But that has lots of limitations, and fish is an exotic shell with a niche following, so fish completion scripts are often not available anyway.
Automated parsing of arbitrary help text and arbitrary command line interfaces is a bit unrealistic. If completions are the most important thing for someone, then it'd be best to switch away from using CMD.
How to parse commands (args without - or /) from help text of programs like cargo etc. ? Similar to pip, choco commands using auto_argmatcher.lua.