chrisant996 / clink-gizmos

A library of Lua scripts for use with Clink https://github.com/chrisant996/clink.
MIT License

Parsing args for programs like cargo, choco in auto_argmatcher.lua #11

Closed: saumyajyoti closed this issue 3 months ago

saumyajyoti commented 4 months ago

How can I parse commands (args without - or /) from the help text of programs like cargo, similar to pip or choco commands, using auto_argmatcher.lua?

chrisant996 commented 4 months ago

auto_argmatcher.lua uses modules\help_parser.lua.

If one of the built-in parsers can read the help text, then you can use that parser.

If not, then you would need to create your own parser function.

See the comments in the source code for details.

saumyajyoti commented 4 months ago

Thanks for the quick reply. It seems I will need to add additional handling. I will try to explore.

chrisant996 commented 4 months ago

You can create a parser, or you can just create an argmatcher the normal way (how everything except auto_argmatcher.lua does).
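
For reference, a hand-written argmatcher for a program with subcommands might look roughly like the sketch below. It assumes the script is loaded by Clink (which provides the `clink` global), and the subcommand and flag lists are a partial, illustrative selection for cargo rather than a complete matcher.

```lua
-- Minimal sketch of a hand-written argmatcher for `cargo`.
-- Assumption: this file is loaded by Clink, which provides the `clink` global.
-- The lists below are illustrative and incomplete.
local subcommands = {
    "build", "check", "clean", "doc", "new", "init", "add", "remove",
    "run", "test", "bench", "update", "search", "publish", "install",
    "uninstall",
}

-- The guard lets the file load harmlessly outside of Clink.
if clink and clink.argmatcher then
    clink.argmatcher("cargo")
    :addarg(subcommands)    -- completions for the first positional argument
    :addflags("-V", "--version", "--list", "-v", "--verbose",
              "-q", "--quiet", "-h", "--help")
end
```

Nothing here depends on the program's help text, so it keeps working if the help output changes or is localized; the cost is that the lists are maintained by hand.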

saumyajyoti commented 4 months ago

Yes, I agree. For the short term I can write a regular argmatcher. But the auto parser looks very powerful, since I would have to do that for multiple programs I use.

plutonium-239 commented 3 weeks ago

@saumyajyoti Just to gauge interest, did you mean commands such as :

14:06:54|C:\Windows\system32>uv --help
An extremely fast Python package manager.

Usage: uv [OPTIONS] <COMMAND>

Commands:
  run      Run a command or script
  init     Create a new project
  add      Add dependencies to the project
  remove   Remove dependencies from the project
  sync     Update the project's environment
  lock     Update the project's lockfile
  export   Export the project's lockfile to an alternate format
  tree     Display the project's dependency tree
  tool     Run and install commands provided by Python packages
  python   Manage Python versions and installations
  pip      Manage Python packages with a pip-compatible interface
  venv     Create a virtual environment
  build    Build Python packages into source distributions and wheels
  publish  Upload distributions to an index
  cache    Manage uv's cache
  self     Manage the uv executable
  version  Display uv's version
  help     Display documentation for a command
....

which are different from the -- flags?

This is from uv

I have a lot of use cases for such programs, so I might be interested in writing this parser. @chrisant996 Is extending auto_argmatcher a good idea, or should I try and write a new script? (I have very little lua experience - only from scripting a small cyberpunk mod lol)

chrisant996 commented 3 weeks ago

@chrisant996 Is extending auto_argmatcher a good idea, or should I try and write a new script? (I have very little lua experience - only from scripting a small cyberpunk mod lol)

auto_argmatcher is meant for when many programs share the same help text format, so that a single parser can do a good job of accurately parsing the help text from many different programs. But even then you have to say which parser to use.

GNU programs have a high degree of shared format, so that's a good example -- but even GNU programs aren't completely uniform from one program to another.

If you're going to invest a lot of time and energy to make a parser that accurately understands a specific help text format that's shared by a collection of programs, then making a parser for auto_argmatcher could be interesting. But otherwise, it will be much simpler and take much less time and effort to just write a separate argmatcher.

Also, it's worth noting that one drawback of parsing help text is that updates to the program may change the help text in ways that break assumptions in the help text parser. Especially if the program's help text doesn't strictly/consistently follow well-defined formatting rules.

saumyajyoti commented 2 weeks ago

Hi @plutonium-239, thank you. Yes, I meant this kind of usage. cargo is another example of such args.

cargo -h
Rust's package manager

Usage: cargo [+toolchain] [OPTIONS] [COMMAND]
       cargo [+toolchain] [OPTIONS] -Zscript <MANIFEST_RS> [ARGS]...

Options:
  -V, --version             Print version info and exit
      --list                List installed commands
      --explain <CODE>      Provide a detailed explanation of a rustc error message
  -v, --verbose...          Use verbose output (-vv very verbose/build.rs output)
  -q, --quiet               Do not print cargo log messages
      --color <WHEN>        Coloring: auto, always, never
  -C <DIRECTORY>            Change to DIRECTORY before doing anything (nightly-only)
      --locked              Assert that `Cargo.lock` will remain unchanged
      --offline             Run without accessing the network
      --frozen              Equivalent to specifying both --locked and --offline
      --config <KEY=VALUE>  Override a configuration value
  -Z <FLAG>                 Unstable (nightly-only) flags to Cargo, see 'cargo -Z help' for details
  -h, --help                Print help

Commands:
    build, b    Compile the current package
    check, c    Analyze the current package and report errors, but don't build object files
    clean       Remove the target directory
    doc, d      Build this package's and its dependencies' documentation
    new         Create a new cargo package
    init        Create a new cargo package in an existing directory
    add         Add dependencies to a manifest file
    remove      Remove dependencies from a manifest file
    run, r      Run a binary or example of the local package
    test, t     Run the tests
    bench       Run the benchmarks
    update      Update dependencies listed in Cargo.lock
    search      Search registry for crates
    publish     Package and upload this package to the registry
    install     Install a Rust binary
    uninstall   Uninstall a Rust binary
    ...         See all commands with --list

See 'cargo help <command>' for more information on a specific command.

saumyajyoti commented 1 week ago

================================================================================= cargo help
Rust's package manager

Usage: cargo [+toolchain] [OPTIONS] [COMMAND]
       cargo [+toolchain] [OPTIONS] -Zscript <MANIFEST_RS> [ARGS]...

Options:
  -V, --version             Print version info and exit
      --list                List installed commands
      --explain <CODE>      Provide a detailed explanation of a rustc error message
  -v, --verbose...          Use verbose output (-vv very verbose/build.rs output)
  -q, --quiet               Do not print cargo log messages
      --color <WHEN>        Coloring: auto, always, never
  -C <DIRECTORY>            Change to DIRECTORY before doing anything (nightly-only)
      --locked              Assert that `Cargo.lock` will remain unchanged
      --offline             Run without accessing the network
      --frozen              Equivalent to specifying both --locked and --offline
      --config <KEY=VALUE>  Override a configuration value
  -Z <FLAG>                 Unstable (nightly-only) flags to Cargo, see 'cargo -Z help' for details
  -h, --help                Print help

Commands:
    build, b    Compile the current package
    check, c    Analyze the current package and report errors, but don't build object files
    clean       Remove the target directory
    doc, d      Build this package's and its dependencies' documentation
    new         Create a new cargo package
    init        Create a new cargo package in an existing directory
    add         Add dependencies to a manifest file
    remove      Remove dependencies from a manifest file
    run, r      Run a binary or example of the local package
    test, t     Run the tests
    bench       Run the benchmarks
    update      Update dependencies listed in Cargo.lock
    search      Search registry for crates
    publish     Package and upload this package to the registry
    install     Install a Rust binary
    uninstall   Uninstall a Rust binary
    ...         See all commands with --list

See 'cargo help <command>' for more information on a specific command.

================================================================================= rustup help

rustup 1.27.1 (54dd3d00f 2024-04-24)

The Rust toolchain installer

Usage: rustup [OPTIONS] [+toolchain] [COMMAND]

Commands:
  show         Show the active and installed toolchains or profiles
  update       Update Rust toolchains and rustup
  check        Check for updates to Rust toolchains and rustup
  default      Set the default toolchain
  toolchain    Modify or query the installed toolchains
  target       Modify a toolchain's supported targets
  component    Modify a toolchain's installed components
  override     Modify toolchain overrides for directories
  run          Run a command with an environment configured for a given toolchain
  which        Display which binary will be run for a given command
  doc          Open the documentation for the current toolchain
  self         Modify the rustup installation
  set          Alter rustup settings
  completions  Generate tab-completion scripts for your shell
  help         Print this message or the help of the given subcommand(s)

Arguments:
  [+toolchain]  release channel (e.g. +stable) or custom toolchain to set override

Options:
  -v, --verbose  Enable verbose output
  -q, --quiet    Disable progress output
  -h, --help     Print help
  -V, --version  Print version

Discussion:
    Rustup installs The Rust Programming Language from the official
    release channels, enabling you to easily switch between stable,
    beta, and nightly compilers and keep them updated. It makes
    cross-compiling simpler with binary builds of the standard library
    for common platforms.

    If you are new to Rust consider running `rustup doc --book` to
    learn Rust.

================================================================================= go help
Go is a tool for managing Go source code.

Usage:

    go <command> [arguments]

The commands are:

    bug         start a bug report
    build       compile packages and dependencies
    clean       remove object files and cached files
    doc         show documentation for package or symbol
    env         print Go environment information
    fix         update packages to use new APIs
    fmt         gofmt (reformat) package sources
    generate    generate Go files by processing source
    get         add dependencies to current module and install them
    install     compile and install packages and dependencies
    list        list packages or modules
    mod         module maintenance
    work        workspace maintenance
    run         compile and run Go program
    telemetry   manage telemetry data and settings
    test        test packages
    tool        run specified go tool
    version     print Go version
    vet         report likely mistakes in packages

Use "go help <command>" for more information about a command.

Additional help topics:

    buildconstraint build constraints
    buildmode       build modes
    c               calling between Go and C
    cache           build and test caching
    environment     environment variables
    filetype        file types
    go.mod          the go.mod file
    gopath          GOPATH environment variable
    goproxy         module proxy protocol
    importpath      import path syntax
    modules         modules, module versions, and more
    module-auth     module authentication using go.sum
    packages        package lists and patterns
    private         configuration for downloading non-public code
    testflag        testing flags
    testfunc        testing functions
    vcs             controlling version control with GOVCS

Use "go help <topic>" for more information about that topic.

================================================================================= uv help 
An extremely fast Python package manager.

Usage: uv [OPTIONS] <COMMAND>

Commands:
  run                        Run a command or script
  init                       Create a new project
  add                        Add dependencies to the project
  remove                     Remove dependencies from the project
  sync                       Update the project's environment
  lock                       Update the project's lockfile
  export                     Export the project's lockfile to an alternate format
  tree                       Display the project's dependency tree
  tool                       Run and install commands provided by Python packages
  python                     Manage Python versions and installations
  pip                        Manage Python packages with a pip-compatible interface
  venv                       Create a virtual environment
  build                      Build Python packages into source distributions and wheels
  publish                    Upload distributions to an index
  cache                      Manage uv's cache
  self                       Manage the uv executable
  version                    Display uv's version
  generate-shell-completion  Generate shell completion
  help                       Display documentation for a command

Cache options:
  -n, --no-cache               Avoid reading from or writing to the cache, instead using a temporary directory for the duration of the operation [env: UV_NO_CACHE=]
      --cache-dir <CACHE_DIR>  Path to the cache directory [env: UV_CACHE_DIR=]

Python options:
      --python-preference <PYTHON_PREFERENCE>  Whether to prefer uv-managed or system Python installations [env: UV_PYTHON_PREFERENCE=] [possible values: only-managed, managed, system, only-system]
      --no-python-downloads                    Disable automatic downloads of Python. [env: "UV_PYTHON_DOWNLOADS=never"]

Global options:
  -q, --quiet                                      Do not print any output
  -v, --verbose...                                 Use verbose output
      --color <COLOR_CHOICE>                       Control colors in output [default: auto] [possible values: auto, always, never]
      --native-tls                                 Whether to load TLS certificates from the platform's native certificate store [env: UV_NATIVE_TLS=]
      --offline                                    Disable network access
      --allow-insecure-host <ALLOW_INSECURE_HOST>  Allow insecure connections to a host [env: UV_INSECURE_HOST=]
      --no-progress                                Hide all progress outputs [env: UV_NO_PROGRESS=]
      --directory <DIRECTORY>                      Change to the given directory prior to running the command
      --project <PROJECT>                          Run the command within the given project directory
      --config-file <CONFIG_FILE>                  The path to a `uv.toml` file to use for configuration [env: UV_CONFIG_FILE=]
      --no-config                                  Avoid discovering configuration files (`pyproject.toml`, `uv.toml`) [env: UV_NO_CONFIG=]
  -h, --help                                       Display the concise help for this command
  -V, --version                                    Display the uv version

Use `uv help <command>` for more information on a specific command.

================================================================================= wezterm help 
Wez's Terminal Emulator
http://github.com/wez/wezterm

Usage: wezterm.exe [OPTIONS] [COMMAND]

Commands:
  start                  Start the GUI, optionally running an alternative program [aliases: -e]
  ssh                    Establish an ssh session
  serial                 Open a serial port
  connect                Connect to wezterm multiplexer
  ls-fonts               Display information about fonts
  show-keys              Show key assignments
  cli                    Interact with experimental mux server
  imgcat                 Output an image to the terminal
  set-working-directory  Advise the terminal of the current working directory by emitting an OSC 7 escape sequence
  record                 Record a terminal session as an asciicast
  replay                 Replay an asciicast terminal session
  shell-completion       Generate shell completion information
  help                   Print this message or the help of the given subcommand(s)

Options:
  -n, --skip-config                Skip loading wezterm.lua
      --config-file <CONFIG_FILE>  Specify the configuration file to use, overrides the normal configuration file resolution
      --config <name=value>        Override specific configuration values
  -h, --help                       Print help
  -V, --version                    Print version
=================================================================================

plutonium-239 commented 1 week ago

It seems like there is some level of consistency: 'Commands' with keyword commands and '(Additional) Options' with keyword arguments. go doesn't have the same headings, but the format is the same. However it also includes "Additional help topics" which might confuse the parser, so it might need to be done separately, which is fine.
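
To make the idea concrete, here is a rough sketch in plain Lua of a heading-based extractor. The function name `parse_section` and its rules are assumptions for illustration (a section starts at a line equal to the heading and ends at the first blank line after at least one entry); it is not how help_parser.lua works. Because it matches the heading text literally, it is tied to the language of the help text.

```lua
-- Collect "name  description" entries that follow a heading line.
-- Assumptions (hypothetical rules, for illustration only):
--   * the section starts at a line exactly equal to `heading`
--   * entries are indented and use 2+ spaces before the description
--   * the section ends at the first blank line after at least one entry
local function parse_section(help_text, heading)
    local entries = {}
    local in_section = false
    for line in (help_text .. "\n"):gmatch("(.-)\r?\n") do
        if not in_section then
            in_section = (line == heading)
        elseif line:match("^%s*$") then
            if #entries > 0 then break end  -- blank line closes the section
        else
            -- Name must start with a letter or digit, so flags and the
            -- "..." placeholder are skipped; aliases such as ", b" are dropped.
            local name, desc = line:match("^%s+([%w][%w_%-]*).-%s%s+(.+)$")
            if name then
                table.insert(entries, { name = name, description = desc })
            end
        end
    end
    return entries
end

local sample = [=[
Usage: uv [OPTIONS] <COMMAND>

Commands:
  run      Run a command or script
  init     Create a new project

Global options:
  -q, --quiet  Do not print any output
]=]

local cmds = parse_section(sample, "Commands:")
for _, c in ipairs(cmds) do
    print(c.name, c.description)
end
```

Passing "The commands are:" as the heading would handle the go layout as well, since a blank line directly after the heading is tolerated.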

I don't know why rustup has a separate 'Arguments' section that only has one option and starts with a +. Seems like it should've been a normal --toolchain={} argument. But oh well, it's already been live for a while now.

plutonium-239 commented 1 week ago

Another example for a non-rust/go program is 7z:

13:05:40|C:\Windows\system32>7z --help

7-Zip [64] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04

Usage: 7z <command> [<switches>...] <archive_name> [<file_names>...]
       [<@listfiles...>]

<Commands>
  a : Add files to archive
  b : Benchmark
  d : Delete files from archive
  e : Extract files from archive (without using directory names)
  h : Calculate hash values for files
  i : Show information about supported formats
  l : List contents of archive
  rn : Rename files in archive
  t : Test integrity of archive
  u : Update files to archive
  x : eXtract files with full paths

<Switches>
  -- : Stop switches parsing
  -ai[r[-|0]]{@listfile|!wildcard} : Include archives
  -ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
  -ao{a|s|t|u} : set Overwrite mode
  -an : disable archive_name field
  -bb[0-3] : set output log level
  -bd : disable progress indicator
  -bs{o|e|p}{0|1|2} : set output stream for output/error/progress line
  -bt : show execution time statistics
  -i[r[-|0]]{@listfile|!wildcard} : Include filenames
  -m{Parameters} : set compression Method
    -mmt[N] : set number of CPU threads
  -o{Directory} : set Output directory
  -p{Password} : set Password
  -r[-|0] : Recurse subdirectories
  -sa{a|e|s} : set Archive name mode
  -scc{UTF-8|WIN|DOS} : set charset for for console input/output
  -scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
  -scrc[CRC32|CRC64|SHA1|SHA256|*] : set hash function for x, e, h commands
  -sdel : delete files after compression
  -seml[.] : send archive by email
  -sfx[{name}] : Create SFX archive
  -si[{name}] : read data from stdin
  -slp : set Large Pages mode
  -slt : show technical information for l (List) command
  -snh : store hard links as links
  -snl : store symbolic links as links
  -sni : store NT security information
  -sns[-] : store NTFS alternate streams
  -so : write data to stdout
  -spd : disable wildcard matching for file names
  -spe : eliminate duplication of root folder for extract command
  -spf : use fully qualified file paths
  -ssc[-] : set sensitive case mode
  -ssw : compress shared files
  -stl : set archive timestamp from the most recently modified file
  -stm{HexMask} : set CPU thread affinity mask (hexadecimal number)
  -stx{Type} : exclude archive type
  -t{Type} : Set type of archive
  -u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
  -v{Size}[b|k|m|g] : Create volumes
  -w[{path}] : assign Work directory. Empty path means a temporary directory
  -x[r[-|0]]{@listfile|!wildcard} : eXclude filenames
  -y : assume Yes on all queries

chrisant996 commented 1 week ago

It seems like there is some level of consistency: 'Commands' with keyword commands and '(Additional) Options' with keyword arguments.

It sounds like you plan to make a parser that only works for English. That can be fine for your own use, but is problematic if the parser would be shared publicly. It'll be less functional than a manually written argmatcher.

By contrast, when an argmatcher is written directly (not parsed), it might be in only one language (maybe English), but it'll still technically work on a computer set to any language. If the user doesn't know the argmatcher's language, though, they'll find it difficult to use.

plutonium-239 commented 6 days ago

I totally missed that, you're right.

I was just referring to the sections here though; the parser would probably rely on whether it can find blocks of text with "alignments" at two (or more) levels, or another heuristic that we can come up with. It will probably take inspiration directly from the hand-written argmatchers, since most of the time the difference between commands and arguments is just the --.
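
A rough sketch of that alignment idea in plain Lua follows. The names `split_entry` and `find_blocks` are hypothetical, and the rules are assumptions for illustration: an entry is an indented line with a name, 2+ spaces, then a description; consecutive entries whose descriptions start at the same column form a block; a block counts as flags if its first name starts with - or /, otherwise as commands. No heading text is inspected.

```lua
-- Split an indented "name<2+ spaces>description" line and report the
-- 1-based column where the description starts. Returns nil otherwise.
local function split_entry(line)
    local indent, name, gap, desc = line:match("^(%s+)(%S.-)(%s%s+)(%S.*)$")
    if not name then return nil end
    return name, desc, #indent + #name + #gap + 1
end

-- Group consecutive entries that share a description column into blocks.
local function find_blocks(help_text)
    local blocks, current = {}, nil
    local function flush()
        -- A single aligned line is too weak a signal; require 2+ entries.
        if current and #current.entries >= 2 then
            table.insert(blocks, current)
        end
        current = nil
    end
    for line in (help_text .. "\n"):gmatch("(.-)\r?\n") do
        local name, desc, col = split_entry(line)
        if not name then
            flush()
        else
            if current and current.column ~= col then flush() end
            current = current or { column = col, entries = {} }
            table.insert(current.entries, { name = name, description = desc })
        end
    end
    flush()
    for _, block in ipairs(blocks) do
        local first = block.entries[1].name:sub(1, 1)
        block.kind = (first == "-" or first == "/") and "flags" or "commands"
    end
    return blocks
end

local sample = [=[
Usage: rustup [OPTIONS] [+toolchain] [COMMAND]

Commands:
  show         Show the active and installed toolchains or profiles
  update       Update Rust toolchains and rustup

Options:
  -v, --verbose  Enable verbose output
  -q, --quiet    Disable progress output
]=]

local blocks = find_blocks(sample)
for _, b in ipairs(blocks) do
    print(b.kind, #b.entries, b.column)
end
```

Real help text would need more filtering than this: cargo's "..." line, aliases such as "build, b", wrapped descriptions, and sections like go's "Additional help topics" would all be picked up or split incorrectly by these rules.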

Do the existing parsers also work on a similar principle and are language-agnostic?

chrisant996 commented 6 days ago

Do the existing parsers also work on a similar principle and are language-agnostic?

I've written many different help text parsers in the past few years that I've been maintaining and extending Clink. Here's what I've found:

  • GNU programs seem to be the closest to having some kind of somewhat universal help text format. But it's not formalized, and different text authors use slightly different conventions in different programs (or even different descriptions in the same program).
  • Some GNU programs seem to be in English only, which seems to reduce the need for dealing with localized text in certain programs.
  • There are tons of different help text formatting conventions and quirks. A while after initially writing auto_argmatcher, I reached the conclusion that writing automatic parsers that work for more than 1 or 2 programs seems like a bit of a pipe dream.

chrisant996 commented 6 days ago

Btw, the fish shell on Linux claims to parse man pages to automatically create completions. That's what started me down the path of exploring writing auto_argmatcher -- if fish can do it for man pages, maybe a script could do it for help text, or at least help text from certain families like the GNU family.

But Windows doesn't have anything like man or man pages. And even fish has lots of cases where its man page parser gets confused and produces wrong completions. And man pages have contextual clues that plain help text lacks (such as style tags), so man page parsing is significantly easier than trying to parse plain help text from arbitrary programs with very different help text formatting conventions.

In practice, my efforts for generalized help text parsing on Windows haven't gone well. Too much inconsistency between programs (and even between versions of the same program).

I'm not going to invest any more time into exploring the possibility of a generalized help text parser. I've found that it's an unrealistic/unreliable path. I've abandoned making any further "built in" functionality for auto_argmatcher. I now write argmatchers manually, or sometimes I write a custom parser for a specific program (if it has localized text, or if it has a lot of flags/values/commands AND has a highly structured and highly consistent help text format).

Of course anyone else is welcome to explore working on generalized parsers. Maybe someone else can come up with something semi-decent, with enough investment of time and effort and testing (and computing power and maybe some kind of specialized LLM). It depends on how many bugs and how much maintenance one is willing to invest in.

plutonium-239 commented 6 days ago

Yeah, that seems fair, it can get pretty pointless having an auto_argmatcher which doesn't automatically work in most cases. But there needs to be some sort of solution, manually writing parsers is ofc not feasible in the long term.

I had thought about the LLM possibility, and even tried to get multiple LLMs to generate a Clink parser for tar, with varying levels of success. I would love to undertake the training of a specialized SLM for this use case, but I don't think we would have enough data to train it on, although synthetic data could be generated.

chrisant996 commented 6 days ago

But there needs to be some sort of solution, manually writing parsers is ofc not feasible in the long term.

Completion scripts are normally written manually for all shells, even fish. But fish has a stopgap mechanism of parsing man pages to get some mostly/partially-functional completions for things that don't have manually written completion scripts but do have man pages.

I don't see how the "need" could be fulfilled, and none of the other shells have automatic solutions.

Zooming out, the real issue is that Clink/CMD is not widely used enough for application authors to provide (or maintain) completion scripts for Clink.

Which is why I shifted to exploring a fishcomplete.lua script to parse fish completion scripts, at least straightforward ones that don't try to run shell scripts. But that has lots of limitations, and fish is an exotic shell with a niche following, so fish completion scripts are often not available anyway.

Automated parsing of arbitrary help text and arbitrary command line interfaces is a bit unrealistic. If completions are the most important thing for someone, then it'd be best to switch away from using CMD.