Both the parsers in `vast_utils.py` are broken.

fake-name commented 3 months ago

I'm trying to figure out how the parse_env() function works. So far, it seems like it doesn't work for anything:

from vastai_client.vast_utils import parse_env
env_strs = [
"'-e INTERRO_USER=mlbox_2'",
"'-e ASR_MODEL=base'",
]

for env_str in env_strs:
    ret = parse_env(env_str)

    print("Env: %s -> '%s'" % (env_str, ret))

returns:

Env: '-e INTERRO_USER=mlbox_2' -> '{}'
Env: '-e ASR_MODEL=base' -> '{}'

The ASR_MODEL example string is literally from the docstring of the function. Nothing I do seems to ever cause it to return anything but an empty dict.

fake-name commented 3 months ago

Ok, so there are a BUNCH of completely valid environment variable options that are broken.

Leading spaces (completely valid) break the parsing.
Multiple spaces between any arguments break the parsing.
Surrounding the string in quotes (like the vast.ai documentation and the function docstring say you should use) breaks the parsing.
Dots in environment variables (One of my env variables is a URL) breaks the parsing.
Quoting the value for your environment variable breaks the parsing.
Spaces in the env variables (which are completely valid if quoted) break the parsing.

It seems like the parser thingie here was kind of ad-hoc for a single use case, and it completely falls over if you do anything else.

fake-name commented 3 months ago

Here is a rewritten parse_env() that should handle most valid things you can pass as environment variables:

import shlex
import itertools

def parse_env(envs: str | None) -> dict[str, str]:
    """Parse the environment variables provided to the docker container.

    Args:
        envs (str | None): String with environment variables: "'-e ASR_MODEL=base'".

    Returns
    -------
        dict[str, str]: The parsed environment variables.
    """

    result: dict[str, str] = {}
    if envs is None:
        return result

    # Handle wrapping quotes because the docs suck
    if (envs.strip().startswith("'") and envs.strip().endswith("'")) or \
        (envs.strip().startswith('"') and envs.strip().endswith('"')):
        envs = shlex.split(envs)

        assert len(envs) == 1, "Quoted envs argument did not parse into a single item after doing initial quote unwrap"
        envs = envs[0]

    split = shlex.split(envs)

    for first, second in itertools.pairwise(split):
        if first == '-e':

            name, value = second.split('=', 1)
            result[name] = value

        if first == '-p':
            if set(second).issubset(set("0123456789:")):
                result["-p " + second] = "1"

    return result

It mainly fixes things by virtue of delegating all the shell command line parsing complexity to shlex, which is part of the stdlib and exists for literally this exact issue.

fake-name commented 3 months ago

Ok, so parse_query() is ALSO broken:


query_strs = [
    "num_gpus=1 gpu_name in [RTX_6000Ada, A40] gpu_ram>40 rentable=true verified=false",
]

for env_str in query_strs:
    ret = parse_query(env_str)

    print("Query: '%s' -> '%s'" % (env_str, ret))

python3 env_wat.py
Traceback (most recent call last):
  File "/media/Scripts/book_cat/env_wat.py", line 319, in <module>
    ret = parse_query(env_str)
          ^^^^^^^^^^^^^^^^^^^^
  File "/media/Scripts/book_cat/env_wat.py", line 286, in parse_query
    v[op_name] = value.replace('_', ' ')
                 ^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'replace'

It appears that the in [list] syntax just causes it to barf and fail.

This query string works fine for the vastai command line interface. Looking at the regexes in parse_query() I think there was some intention to support the in syntax, but in any event it doesn't work.

Edit: No, it's just broken period. num_gpus=1 gpu_name=RTX_6000Ada gpu_ram>40 rentable=true verified=false ALSO fails to parse.

Barahlush / vastai_client

Both the parsers in `vast_utils.py` are broken. #43