Better error messages - Githubissues

brentyi commented 1 year ago

This PR takes an initial step in the direction of #62. I expect there's more work to be done; I'd also like to add tests before merging. @vwxyzjn if you have any thoughts or suggestions I'd also be interested!

(1) We previously relied on argparse for most of the error messages reported by tyro. We continue to do this, but the formatting now better emphasize the actual error.

Before:

After:

(2) When an invalid value is passed in, we show the argument's documentation + helptext.

Before:

After:

(3) When an argument that doesn't exist is passed in, we now highlight similar arguments.

Before:

After:

(4) Finally, when subcommands are used, similar arguments are coupled with the subcommands that they can be found in:

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.16% :tada:

Comparison is base (5340498) 98.96% compared to head (4e71903) 99.12%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #63 +/- ## ========================================== + Coverage 98.96% 99.12% +0.16% ========================================== Files 23 23 Lines 1829 1953 +124 ========================================== + Hits 1810 1936 +126 + Misses 19 17 -2 ``` | [Flag](https://app.codecov.io/gh/brentyi/tyro/pull/63/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/brentyi/tyro/pull/63/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi) | `99.12% <100.00%> (+0.16%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files Changed](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi) | Coverage Δ | | |---|---|---| | [tyro/\_fields.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fZmllbGRzLnB5) | `99.66% <ø> (ø)` | | | [tyro/\_parsers.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fcGFyc2Vycy5weQ==) | `98.66% <ø> (ø)` | | | [tyro/\_argparse\_formatter.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fYXJncGFyc2VfZm9ybWF0dGVyLnB5) | `97.02% <100.00%> (+2.78%)` | :arrow_up: | | [tyro/\_arguments.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fYXJndW1lbnRzLnB5) | `100.00% <100.00%> (ø)` | | | [tyro/\_calling.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fY2FsbGluZy5weQ==) | `100.00% <100.00%> (ø)` | | | [tyro/\_cli.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fY2xpLnB5) | `100.00% <100.00%> (ø)` | | | [tyro/\_instantiators.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9faW5zdGFudGlhdG9ycy5weQ==) | `98.72% <100.00%> (+<0.01%)` | :arrow_up: | | [tyro/\_strings.py](https://app.codecov.io/gh/brentyi/tyro/pull/63?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Brent+Yi#diff-dHlyby9fc3RyaW5ncy5weQ==) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

brentyi commented 1 year ago

(5)

Improvements when subcommands are present.

For example, when we pass in an argument in the wrong location:

# Incorrect
python 03_multiple_subcommands.py dataset:mnist --dataset.binary True optimizer:sgd

# Correct
python 03_multiple_subcommands.py dataset:mnist optimizer:sgd --dataset.binary True

The old error would be completely irrelevant:

Now:

This has consequences for a lot of things. For example, in nerfstudio --data is a valid argument but --dataset is not.

Previously, if you ran ns-train nerfacto --dataset /some_path the error would be super cryptic:

Now, we get:

vwxyzjn commented 1 year ago

Love it! Maybe in the error message, also give out the docs for the Similar arguments:, so the user would not necessarily need to do python my.py --help.

vwxyzjn commented 1 year ago

Maybe another suggestion is that the Parsing error should appear before the usage section, since the Parsing error is more informative and directly helpful.

brentyi commented 1 year ago

Makes sense! I added the help messages, so as an example if you write --model.features-per-level instead of --pipeline.model.features-per-level you now get:

On the ordering, do you feel strongly about this? To me it's slightly more intuitive to have the most relevant (error) message on the bottom, since it reduces the likelihood that the user needs to scroll in order to see it. Kind of like a stack trace; it's also closer to the original argparse error format.

In any case I'm going to let this PR stew for a few days before merging + releasing; I'd like to spend more time thinking about edge cases + test cases.

vwxyzjn commented 1 year ago

Nice! Did you push the related changes? I am still getting the same result.

ip install git+https://github.com/brentyi/tyro.git@brent/better_errors
Collecting git+https://github.com/brentyi/tyro.git@brent/better_errors
  Cloning https://github.com/brentyi/tyro.git (to revision brent/better_errors) to /tmp/pip-req-build-a_nma7k_
  Running command git clone --filter=blob:none --quiet https://github.com/brentyi/tyro.git /tmp/pip-req-build-a_nma7k_
  Running command git checkout -b brent/better_errors --track origin/brent/better_errors
  Switched to a new branch 'brent/better_errors'
  Branch 'brent/better_errors' set up to track remote branch 'brent/better_errors' from 'origin'.
  Resolved https://github.com/brentyi/tyro.git to commit ef6e733d301c8f8cd0609d986cab59e4012905c5

brentyi commented 1 year ago

Does it work with your original flag, --rewar.flag? The currently similarity metric is just based on difflib and fairly naive, so --reward.track and --track aren't considered similar enough. I can add some heuristics for catching cases like this before merging.

vwxyzjn commented 1 year ago

Btw really good work! I am loving the changes.

On the ordering, do you feel strongly about this? To me it's slightly more intuitive to have the most relevant (error) message on the bottom, since it reduces the likelihood that the user needs to scroll in order to see it. Kind of like a stack trace; it's also closer to the original argparse error format.

I do like the fact that the error shows up in the bottom, but at the same time most of the messages do not feel useful. For example, I have the following with the latest PR. It's not until the end that I see the most useful message.

Maybe as an alternative you can say "for usage info please run python xx.py --help` and just show the parse error message?

 python lm_human_preference_details/train_both_accelerate.py --track
2023-08-30 17:10:35.136557: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
usage: train_both_accelerate.py [-h] [--exp-name STR] [--seed INT]
                                [--reward.exp-name STR]
                                [--reward.seed INT]
                                [--reward.track | --reward.no-track]
                                [--reward.wandb-project-name STR]
                                [--reward.wandb-entity {None}|STR]
                                [--reward.cuda | --reward.no-cuda]
                                [--reward.base-model STR]
                                [--reward.label-dataset STR]
                                [--reward.local-batch-size INT]
                                [--reward.gradient-accumulation-steps INT]
                                [--reward.lr FLOAT] [--reward.eps FLOAT]
                                [--reward.rollout-batch-size INT]
                                [--reward.local-normalize-samples INT]
                                [--reward.debug-normalize INT]
                                [--reward.normalize-before | 
--reward.no-normalize-before]
                                [--reward.normalize-after | 
--reward.no-normalize-after]
                                [--reward.print-sample-output-freq INT]
                                [--reward.save-path STR]
                                [--reward.use-tensorflow-adam | 
--reward.no-use-tensorflow-adam]
                                [--reward.task.query-length INT]
                                [--reward.task.query-dataset STR]
                                [--reward.task.query-prefix STR]
                                [--reward.task.query-suffix STR]
                                [--reward.task.start-text {None}|STR]
                                [--reward.task.end-text {None}|STR]
                                [--reward.task.response-length INT]
                                [--reward.task.temperature FLOAT]
                                [--reward.labels.type STR]
                                [--reward.labels.num-train INT]
                                [--reward.labels.num-labels INT]
                                [--reward.labels.source STR]
                                [--policy.exp-name STR]
                                [--policy.seed INT]
                                [--policy.track | --policy.no-track]
                                [--policy.wandb-project-name STR]
                                [--policy.wandb-entity {None}|STR]
                                [--policy.cuda | --policy.no-cuda]
                                [--policy.base-model STR]
                                [--policy.print-sample-output-freq INT]
                                [--policy.save-path STR]
                                [--policy.use-tensorflow-adam | 
--policy.no-use-tensorflow-adam]
                                [--policy.task.query-length INT]
                                [--policy.task.query-dataset STR]
                                [--policy.task.query-prefix STR]
                                [--policy.task.query-suffix STR]
                                [--policy.task.start-text {None}|STR]
                                [--policy.task.end-text {None}|STR]
                                [--policy.task.response-length INT]
                                [--policy.task.truncate-token INT]
                                [--policy.task.truncate-after INT]
                                [--policy.task.penalty-reward-value INT]
                                [--policy.task.temperature FLOAT]
                                [--policy.rewards.kl-coef FLOAT]
                                [--policy.rewards.trained-model 
{None}|STR]
                                [--policy.ppo.total-episodes INT]
                                [--policy.ppo.local-batch-size INT]
                                [--policy.ppo.gradient-accumulation-steps 
INT]
                                [--policy.ppo.nminibatches INT]
                                [--policy.ppo.noptepochs INT]
                                [--policy.ppo.lr FLOAT]
                                [--policy.ppo.eps FLOAT]
                                [--policy.ppo.vf-coef FLOAT]
                                [--policy.ppo.cliprange FLOAT]
                                [--policy.ppo.cliprange-value FLOAT]
                                [--policy.ppo.gamma FLOAT]
                                [--policy.ppo.lam FLOAT]
                                [--policy.ppo.whiten-rewards | 
--policy.ppo.no-whiten-rewards]
                                [{policy.rewards.adaptive-kl:adaptive-kl-p
arams,policy.rewards.adaptive-kl:None}]

╭─ Parsing error ──────────────────────────────────────────────────────────╮
│ Unrecognized arguments: --track                                          │
│ ──────────────────────────────────────────────────────────────────────── │
│ Similar arguments:                                                       │
│     --policy.track, --policy.no-track                                    │
│         if toggled, this experiment will be tracked with Weights and     │
│         Biases (default: False)                                          │
│             in train_both_accelerate.py --help                           │
│     --reward.track, --reward.no-track                                    │
│         if toggled, this experiment will be tracked with Weights and     │
│         Biases (default: False)                                          │
│             in train_both_accelerate.py --help                           │
╰──────────────────────────────────────────────────────────────────────────╯

brentyi commented 1 year ago

Ok! We now won't print usage messages that are 400 characters or longer:

vwxyzjn commented 1 year ago

Ok! We now won't print usage messages that are 400 characters or longer:

Nice! FYI absl does this:

poetry run python falcon.py 
FATAL Flags parsing error:
  flag --dataset_name=None: Flag --dataset_name must have a value other than None.
  flag --ckpt_path=None: Flag --ckpt_path must have a value other than None.
Pass --helpshort or --helpfull to see help on flags.

Maybe the Pass --helpshort or --helpfull to see help on flags message could be something to consider as well. Also possibly this can be made configurable as well (e.g., output_help_text_on_error=True).

brentyi commented 1 year ago

Interesting! To clarify, are you proposing both:

To introduce distinct --helpshort and --helpfull flags;
To include help messages like Flag --dataset_name must have a value other than None.?

vwxyzjn commented 1 year ago

To include help messages like Flag --dataset_name must have a value other than None.?

This. Currently

from dataclasses import asdict, dataclass, field
import tyro
@dataclass
class Args:
    exp_name: str
    seed: int = 1
    track: bool = False

args = tyro.cli(Args)

gives

but when it's long, it's hardly readable. E.g.,

Ok! We now won't print usage messages that are 400 characters or longer:

I am also proposing to make this configurable so that the user can choose not print usage messages that are N characters or longer. Personally I would set N=0 because I could just type python teest.py --help if I want the help text — when the command fails, I am only interested in the relevant errors.

Thanks for being patient with my feature requests!

brentyi commented 1 year ago

Just made a PR with these suggestions, thanks!

I am also proposing to make this configurable so that the user can choose not print usage messages that are N characters or longer. Personally I would set N=0 because I could just type python teest.py --help if I want the help text — when the command fails, I am only interested in the relevant errors.

For now I just (did the equivalent of) setting N=0, for things like this (especially while I still feel like we're iterating) I'd prefer to err on the side of being prescriptive. It's hard to remove configurable parameters once they've been added. 🙂

Thanks for being patient with my feature requests!

I appreciate your help with making the CLI experience better!

brentyi / tyro

Better error messages #63

Codecov Report