Blacksmoke16 / oq

A performant, and portable jq wrapper to facilitate the consumption and output of formats other than JSON; using jq filters to transform the data.
https://blacksmoke16.github.io/oq/
MIT License

triggering jq parse error when YAML key is one of the boolean keywords #129

Open anapsix opened 1 month ago

anapsix commented 1 month ago

oq seems to interpret on and off keys in YAML as boolean keywords, and fails with the following error:

jq: parse error: Unfinished JSON term at EOF at line 1, column 1

I discovered it while trying to parse a GitHub Actions workflow file, which contains an on key. I'm not sure whether an exception should be made for keyword interpretation of key names, but it would be great to be able to process GitHub Actions workflows with oq, as well as any YAML with on|ON|off|OFF key names.

❯ echo 'on: test' | oq -i yaml .
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ echo 'off: test' | oq -i yaml .
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ echo 'true: test' | oq -i yaml .
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ echo 'false: test' | oq -i yaml .
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ echo 'yes: something' | oq -i yaml
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ echo 'no: something' | oq -i yaml
jq: parse error: Unfinished JSON term at EOF at line 1, column 1

❯ curl -sS "https://raw.githubusercontent.com/Blacksmoke16/oq/master/.github/workflows/ci.yml" \
  | oq -i yaml .
jq: parse error: Unfinished JSON term at EOF at line 1, column 12

testing interpretation of ON and OFF

Blacksmoke16 commented 1 month ago

Thanks for the report! This seems to stem from Crystal's use of the YAML 1.1 Core Schema, specifically how that spec treats on and off as booleans.

Crystal also supports the fail-safe schema, which would resolve this by treating everything as a string, but that also causes other things to break, which is less than ideal. I think the best option would ultimately be to implement support for https://yaml.org/spec/1.2.2/#102-json-schema and use it within oq by default, given how closely related to JSON it is. I'll file something upstream to track that.
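To make the difference between the two schemas concrete, here is a minimal Python sketch that checks the key names from this issue against the boolean spellings listed for the YAML 1.1 bool type and against the two literals the YAML 1.2 JSON schema accepts. It is only an illustration of the spec patterns: the names YAML_1_1_BOOL, YAML_1_2_JSON_BOOL, is_bool, and compare are hypothetical, and the exact set that Crystal's parser recognises may differ from the YAML 1.1 list used here.

```python
import re

# Plain-scalar spellings that the YAML 1.1 bool type resolves to a boolean.
# Assumption: taken from the YAML 1.1 type definition, not from Crystal's source.
YAML_1_1_BOOL = re.compile(
    r"(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)"
)

# The YAML 1.2 JSON schema only recognises the two lowercase JSON literals.
YAML_1_2_JSON_BOOL = re.compile(r"(?:true|false)")


def is_bool(scalar: str, pattern: re.Pattern) -> bool:
    """Return True if the whole plain scalar matches the boolean pattern."""
    return pattern.fullmatch(scalar) is not None


def compare(keys):
    """Map each key to (boolean under YAML 1.1, boolean under YAML 1.2 JSON schema)."""
    return {
        key: (is_bool(key, YAML_1_1_BOOL), is_bool(key, YAML_1_2_JSON_BOOL))
        for key in keys
    }


if __name__ == "__main__":
    for key, (v11, v12) in compare(
        ["on", "off", "yes", "no", "true", "false", "name"]
    ).items():
        print(f"{key!r}: YAML 1.1 bool={v11}, YAML 1.2 JSON bool={v12}")
```

Going by these patterns, on, off, yes, and no would stop being treated as booleans under the JSON schema, while unquoted true and false keys would presumably still resolve to booleans and need quoting or separate handling.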

In the meantime, I think you could use the simple_yaml format. I'd like to make it the default in the future because it's a stream parser and so is more memory efficient, but it is unable to handle anchors and aliases. For parsing GHA workflow files, however, I don't think that'll be a problem.

$ echo 'on: bar' | ./bin/oq -i simple_yaml .
{
  "on": "bar"
}

Related: https://www.bram.us/2022/01/11/yaml-the-norway-problem/

anapsix commented 1 month ago

🤯 I've been using oq for years, and the simple_yaml format option completely slipped my mind. Thank you for pointing it out. And of course, thank you for oq 🙇 🙏