Blacksmoke16 / oq

A performant and portable jq wrapper that facilitates the consumption and output of formats other than JSON, using jq filters to transform the data.
https://blacksmoke16.github.io/oq/
MIT License

Bad slurp with oq, yaml-input and multiple files #70

Closed blurayne closed 3 years ago

blurayne commented 3 years ago

a.yaml:

include:
 - project: 'my-group/my-project'
   file: '/templates/.gitlab-ci-template.yml'
a: ["here", "i", "am"]

b.yaml:

include:
  - remote: 'https://gitlab.com/example-project/-/raw/master/.gitlab-ci.yml'
b: {"greet": "from b with love"}

Correct behavior:

We use oq to first convert yaml to json then use pure jq

$ for y in a.yaml b.yaml; do oq --input=yaml . "$y" > "${y%.*}.json"; done
$ jq -s . a.json b.json
[
  {
    "include": [
      {
        "project": "my-group/my-project",
        "file": "/templates/.gitlab-ci-template.yml"
      }
    ],
    "a": [
      "here",
      "i",
      "am"
    ]
  },
  {
    "include": [
      {
        "remote": "https://gitlab.com/example-project/-/raw/master/.gitlab-ci.yml"
      }
    ],
    "b": {
      "greet": "from b with love"
    }
  }
]

Now what happens here?

$ oq --input=yaml -s . a.yaml b.yaml 
[
  {
    "include": [
      {
        "remote": "https://gitlab.com/example-project/-/raw/master/.gitlab-ci.yml"
      }
    ],
    "a": [
      "here",
      "i",
      "am"
    ],
    "b": {
      "greet": "from b with love"
    }
  }
]

Blacksmoke16 commented 3 years ago

@blurayne This is an issue with how multiple non-JSON input files are handled. Because the data has to be converted to JSON first, it's not clear exactly how that should be done. At the moment the files are concatenated together and then converted into JSON as a single document. Because of this, the include key is duplicated, and the value from the 2nd file overrides the value from the first.
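
To make the difference concrete, here is a small Python model of the two strategies. It is only a sketch, not oq's actual code: each file is represented by its top-level key/value pairs, and the helper names `concat_then_convert` and `convert_each` are made up for illustration. It assumes the parser resolves a repeated key by keeping the last value, which is consistent with the output reported above.

```python
# Top-level key/value pairs of each file, in source order (taken from a.yaml and b.yaml above).
A_PAIRS = [
    ("include", [{"project": "my-group/my-project",
                  "file": "/templates/.gitlab-ci-template.yml"}]),
    ("a", ["here", "i", "am"]),
]
B_PAIRS = [
    ("include", [{"remote": "https://gitlab.com/example-project/-/raw/master/.gitlab-ci.yml"}]),
    ("b", {"greet": "from b with love"}),
]


def concat_then_convert(*docs):
    """Model of the reported behaviour: all pairs end up in one mapping."""
    merged = {}
    for pairs in docs:
        for key, value in pairs:
            merged[key] = value  # assumption: a repeated key keeps the last value
    return [merged]


def convert_each(*docs):
    """Model of the expected behaviour: one JSON object per input file."""
    return [dict(pairs) for pairs in docs]


slurped_wrong = concat_then_convert(A_PAIRS, B_PAIRS)
slurped_right = convert_each(A_PAIRS, B_PAIRS)

print(len(slurped_wrong), list(slurped_wrong[0]))  # 1 ['include', 'a', 'b']
print(len(slurped_right))                          # 2
```

The first result has a single object whose include entry comes from b.yaml only, while the second keeps both documents separate, which is what `jq -s` produces for the pre-converted files.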

I think we have two options (that I can think of at this moment):

  1. Have this just be a limitation and suggest not providing multiple non-JSON input files (not really ideal)
  2. Convert each input file to JSON and save it to a tmp file before passing the tmp files along to jq (more ideal, but a bit complex)

I think option 2 would be the better solution, as it would make things more akin to using jq directly; i.e. supporting -s among others. It should be fairly doable, especially with the dev branch I have going. Basically, this logic would only run if more than 1 file is provided and the input format is not JSON. Do you have any thoughts on that approach, or alternate suggestions?
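
A rough Python sketch of option 2 might look like the following. It is not how oq implements it: `needs_temp_files`, `convert_inputs_to_temp_json`, and the `convert` callable (path in, parsed data out) are hypothetical names, and a stand-in converter is used instead of a real YAML parser so that the example runs with the standard library alone.

```python
import json
import os
import tempfile


def needs_temp_files(paths, input_format):
    """Only take the temp-file route for more than one non-JSON input."""
    return input_format != "json" and len(paths) > 1


def convert_inputs_to_temp_json(paths, convert):
    """Write one temporary JSON file per input and return the new paths.

    `convert` is a hypothetical callable: path -> parsed data.
    The caller is responsible for deleting the returned files.
    """
    tmp_paths = []
    for path in paths:
        fd, tmp_path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as tmp:
            json.dump(convert(path), tmp)
        tmp_paths.append(tmp_path)
    return tmp_paths


# Stand-in converter for the demo; a real one would parse the YAML file.
def fake_convert(path):
    return {"source": path}


inputs = ["a.yaml", "b.yaml"]
if needs_temp_files(inputs, "yaml"):
    json_paths = convert_inputs_to_temp_json(inputs, fake_convert)
    jq_argv = ["jq", "-s", ".", *json_paths]  # what would be handed to jq
    print(len(jq_argv))  # 5
    for p in json_paths:
        os.remove(p)  # clean up the temporary files
```

Because jq then receives one JSON file per original input, flags such as -s see the same document boundaries they would with plain JSON inputs.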

blurayne commented 3 years ago

@Blacksmoke16 2) is compatible with the way jq handles multiple input files. You also only have to split into temp files if there is more than one.

I wasn't thinking of auto-detection of the input format. I'd rather suggest that if a user specifies --input=yaml, it means: make sure all of your input is YAML!

But maybe something like --input=yaml,json,auto would also be a nice option, meaning: the first input is YAML, the second is JSON, and the third is auto-detected. If there is input on stdin, the first input is stdin; otherwise it is a file.
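
The pairing rule suggested here could be sketched in Python as below. This is purely illustrative: the comma-separated form of --input is only a proposal in this thread, `assign_formats` is a hypothetical helper, and two details are assumptions not stated above (a single format applies to every input, and a list whose length does not match the inputs is rejected).

```python
def assign_formats(spec, files, has_stdin=False):
    """Pair each input with a format name from a comma-separated spec."""
    # Per the suggestion: stdin, when present, counts as the first input.
    inputs = (["<stdin>"] if has_stdin else []) + list(files)
    formats = spec.split(",")
    if len(formats) == 1:
        # Assumption: a single format (e.g. "yaml") applies to all inputs.
        formats = formats * len(inputs)
    if len(formats) != len(inputs):
        # Assumption: a mismatched list is treated as an error.
        raise ValueError(
            f"{len(formats)} formats given for {len(inputs)} inputs"
        )
    return list(zip(inputs, formats))


print(assign_formats("yaml,json,auto", ["b.json", "c.txt"], has_stdin=True))
# [('<stdin>', 'yaml'), ('b.json', 'json'), ('c.txt', 'auto')]
print(assign_formats("yaml", ["a.yaml", "b.yaml"]))
# [('a.yaml', 'yaml'), ('b.yaml', 'yaml')]
```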

Blacksmoke16 commented 3 years ago

@blurayne

2) is compatible with the way jq handles multiple input files. You also only have to split into temp files if there is more than one.

Cool yea, that's how I implemented it in #71. I'll go ahead and do another pass of that and get it merged/released this weekend.

I wasn't thinking of auto-detection of the input format.

Auto-detection is probably a bit out of scope for this issue and is tracked in #12. tl;dr: it would be possible for files, but handling STDIN would be more challenging. Providing a list of formats is a pretty good idea tho. I guess the question then is whether that's a common use case?