01mf02 / jaq

A jq clone focussed on correctness, speed, and simplicity
MIT License
2.61k stars 63 forks source link

Implement `--sort-keys` #149

Open xduugu opened 6 months ago

xduugu commented 6 months ago

Sorting the keys is quite useful, e.g. when comparing two json files.

    --sort-keys / -S:

Output the fields of each object with the keys in sorted order.
01mf02 commented 6 months ago

I understand the usefulness of sorting keys, but this functionality could be equally well implemented as a filter.

What do you think about including the following definition in jaq?

def sort_keys: walk(if isobject then to_entries | sort_by(.key) | from_entries end);

That would allow you to replace jq --sort-keys 'f' by jaq 'f | sort_keys'.

01mf02 commented 6 months ago

I just realised that to_entries already sorts by key. That means that we can get away by writing simply:

def sort_keys: walk(if isobject then to_entries | from_entries end);

We can make that more performant by unfolding *_entries:

def sort_keys: walk(if isobject then [keys[] as $k | { ($k): .[$k] }] | add + {} end)

Unfolding add allows us to save one array:

def sort_keys: walk(if isobject then reduce (keys[] as $k | { ($k): .[$k] }) as $o ({}; . + $o) end);

And that's probably how fast it gets when doing it by definition.

Testing:

jaq -n 'def sort_keys: walk(if isobject then reduce (keys[] as $k | { ($k): .[$k] }) as $o ({}; . + $o) end); {c: {b: 1, a: 2}, a: 1, b: {}} | sort_keys'
{
  "a": 1,
  "b": {},
  "c": {
    "a": 2,
    "b": 1
  }
}
xduugu commented 6 months ago

@01mf02 Implementing it as a filter instead of a commandline option would be fine for me, thanks! As far as I know, jq only supports the --sort-keys options, that's why I proposed it.

I just realised that to_entries already sorts by key. That means that we can get away by writing simply:

Is this on purpose? I ask because jq explicitly does not sort the keys since commit https://github.com/jqlang/jq/commit/4a57b84db03db7aee33c47ed6c7f6c1e648705fd.

pkoppstein commented 6 months ago

@01mf02 wrote:

I just realised that to_entries already sorts by key.

If so, that is (or should be considered) a bug.

01mf02 commented 6 months ago

Is this on purpose? I ask because jq explicitly does not sort the keys since commit jqlang/jq@4a57b84.

No, that was not on purpose. 484dd27 should now correct this. Thanks for spotting this!

@pkoppstein, what would you think about including the proposed sort_keys in jq? This would make it possible to express sorting by keys the same way in both jaq and jq.

pkoppstein commented 6 months ago

@01mf02 asked:

what would you think about including the proposed sort_keys in jq? This would make it possible to express sorting by keys the same way in both jaq and jq.

My only strong preference regarding these matters is that jaq's to_entries closely tracks jq's, so thank you for https://github.com/01mf02/jaq/commit/484dd270a084ee7e6f233c781f96444264af530b

It may, however, be of interest to know that some of the jq maintainers have a strong distaste for command-line options and seem only to retain -S mainly for the sake of backwards compatibility. In this particular case, since the proposed sort_keys is trivial to implement in jq, I would tend to regard it as unnecessary bloat. There are certainly other functions higher on my priority list for adding to the standard "built-ins".

Another consideration is that if jaq defines sort_keys before jq, then that will introduce a discrepancy and hence add confusion and noise.

01mf02 commented 5 months ago

It may, however, be of interest to know that some of the jq maintainers have a strong distaste for command-line options and seem only to retain -S mainly for the sake of backwards compatibility.

That is understandable.

In this particular case, since the proposed sort_keys is trivial to implement in jq, I would tend to regard it as unnecessary bloat. There are certainly other functions higher on my priority list for adding to the standard "built-ins".

I do not understand why you classify this as "unnecessary bloat". Many filters in builtin.jq are more trivial to implement (e.g. map), yet considered useful.

Another consideration is that if jaq defines sort_keys before jq, then that will introduce a discrepancy and hence add confusion and noise.

That is correct. I would like to avoid such discrepancy. That's why I opened an issue in the jq repository in order to see whether we can proceed in a synchronised way.