evilmartians / lefthook

Fast and powerful Git hooks manager for any type of projects.
MIT License
4.53k stars 210 forks source link

Order of scripts and command #132

Open lionskape opened 4 years ago

lionskape commented 4 years ago

Are there any way to control the sequence of scripts and commands?

For example to execute scripts after commands? Or to execute the first command, after that execute the first script, and after that execute the second command?

Arkweid commented 4 years ago

Only with this way: piped-option Userful for migrations and similar stuff.

lionskape commented 4 years ago

Is it possible to control order between commands and scripts? Like - command_1, script_1, command_2

Piped example shows only order between commands.

Arkweid commented 4 years ago

No, only commands.

sanmai-NL commented 11 months ago

New location of docs: https://github.com/evilmartians/lefthook/blob/master/docs/configuration.md#piped

BnGx commented 7 months ago

Currently the hooks are executed in alphabetical order, however one of the following options would instead be appropriate:

database:
  piped: true
  commands:
    create:
      priority: 40
      run: rake db:create
    review:
      priority: 45
      interactive: true
      run: <some interactive command>
    migrate:
      run: rake db:migrate
    seed:
      priority: 60
      run: rake db:seed

This would improve the understanding of the lefthook configuration without introducing distortions.

mrexox commented 7 months ago

@BnGx , interactive hooks always run after non-interactive hooks. Scripts run after the all commands. The order is alphanumeric, so you can do something like

hook:
  commands:
    1_do_this:
      run: ...
    2_then_this:
      run: ...
    ...

It probably makes sense to allow a special order of scripts + commands, but why? Would you describe a case when this is really needed and alphanumeric order is not enough?

BnGx commented 7 months ago

As I see it, implementing a priority option in Lefthook would be a better approach for several reasons:

Assuming the configuration is as follows

pre-commit:
  piped: true
  commands:
    ...
    15_format_ruby:
      run: ...
    15_format_python:
      run: ...
    20_lint_ruby:
      run: ...
    20_lint_python:
      run: ...
    ...

In this case, if I had to execute a hook manually or call it from another tool or in a CI environment I would have to remember the precise name of the hook and write lefthook run pre-commit --commands 15_format_python,20_lint_python.

This seems really rather unintuitive to me than to write lefthook run pre-commit --commands format_python,lint_python as would happen if a priority option was implemented.

In summary, from my perspective, while the alphabetical ordering method may appear simple, it falls short in effectively managing the execution order of hooks in a dynamic and scalable way. A priority-based system, on the other hand, would offer a robust, transparent, and easily manageable solution, aptly suited to the demands of contemporary, complex software projects. Furthermore, by assigning a default value to the priority, all existing configurations would not suffer any impact.

sanmai-NL commented 7 months ago

@mrexox By imposing an order by design, commands that are order-dependent cannot always work. For example, a command may have an auto-fixer and only after that fixer is done (e.g., syntax migration), other checkers can do their job. Also, some commands make take a longer time (Lefthook can be run in CI too, not just as a quick pre-commit check), and you may want to run them first or last. For example, some data science data validator tools may take longer time than the common source code checkers.

By allowing commands to be a list of dicts, rather than a dict, the user may indicate that order is important in a backward-compatible way.

mrexox commented 7 months ago

Ok, this sounds reasonable, but I want to make sure that I understand the use case. Do you expect commands with the same priority to be run in parallel? Do you expect commands and scripts to respect the priority of each other? Right now scripts are always executed after commands. Did I correctly understand that a special option for priority is clearer and more intuitive than naming? Do you expect naming order to work within the same priority?

Also, have you tried this approach?

python:
  commands: &python-commands
    1_format-python:
      run: ...
    2_lint-python:
      run: ...

ruby:
  commands: &ruby-commands
    1_format-ruby:
      run: ...
    2_lint-ruby:
      run: ...

pre-commit:
  piped: true
  commands:
    <<: *ruby-commands
    <<: *python-commands

Then you can replace the lefthook run pre-commit --commands ... with lefthook run python or lefthook run ruby. This might be more convenient in your case.

There's also a way to group hooks, so groups run one after another, but commands in every group are executed in parallel:

pre-commit:
  piped: true
  commands:
    1_linters:
      run: lefthook run lint
    2_data-validators:
      run: lefthook run validate-data

lint:
  parallel: true
  commands:
    ruby-lint:
      ...
    python-lint:
      ...

validate-data:
  parallel: true
  commands:
    validate-data-1:
      ...
    validate-data-2:
      ...

Would this help with your case, @sanmai-NL?

I know that lefthook is being run in the CI. I believe YAML anchors and sub-hooks can be used to build any complex pipeline. If it is possible, please, share your case where these approaches are not working, so I could better understand the use case of this. I understand that priority option makes things a bit clearer but if there's a way to avoid adding one more thing that extends the configuration, I'd go with that way to keep lefthook configuration as simple as possible.

BnGx commented 7 months ago

I'll answer you by analyzing your answer step by step:

Do you expect commands with the same priority to be run in parallel?

No because I can use the "parallel: true" approach by rearranging the lefthook configuration.

Do you expect commands and scripts to respect the priority of each other? Right now scripts are always executed after commands.

I had excluded the scripts, however I find it good and coherent idea with what we are discussing.

Did I correctly understand that a special option for priority is clearer and more intuitive than naming?

Absolutely yes.

Do you expect naming order to work within the same priority?

I expect that the naming of the command does not affect the order in which it is executed.

Also, have you tried this approach?

  python:
    commands: &python-commands
      1_format-python:
        run: ...
      2_lint-python:
        run: ...

  ruby:
    commands: &ruby-commands
      1_format-ruby:
        run: ...
      2_lint-ruby:
        run: ...

  pre-commit:
    piped: true
    commands:
      <<: *ruby-commands
      <<: *python-commands

Yes, but this approach does not consider that the commands could derive from remote or local configuration files e.g.:

extends:
  - .lefthook/format.lefthook.yml

remote:
  git_url: git@github.com:evilmartians/remote
  ref: v1.0.0
  config: examples/lint.lefthook.yml

Therefore YAML anchors and aliases would not work in the lefthook.yml file of the main project.

There's also a way to group hooks, so groups run one after another, but commands in every group are executed in parallel:

  pre-commit:
    piped: true
    commands:
      1_linters:
        run: lefthook run lint
      2_data-validators:
        run: lefthook run validate-data

  lint:
    parallel: true
    commands:
      ruby-lint:
        ...
      python-lint:
        ...

  validate-data:
    parallel: true
    commands:
      validate-data-1:
        ...
      validate-data-2:
        ...

In this case, the lefthook command is called in pre-commit hook, giving rise to further problems such as:

mrexox commented 7 months ago

Ok, I see. So, I guess this is your use case:

database:
  piped: true
  commands:
    create:
      priority: 40
      run: rake db:create
    review:
      priority: 45
      interactive: true
      run: <some interactive command>
    migrate:
      run: rake db:migrate
    seed:
      priority: 60
      run: rake db:seed

And you expect review command to be executed after create but before migrate and seed, right?

BnGx commented 7 months ago

If it were possible, it would be perfect (and great!!!!) to run review command after create but before migrate and seed. However, in order not to distort the current flow (i.e. that of the interactive commands executed at the end) it could be considered that the priority of the interactive commands is managed separately, therefore in a typical example:

database:
  piped: true
  commands:
    create:
      priority: 40
      run: rake db:create
    review:
      priority: 45
      interactive: true
      run: <some interactive command>
    migrate:
      run: rake db:migrate
    seed:
      priority: 60
      run: rake db:seed
    test:
      priority: 50
      interactive: true
      run: <some interactive command>

This way the review command should be executed after create, migrate and seed but before test.

mrexox commented 7 months ago

I think priority of 50 is not obvious as a default. I'd suggest 0 as the default. If you don't specify a priority it is 0, and the command is always executed at the very end.

I think it makes sense since priority should be a natural number.

BnGx commented 7 months ago

I specify that I was referring to the ascending priority order (0 highest priority, 99 lowest priority), therefore a command with priority 1 is executed before a command with priority 10. According to this consideration, the commands with priority 0 would all be executed for first.

My proposal to use 50 arises from the possibility of being able to integrate the new configurations (those with the priority option) with the pre-existing ones, giving the possibility of being able to execute a new command (local or remote) before or after those already existing in the old configurations avoiding having to make further changes to them and leaving their functioning unchanged.

The number 0 would be ok if it were also possible to specify negative numbers, although this would significantly deviate from the concept of ascending priority queue in which I have rarely seen integers with a negative sign used as priority.

mrexox commented 7 months ago

I see what you mean but there are a few reasons I decided to use 0 as the default.

  1. In Golang there are default values for nulls for every type. For the int it is 0, so, right now it's much simpler to use 0 as a default.
  2. In my opinion, it is not obvious that 50 is the default. I must explain this in the documentation and you must remember this. I believe that the integration of the priority option should be explicit. I mean, if you decided to use priorities and want your configuration to be clear – use them everywhere, so the reader of the config doesn't have to remember that 50 is the default.
  3. If something goes wrong with 0 as the default I think it'll be easier to fix this (although it will require some effort) and consider non-set priorities as +infinity.

Let's use only natural numbers for the priorities. I think negative numbers and 0 value just don't make sense when you want to say: run this first, then this and that. I believe that this approach is rather simple and intuitive although it may not be formally correct. But eventually I think it will cover your use case. WDYT?

sanmai-NL commented 7 months ago

@BnGx What's the value of a priority key over an ordered datastructure, like a list?

BnGx commented 7 months ago

@mrexox I agree that the integration of the priority option should be explicit. I think the choice of 0 rather than 50 can be addressed simply by customizing the configuration. So if you think it is more appropriate to use 0 for choices related to the programming language in use I think it is right. Nothing prevents us from varying the logic according to the future needs of the community, I suppose with little effort.

@sanmai-NL the introduction of an ordered datastructure is still a valid alternative, but this would involve the introduction of a breaking change that could cause quite a few problems for those who have been using lefthook for some time. Furthermore, there would always be a limitation in properly extending the configuration using the extends and remote options.

sanmai-NL commented 7 months ago

@BnGx It would not, the data type of commands would be [dict | list[dict]] in Python terms. I.e., the union of both types. This means that the dict-based schema is still valid and nothing breaks. What is the problem with extends exactly? Merging lists is perfectly valid operation, and it can simply be rejected if multiple lists are specified. When a priority is specified, how to merge if the priority values are equal?

BnGx commented 7 months ago

@sanmai-NL To help me understand better, could you give a configuration example?

sanmai-NL commented 7 months ago

.lefthook/example.lefthook.yml:

database:
  piped: true
  commands:
    - name: create
      run: rake db:create
    - name: review
      interactive: true
      run: <some interactive command>
    - name: migrate
      run: rake db:migrate
    - test:
      interactive: true
      run: <some interactive command>
    - seed:
      run: rake db:seed
extends:
  - .lefthook/example.lefthook.yml

remote:
  git_url: git@github.com:evilmartians/remote
  ref: v1.0.0
  config: examples/lint.lefthook.yml

pre-commit:
  piped: true
  commands:
    1_linters:
      run: lefthook run lint
    2_data-validators:
      run: lefthook run validate-data

This mixing of two schemas (list of commands, dict of commands) would work fine. In the special case that someone wants to merge two lints, the commands values can be merged according to some algorithm, like: dicts of commands are appended as list items to the list of commands, etc. Or simply, in the beginning: not supported.

(The whole extends feature would be better deprecated and implemented using JSON patch, which is an industry standard to this problem of merging config files in JSON, YAML, TOML. See https://github.com/kubernetes-sigs/kustomize/blob/master/examples/jsonpatch.md for an example in Kubernetes.)

BnGx commented 7 months ago

Using either dictionaries or a list of dictionaries as a configuration format is acceptable, but, for me, it is advisable to exclusively use one of these formats for the following reasons:

A single format makes the data structure more predictable and easier to understand. When developers or users of your system see a consistent format, they can more easily grasp how the data is organized and how to interact with it. Consistency in data formatting is crucial, especially in complex systems or large projects with many contributors. With a standard format, you reduce the risk of data misinterpretation and facilitate collaboration among team members.

With a uniform configuration format, validation becomes simpler. You can implement consistency checks and data validation more efficiently, reducing the possibility of errors or malformed data. A single, well-defined format is easier to maintain and update. When changes need to be made, it’s simpler to do so in a system with a cohesive data structure rather than one with multiple different formats.

Also consider that the lefthook makes its speed a source of pride so managing a single type of data structure can improve system performance, as algorithms and logic can be optimized for that specific structure.

(The whole extends feature would be better deprecated and implemented using JSON patch, which is an industry standard to this problem of merging config files in JSON, YAML, TOML. See https://github.com/kubernetes-sigs/kustomize/blob/master/examples/jsonpatch.md for an example in Kubernetes.)

I believe this is a choice of the developers who decided to merge external files using the Yaml library, effectively obtaining a valid and processable YAML file with the tools currently available for formatting and linting.

So the list of dicts or simple dicts is fine, the key is to balance consistency and clarity with the necessary flexibility and functionality, but introducing a new format would mean a greater implementation and maintenance effort.

sanmai-NL commented 7 months ago

@BnGx I think the arguments you make don't weigh up against breaking backward compatibility. As a middle ground, the dictionary-based commands can be deprecated in favor of the list of dictionaries-based commands. Also, a version field can be put inside the Lefthook YAML to give the maintainers more freedom in improving the format.

BnGx commented 7 months ago

So do you propose to abandon dictionaries in the future? How would you make existing old configurations compatible?

sanmai-NL commented 7 months ago

Older configurations can be run by an older version of Lefthook. But that's only an issue at all if you deprecate the current format, which is one of your suggestions.

mrexox commented 7 months ago

I am not planning to change the format of commands. One of the key features of lefthook is the ability to use lefthook.yml and a lefthook-local.yml with some overrides. With dictionary approach this is done with no effort via YAML processing library. And I don't yet understand the value of a list structure, since (as I see) the results can be reached with the existing options.

By the way, priority option has just been release with 1.5.5 version. Only for commands now.

BnGx commented 7 months ago

Thanks @mrexox, I will look forward to compliance for scripts too! 👌

BnGx commented 7 months ago

@mrexox The following line of documentation, based on what has been implemented and tested, is no longer true.

All interactive commands/scripts are executed after non-interactive.

sanmai-NL commented 7 months ago

@mrexox

And I don't yet understand the value of a list structure, since (as I see) the results can be reached with the existing options.

With existing, you mean? Ordering of commands could not be done before 1.5.5. Now it can, but what happens if priority values are equal? You are doing a simple dict merge. Will commands with the same priority be run in parallel?

By using a list, the order is encoded in the data (list order), without the need to carefully set priority values for all commands, taking into consideration external configurations as well.

mrexox commented 7 months ago

Right now there's no way to run some commands sequential and some in parallel. Only if using sub-hooks and running them from within the another command, like in the example I provided before. This is not a perfect solution and there might be issues with remote and extends options too. But if you describe a real use case, I will try to figure out what needs to be improved in lefthook configuration.

I agree that having commands with the same priority (or configured in a list) go parallel and others go sequentially is convenient. But I really need to understand a use case and that there's no other way to do that, rather than change lefthook configuration format.

sanmai-NL commented 7 months ago

@mrexox But going parallel isn't needed for me, just trying to illustrate, why a priority field is verbose and hard to manage as opposed to a list. Even without supporting parallelism, just with a flat list of commands.