RexOps / Rex

Rex, the friendly automation framework
https://www.rexify.org
717 stars 223 forks source link

Rex::Commands::Sync excludes are poorly handled #1597

Open bbrtj opened 1 year ago

bbrtj commented 1 year ago

Describe the bug

There are two issues with sync_up and sync_down commands regarding excluded files and directories:

  1. Excludes are looked at too late, after calculating md5 of all files in the directory (recursively). This means a lot of time is spent doing meaningless work, for example like calculating md5 of all node_modules which are often excluded anyway.
  2. Excludes work well only when a flat file is excluded. Excluding nested files or whole directories does not work as one would expect from using tools like gitignore.

Expected behavior

  1. There's absolutely no need to calculate md5 sums of excluded files, as those will be thrown away unused.
  2. I can't grasp how the excluded directories currently work, but I would expect I have to pass a full path to a file which I want excluded, minus any wildcards.

How to reproduce it

  1. Have a large directory alongside your files
  2. Try syncing up with the directory excluded

Code example

sync_up 'my_dir', 'my_remote_dir', {
  exclude => ['directory_name'],
};

Additional context

No response

Rex version

(R)?ex 1.14.2

Perl version

This is perl 5, version 36, subversion 0 (v5.36.0) built for amd64-freebsd

Operating system running rex

FreeBSD 13.2 (amd64)

Operating system managed by rex

FreeBSD 13.1

How rex was installed?

cpan client

ferki commented 1 year ago

There are two issues with sync_up and sync_down

Then I would recommend opening two separate GitHub issues ;) I'm OK treating Nr. 1 as a bug, but Nr 2. definitely feels like a new feature. History shows they tend to need different details, reasoning, discussion, timelines, etc., and bundling is rarely the simplest way to track them.

I propose focusing on the "superfluous checksumming of excluded files" bug in this issue, and when that's done we'll see if it's best to bundle the rest of the ideas here, or to split that off into its own discussion and solution.

bbrtj commented 1 year ago

@ferki I can chop off fixing how excludes work from the PR, but then I will not deliver as detailed tests for sync_up/down with excludes. I just don't get them, apart from excluding a single flat file. Will that do?

Also, I think making excludes work like gitignore should not be considered a new feature. I mean, it is a functional change, but the current system is underwhelming and straight up confusing to use - and I tried for several hours before doing the refactor. I consider it a bugfix.

mrmuskrat commented 1 year ago

It seems to me that sync_down's excludes should work the same as sync_up and allow excluding directories.

ferki commented 1 year ago

It seems to me that sync_down's excludes should work the same as sync_up and allow excluding directories.

@mrmuskrat: yes, I agree in principle :+1: I didn't look into deeper implications on the actual implementation side yet, though.

ferki commented 1 year ago

@ferki I can chop off fixing how excludes work from the PR, but then I will not deliver as detailed tests for sync_up/down with excludes. I just don't get them, apart from excluding a single flat file. Will that do?

Looking at the current tests around syncing functionality, I find them to be in a quite poor state. There's no Rex::Commands::Sync tests at all yet, and t/rsync.t has several perlcritic violations. However the scope and expected results are highly similar (if not identical).

Based on that, I would certainly start with fixing that before attempting to change anything about the code. I would probably approach further work on these topics something like the following:

  1. Preparation work:
  1. Add proper sync tests:

Note that that up until this point is about covering the existing code with tests, against current expectations. The sync code is ~10 years old, we certainly don't want to accidentally break it. It's also certainly interesting to find any bugs against original expectations that may have been hidden so far due to lack of test coverage.

The goal is to reach a state where it is safe and simple to express new expectations by new test code.

  1. Change the behavior

In fact, I started some of this work locally as an exploration of possibilities, but I haven't progressed too much yet. I might revisit the branch and try to push it somewhere for reference on initial work (so that part does not need to be duplicated by someone else).

Also, I think making excludes work like gitignore should not be considered a new feature. I mean, it is a functional change, but the current system is underwhelming and straight up confusing to use - and I tried for several hours before doing the refactor. I consider it a bugfix.

It is a quite old part of the codebase, without any test coverage, and which was added as an escape hatch for situations when rsync is not available. I'm certain that we should past mistakes right sooner or later by adding proper tests before touching the legacy core code in any way, and before looking at how to improve it.

Going forward I see two options:

bbrtj commented 12 months ago

I'm tempted to turn this into a new module, but I don't think leaving Sync as it is is a good idea.

Keeping backward compatibility here is likely a mistake. The old behavior seems to be exclude any file that matches the basename. If you specify test.txt it will happily exclude both test.txt and dir/test.txt. There is no way to exclude a file nested in specific tree and I have not been able to successfully exclude a directory. So the only use case right now is to exclude all files with a given name no matter in which directory, which is not very useful unless your files are very distinctively named.

I'd say breaking compatibility with this would cause less headaches than leaving it as it is and letting people find out why they are missing random files. This is a possibility given that there is no documentation explaining how these excludes work and the code is not that straightforward either.

The code I delivered in the PR is fine and I'm using it successfully, but I have difficulty living up to the standards of this repository, for which I apologize.

bbrtj commented 1 month ago

If anyone needs this, I created a new module with my modifications: https://metacpan.org/dist/Rex-Commands-PerlSync