cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0
884 stars 84 forks source link

Handle batch of files #332

Closed tdegeus closed 2 years ago

tdegeus commented 2 years ago

It turns out pre-commit and latexindent.pl are not yet playing together nicely when that are multiple files present. In particular, pre-commit wants latexindent.pl to take all files as argument and treat them all at once, latexindent.pl on the other hand expects a call per file.

pre-commit refuses to allow the one call per file : https://github.com/pre-commit/pre-commit/issues/2222 , https://github.com/pre-commit/pre-commit/issues/771 , https://github.com/pre-commit/pre-commit/issues/557 , https://github.com/pre-commit/pre-commit/issues/613 , https://github.com/pre-commit/pre-commit/issues/394 . At best it offers a BASH workaround, which may not be so nice to Windows users. Could something be considered at the latexindent.pl side ?

Edit: the work-around appears not really a workaround: one would not be able to pass args anymore

cmhughes commented 2 years ago

Thanks for this, and for the references.

This looks like a reasonable thing to explore.

Considerations around the -o switch need care.

I'll need to do some reading about this. In the meantime if you have any references that might help, feel free to post them.

cmhughes commented 2 years ago

https://perldoc.perl.org/functions/glob

https://stackoverflow.com/questions/31093833/glob-function-in-perl-for-getting-filenames

tdegeus commented 2 years ago

Great! It's too bad that my perl experience is virtually non-existent, otherwise I could work on it.

Anyway (1) I think that an API like

latexindent.pl --in-place file1 file2 file3 file4 ...

would be perfect. You can simply require that with --in-place, -o cannot be used.

Then (2) you could consider a file search like

latexindent.pl --project /path/to/directory 

with customisation

latexindent.pl --project /path/to/directory --extensions "tex,sty"

see e.g. https://flake8.pycqa.org/en/latest/user/invocation.html

For me personally (1) would be perfect. I would not use (2) because I would use pre-commit for that, but there might be others that could benefit?

cmhughes commented 2 years ago

Many thanks, I'll take a look at this soon, hopefully.

On Wed, 26 Jan 2022, 08:32 Tom de Geus, @.***> wrote:

Great! It's too bad that my perl experience is virtually non-existent, otherwise I could work on it.

Anyway (1) I think that an API like

latexindent.pl --in-place file1 file2 file3 file4 ...

would be perfect. You can simply require that with --in-place, -o cannot be used.

Then (2) you could consider a file search like

latexindent.pl --project /path/to/directory

with customisation

latexindent.pl --project /path/to/directory --extensions "tex,sty"

see e.g. https://flake8.pycqa.org/en/latest/user/invocation.html

For me personally (1) would be perfect. I would not use (2) because I would use pre-commit for that, but there might be others that could benefit?

— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/332#issuecomment-1021974730, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYADLP3OL2KFOXLSVLLUX6WQVANCNFSM5MYMZXPA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

cmhughes commented 2 years ago

For my reference, development of this needs to consider:

testing will need to include, at least

latexindent.pl -l myfile.tex myfile1.tex
latexindent.pl -l mysettings.yaml myfile.tex myfile1.tex
latexindent.pl -l *.tex
latexindent.pl -l *.tex *.cls
latexindent.pl -l myfile.tex *.tex
latexindent.pl -l mysettings.yaml *.tex
latexindent.pl -o outputfile.tex *.tex    # should fail, or not output
latexindent.pl -o +-mod1 *.tex    # should work

Cruft files need checking too; by default we send cruft files to the directory of the file so the following needs care:

latexindent.pl /path/one/myfile.tex /path/two/otherfile.tex
tdegeus commented 2 years ago

I don't know if input is needed, so just very brief.

interaction with -o switch

I think -o should simply not be allowed in this mode. It would needlessly complicate and probably would never be used.

cmhughes commented 2 years ago

I'm working on this today. Early signs are promising.

The o switch has a few features which can plug into this in a natural way.

Updates to follow :)

On Sun, 30 Jan 2022, 09:30 Tom de Geus, @.***> wrote:

I don't know if input is needed, so just very brief.

interaction with -o switch

I think -o should simply not be allowed in this mode. It would needlessly complicate and probably would never be used.

— Reply to this email directly, view it on GitHub https://github.com/cmhughes/latexindent.pl/issues/332#issuecomment-1025104359, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ7CYGKVSZZVW4X63EHTODUYUAKJANCNFSM5MYMZXPA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

cmhughes commented 2 years ago

Which of the following best describes how this needs to work:

  1. latexindent.pl *.tex
  2. latexindent.pl /path/to/project

In other words, does latexindent.pl need to find the files, or does it simply have to handle multiple files being passed to it?

tdegeus commented 2 years ago

A list of files. I think that that's also the most robust API

cmhughes commented 2 years ago

So option 1?

tdegeus commented 2 years ago

Yes, option 1!

cmhughes commented 2 years ago

Apologies for the delay, I'm working on this. A few issues remain, but it is progressing. Updates to follow.

cmhughes commented 2 years ago

Apologies for the delay, the text wrap overhaul (https://github.com/cmhughes/latexindent.pl/issues/346) was needed before I could progress this.

As of https://github.com/cmhughes/latexindent.pl/commit/b4ad78d6e41ca60b5bfeac2be01dd8d5649787b7 this is implemented you can call instruct latexindent.pl to operate on multiple files. For example, the following calls are all valid

   latexindent.pl myfile1.tex          
   latexindent.pl myfile1.tex myfile2.tex
   latexindent.pl myfile*.tex

Before I get this released, is this as you would expect? In particular does the following behaviour make sense?


We note the following features of the script in relation to the switches detailed in :numref:sec:how:to:use.

location of indent.log

If -c switch is not active, then indent.log goes to the directory of the final file called.

If -c switch is active, then indent.log goes to the specified directory.

interaction with -w switch

If -w switch is active, as in

   latexindent.pl -w myfile*.tex

then files will be overwritten individually. Back-up files can be re-directed via -c switch.

interaction with -o switch

If latexindent.pl is called using the -o switch as in

    latexindent.pl myfile*.tex -o=my-output-file.tex 

and there are multiple files to operate upon, then the -o switch is ignored because there is only one output file specified.

More generally, if the -o switch does not have a + symbol at the beginning, then the -o switch will be ignored, and is turned it off.

For example

   latexindent.pl myfile*.tex -o=+myfile

will work fine because each .tex file will output to <basename>myfile.tex

Similarly,

   latexindent.pl myfile*.tex -o=++

will work because the ‘existence check/incrementation’ routine will be applied.

interaction with lines switch

This takes behaves naturally by attempting to operate on the line numbers specified for each file. See the examples in :numref:sec:line-switch.

interaction with check switches

The exit codes for latexindent.pl are given in :numref:tab:exit-codes.

When operating on multiple files with the check switch active, as in

   latexindent.pl myfile*.tex --check

then

The interaction with checkv switch is as in the check switch, but with verbose output.

when a file does not exist

If one of the files that latexindent.pl is called to operate upon does not exist, or can not be read, latexindent.pl will continue to try to operate on the remaining files. A warning will be given to the log file.

tdegeus commented 2 years ago

Amazing! (Like I said earlier: there is really no reason to apologise, I'm grateful that the project exists, and I already find that development times are short, and since the source is open I could always invest time myself (I tried a bit, but my perl knowledge is too rudimentary ;)).

What is crucial for pre-commit to work is that changing can be done in-place on multiple files. This would be the

latexindent.pl -w *.tex

so all good from that perspective. Indeed I tried on my fork and it works. To try what I did

I tried with .pre-commit-config.yaml:

repos:
- repo: https://github.com/tdegeus/latexindent.pl
  rev: V4.0.0
  hooks:
  - id: latexindent
    args: [-s, -w, -l, -m]

and I have:

$ find . -name '*.tex'
./examples/twocolumn/example.tex
./examples/basic/example.tex
./examples/general-trick_dummy-subfigure-upper/example.tex
./examples/general-trick_dummy-subfigure/example.tex
./examples/namecite/example.tex

for which I made indentation 'mistakes' in

./examples/twocolumn/example.tex
./examples/general-trick_dummy-subfigure/example.tex
./examples/namecite/example.tex

Indeed, those three files were updated by

pre-commit run --all-files

However, what I did not fully expect it to find backup files for all

$ ls -l `find . -name '*.bak'`
-rw-r--r--  1 tdegeus  staff  1285 Mar 16 10:15 ./examples/basic/example.bak
-rw-r--r--  1 tdegeus  staff   901 Mar 16 10:15 ./examples/general-trick_dummy-subfigure-upper/example.bak
-rw-r--r--  1 tdegeus  staff   823 Mar 16 10:15 ./examples/general-trick_dummy-subfigure/example.bak
-rw-r--r--  1 tdegeus  staff  1280 Mar 16 10:15 ./examples/namecite/example.bak
-rw-r--r--  1 tdegeus  staff  2108 Mar 16 10:15 ./examples/twocolumn/example.bak

so even for the files that were completely unchanged by the formatter. Indeed, all the files where overwritten:

$ ls -l `find . -name '*.tex'`
-rw-r--r--  1 tdegeus  staff  1285 Mar 16 10:15 ./examples/basic/example.tex
-rw-r--r--  1 tdegeus  staff   901 Mar 16 10:15 ./examples/general-trick_dummy-subfigure-upper/example.tex
-rw-r--r--  1 tdegeus  staff   831 Mar 16 10:15 ./examples/general-trick_dummy-subfigure/example.tex
-rw-r--r--  1 tdegeus  staff  1284 Mar 16 10:15 ./examples/namecite/example.tex
-rw-r--r--  1 tdegeus  staff  2120 Mar 16 10:15 ./examples/twocolumn/example.tex

There is an optimisation possible here (which would be nice I think). That is to run the formatter, and internally compare the input and the final output. And only overwrite if the two are different. In that case also no backup should be generated.

Then I was surprised by finding

ls -l `find . -name 'indent.log'`
-rw-r--r--  1 tdegeus  staff  5765 Mar 16 10:15 ./examples/namecite/indent.log
-rw-r--r--  1 tdegeus  staff  3597 Mar 16 10:15 ./examples/twocolumn/indent.log

From the above I thought I should expect just one file (in an arbitrary place in this case, because pre-commit decided on the file order fed to latexindent.pl).

Here my optimisation suggestion would be to change the default to indent.log to either:

tdegeus commented 2 years ago

P.S. I did not completely get what the --check switch is for, but enabling it change nothing to the above.

cmhughes commented 2 years ago

OK, many thanks, that's great.

In summary

I hope this is accurate, let me know if not.

Related to the above is the -c switch to redirect cruft files.

cmhughes commented 2 years ago

new! the -wd switch

As of https://github.com/cmhughes/latexindent.pl/commit/552ee52c2d8dc38b4c83d6688c31040b097652bc there is a new switch, the overwrite if different switch which is called as in

latexindent.pl -wd myfile.tex

It will only overwrite myfile.tex if the indented text is different from the original text.

In terms of your pre-commit file, the following should work

repos:
- repo: https://github.com/tdegeus/latexindent.pl
  rev: V4.0.0
  hooks:
  - id: latexindent
    args: [-s, -wd, -l, -m]

about multiple copies of indent.log

I'm not sure why there are multiple copies of indent.log. If it's a deal breaker, feel free to post them here.

tdegeus commented 2 years ago

Sounds great!

No the multiple indent.log does not bother me, I just wanted to have a thorough test result for you.

cmhughes commented 2 years ago

Great, many thanks, Tom!

I'll get this released in the next few days, hopefully 👍

cmhughes commented 2 years ago

This is implemented as of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17 :)

tdegeus commented 2 years ago

Thanks a million !