cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

Regex::Grammars support regex-based verbatim environment and command #290

Closed XuehaiPan closed 3 years ago

XuehaiPan commented 3 years ago

The first commit:

Add new syntax to the settings: a new entry, name (and lookForThis), for noIndentBlock, verbatimEnvironments and verbatimCommands:

  1. name for noIndentBlock: when begin and end are not provided, they are set to \\begin\{(${name})\} and \\end\{\2\}. The group \2 forces the name in the end to match the name in the begin (useful for a regex-based environment name; it has no effect for a literal one). The original implementation cannot use \2 in end, because users do not know the exact regex group number; it has to be set internally.

  2. name for verbatimEnvironments: regex support, with name treated as a regex. The following setting (https://github.com/cmhughes/latexindent.pl/issues/288#issuecomment-925701506):

    verbatimEnvironments:
        minted*: 1
        '\w+code*?': 1

    will treat * as a literal character, and the same issue applies to noIndentBlock (abc*: 1). If the user wants to write a regex (for example with \*), they can use name:

    verbatimEnvironments:
        mintedaliases:
            name: '\w+code\*?'

  3. name for verbatimCommands: the only purpose is to make the verbatimCommands settings similar to the verbatimEnvironments settings. The following two are the same:

    verbatimCommands:
        '\w+inline': 1

    verbatimCommands:
        mintinline:
            name: '\w+inline'
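A minimal sketch of the pairing idea in item 1, using Python's re module as a stand-in for Perl regexes (the escapes used here, \\, \{, \}, \w, \* and \2, behave the same way in both). The helper paired_block_regex is hypothetical, not latexindent's code, and it assumes the begin part is wrapped in its own capturing group so that the name ends up as group 2:

```python
import re


def paired_block_regex(name):
    """Build a begin/body/end regex from a `name` pattern (hypothetical helper).

    Assumption: the begin part is wrapped in its own capturing group, so the
    environment-name group is group 2 and the end part can refer to it.
    """
    begin = r"\\begin\{(" + name + r")\}"
    end = r"\\end\{\2\}"
    # group 1: whole \begin{...}, group 2: name, group 3: body, group 4: \end{...}
    return re.compile("(" + begin + ")(.*?)(" + end + ")", re.DOTALL)


pattern = paired_block_regex(r"\w+code\*?")

# a regex-based name must still be closed by exactly the same name
found = pattern.search(r"\begin{pythoncode*}x = 1\end{pythoncode*}")
print(found.group(2))  # pythoncode*

# \2 rejects a block whose \end{...} uses a different name
print(pattern.search(r"\begin{pythoncode}x = 1\end{juliacode}"))  # None
```

Because \2 repeats whatever text group 2 captured, a regex name such as \w+code\*? still forces \end{pythoncode*} to close \begin{pythoncode*} exactly.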

The second commit:

Update defaultSettings.yaml to resolve #288. I can revert this if you are not happy with it.


The third commit:

Add new test for regex-based verbatimEnvironments and nested minted code blocks.


Other things:

The documentation will need to be updated, since defaultSettings.yaml has changed and the line numbers that refer to it need updating. I don't know whether I need to update these manually or whether it can be done by a script.

cmhughes commented 3 years ago

Hi @XuehaiPan , Many thanks for this, it looks great, and a helpful addition to the project!

The list below looks quite long, but each thing is quite small; I hope you don't mind working through them! :) Let me know if you have any problems with any of it.

In Verbatim.pm, please can you:

  1. on line 68 can you change it so that ${$yesno}{begin} = qr/\\\\begin\\{(${$yesno}{name})\\}/;
  2. on line 69 can you change it as in line 68
  3. on line 175 can you change it so that it says "looking for regex based VERBATIM-environments"
  4. on line 267 can you change it so that it says "looking for regex based VERBATIM-commands"

In defaultSettings.yaml, please can you:

  1. on line 110 can you change mintedaliases to be nameAsRegex
  2. on line 111 can you change the comment to say "allows followed by 'code', optionally followed by *"
  3. can you add line 112 to say lookForThis: 0 so that the default behaviour of the script is not changed?
  4. on line 116 can you change codeinline to be nameAsRegex
  5. on line 117 can you change the comment to say "allows followed by 'inline'"
  6. can you add line 118 to say lookForThis: 0 so that the default behaviour of the script is not changed?

In test-cases/verbatim/verbatim-test-cases.sh, please can you:

  1. add latexindent.pl -s verbatim7 -l nameAsRegex.yaml -o=+-mod-1 on line 53, with nameAsRegex.yaml as follows:

    verbatimEnvironments:
        nameAsRegex:
            lookForThis: 1

    verbatimCommands:
        nameAsRegex:
            lookForThis: 1

Finally,

  1. can you squash the commits into one commit, please?

Thanks again, this looks great! :) Let me know how it goes :) Chris

XuehaiPan commented 3 years ago

Hello @cmhughes, I have applied suggestions 3-12.


For suggestions 1-2:

  1. on line 68 can you change it so that ${$yesno}{begin} = qr/\\\\begin\\{(${$yesno}{name})\\}/;

I got:

Unescaped left brace in regex is passed through in regex; marked by <-- HERE in m/\\\\begin\\{ <-- HERE (\w+noindent\*?)\\}/ at /home/PanXuehai/Projects/latexindent/LatexIndent/Verbatim.pm line 68.

I changed it to qr/\\begin\{(${$yesno}{name})\}/ and it worked fine.
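To see why the doubled backslashes from the first suggestion fail, here is the same comparison in Python's re module (used as a stand-in for Perl's qr//; the escapes involved behave the same in both). The name pattern \w+noindent\*? is taken from the error message above, and mynoindent is a made-up environment name:

```python
import re

# Name pattern taken from the Perl error message above
name = r"\w+noindent\*?"

# Regex source as in the working fix: \\ is one literal backslash,
# \{ and \} are literal braces, so this matches "\begin{...}"
single = re.compile(r"\\begin\{(" + name + r")\}")

# Regex source as in the first suggestion: every backslash doubled once more,
# so \\\\ needs two literal backslashes in the text (Perl also warns about
# the now-unescaped "{", as in the error message above)
doubled = re.compile(r"\\\\begin\\{(" + name + r")\\}")

text = r"\begin{mynoindent}"
print(single.search(text).group(1))  # mynoindent
print(doubled.search(text))          # None
```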

  2. on line 69 can you change it as in line 68

I changed it to qr/\\end\{\2\}/; and I got:

Reference to nonexistent group in regex; marked by <-- HERE in m/\\end\{\2 <-- HERE \}/ at /home/PanXuehai/Projects/latexindent/LatexIndent/Verbatim.pm line 69.

If I change \2 to \\2, there are no errors when running, but the indentation results are wrong.
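Both outcomes can be reproduced with Python's re module (standing in for Perl; this is an illustration, not latexindent's code): a lone \2 has no group to refer to, while \\2 is just a literal backslash followed by 2, so neither works as a standalone end pattern:

```python
import re

# On its own, the end pattern refers to a group that does not exist,
# so compilation fails (compare Perl's "Reference to nonexistent group").
message = ""
try:
    re.compile(r"\\end\{\2\}")
except re.error as err:
    message = str(err)
print(message)  # the message reports an invalid group reference

# Escaping the backslash avoids the error, but \\2 now matches a literal
# backslash followed by "2", so the environment name is never matched.
literal = re.compile(r"\\end\{\\2\}")
print(literal.search(r"\end{pythoncode}"))      # None
print(literal.search(r"\end{\2}") is not None)  # True

# \2 only works once it is combined with a begin pattern that defines group 2
combined = re.compile(r"(\\begin\{(\w+code\*?)\})(.*?)(\\end\{\2\})", re.DOTALL)
print(combined.search(r"\begin{pythoncode}x\end{pythoncode}") is not None)  # True
```

This is why the thread concludes that \2 has to be inserted internally, where the group numbering is known.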


Since the line numbers of the default setting have been changed, should I update the document in this PR?

cmhughes commented 3 years ago

Hi @XuehaiPan , That's great, many thanks.

I suggest that we ignore suggestions 1 and 2, and leave it as is.

Since the line numbers of the default setting have been changed, should I update the document in this PR?

No, don't worry about this, I'll take care of the documentation update.

Can I just check the following summary is accurate:

Summary

  1. Anything specified within noIndentBlock, verbatimEnvironments and verbatimCommands can now be specified in the form

    verbatimEnvironments:
        nameAsRegex:
            name: '\w+code\*?'
            lookForThis: 1

  2. the lookForThis field is optional; if it is not present, it is assumed to be 1 (this is consistent with other settings in the script)
  3. we have used nameAsRegex, but it could be named anything; for example, it could be named mintedalias:

    verbatimEnvironments:
        mintedalias:
            name: '\w+code\*?'
            lookForThis: 1

  4. if the user's code only contains \begin{pythoncode}...\end{pythoncode}, then the above YAML settings are equivalent to

    verbatimEnvironments:
        pythoncode: 1
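To make the summary concrete, here is a minimal Python sketch of how an entry might be normalised into a name regex. The function names are hypothetical, and the rules it encodes are only those stated in this thread (lookForThis defaulting to 1, * taken literally in the key form, name used as a full regex); it is not latexindent's Perl implementation:

```python
import re


def entry_to_regex(key, value):
    """Hypothetical normaliser for one verbatimEnvironments entry.

    Assumptions drawn from this thread, not from latexindent's source:
      * `key: 1` uses the key itself as the name, with "*" taken literally;
      * `key: {name: ..., lookForThis: ...}` uses `name` as a full regex;
      * a missing lookForThis defaults to 1.
    Returns a compiled regex, or None if the entry is switched off.
    """
    if isinstance(value, dict):
        if not value.get("lookForThis", 1):
            return None
        return re.compile(value["name"])
    if not value:
        return None
    return re.compile(key.replace("*", r"\*"))


def is_verbatim(settings, env):
    """True if any active entry's regex matches the whole environment name."""
    for key, value in settings.items():
        regex = entry_to_regex(key, value)
        if regex is not None and regex.fullmatch(env):
            return True
    return False


as_regex = {"mintedalias": {"name": r"\w+code\*?", "lookForThis": 1}}
as_literal = {"pythoncode": 1}

for env in ["pythoncode", "pythoncode*", "juliacode"]:
    print(env, is_verbatim(as_regex, env), is_verbatim(as_literal, env))
# pythoncode True True
# pythoncode* True False
# juliacode True False
```

The first row corresponds to point 4: on a document that only uses pythoncode, both settings select the same environments; they differ only on names such as pythoncode* or juliacode.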

Is this an accurate summary? Let me know, and I'll get this merged and documented! :) Thanks again!

XuehaiPan commented 3 years ago

Is this an accurate summary?

Yes!


One addition regarding noIndentBlock in point 1:

Users can specify either both begin and end, or only name:


noIndentBlock:
  beginend:
    begin: regex_begin
    end: regex_end
    body: regex_body  # optional

  nameonly:
    name: regex_name
    body: regex_body  # optional

  # the above is equivalent to the following when `regex_name` does not contain `'*'` and body is omitted
  regex_name: 1

  # the following are ignored
  none:  # incomplete settings, ignored
    lookForThis: 1

  beginonly:  # incomplete settings, ignored
    begin: regex_begin

  endonly:  # incomplete settings, ignored
    end: regex_end

  namebeginend:  # conflicting settings, ignored
    begin: regex_begin
    end: regex_end
    name: regex_name

  namebegin:  # conflicting settings, ignored
    begin: regex_begin
    name: regex_name

  nameend:  # conflicting settings, ignored
    end: regex_end
    name: regex_name
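The rules in this YAML amount to a small validation step for each noIndentBlock entry. A hypothetical Python sketch of it follows; the function name, the convention that a missing body is returned as None, and the lookForThis check are assumptions, not latexindent's code:

```python
def resolve_no_indent_block(entry):
    """Hypothetical sketch of the rules above for one noIndentBlock entry.

    Returns (begin, end, body) regex sources, or None if the entry is ignored.
    A missing body is returned as None (assumed to mean "use the default").
    """
    if not entry.get("lookForThis", 1):
        return None
    has_begin, has_end, has_name = ("begin" in entry, "end" in entry, "name" in entry)
    body = entry.get("body")
    if has_name and not (has_begin or has_end):
        # begin/end as described in the first commit; \2 assumes latexindent
        # embeds begin so that the name becomes group 2
        return (r"\\begin\{(" + entry["name"] + r")\}", r"\\end\{\2\}", body)
    if has_begin and has_end and not has_name:
        return (entry["begin"], entry["end"], body)
    return None  # incomplete (begin or end missing) or conflicting (name plus begin/end)


settings = {
    "beginend": {"begin": "regex_begin", "end": "regex_end", "body": "regex_body"},
    "nameonly": {"name": "regex_name", "body": "regex_body"},
    "none": {"lookForThis": 1},
    "beginonly": {"begin": "regex_begin"},
    "endonly": {"end": "regex_end"},
    "namebeginend": {"begin": "regex_begin", "end": "regex_end", "name": "regex_name"},
    "namebegin": {"begin": "regex_begin", "name": "regex_name"},
    "nameend": {"end": "regex_end", "name": "regex_name"},
}

kept = [key for key, entry in settings.items() if resolve_no_indent_block(entry) is not None]
print(kept)  # ['beginend', 'nameonly']
```

Only beginend and nameonly survive; every other entry is incomplete or conflicting, matching the comments in the YAML.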
cmhughes commented 3 years ago

Great, many thanks, that's great :)

I'll get this documented and released soon. Thanks so much!