cmhughes / latexindent.pl

Perl script to add indentation (leading horizontal space) to LaTeX files. It can modify line breaks before, during and after code blocks; it can perform text wrapping and paragraph line break removal. It can also perform string-based and regex-based substitutions/replacements. The script is customisable through its YAML interface.
GNU General Public License v3.0

Multiplespace #350

Closed xyecoding closed 2 years ago

xyecoding commented 2 years ago

Please provide the following when posting an issue:

original .tex code

This is the (here are multiple spaces) second sentence.

yaml settings

indentRules:
    myenvironment: "\t\t"
    anotherenvironment: "\t\t\t\t"
    chapter: " "
    section: " "
    item: " "
    myitem: " "
defaultIndent: " "
modifyLineBreaks:
    preserveBlankLines: 0
    condenseMultipleBlankLinesInto: 1
    oneSentencePerLine:
        manipulateSentences: 1
        removeSentenceLineBreaks: 0
        textWrapSentences: 0
        sentenceIndent: ""
        sentencesFollow:
            par: 1
            blankLine: 1
            fullStop: 1
            exclamationMark: 1
            questionMark: 1
            rightBrace: 1
            commentOnPreviousLine: 1
            other: 0
        sentencesBeginWith:
            A-Z: 1
            a-z: 0
            other: 0
        sentencesEndWith:
            basicFullStop: 0
            betterFullStop: 0
            exclamationMark: 1
            questionMark: 1
            other: ".\ (?=[A-Z])"
    textWrapOptions:
        columns: 80
        multipleSpacesToSingle: 1
        huge: overflow # forbid mid-word line breaks
        separator: ""
        perCodeBlockBasis: 0
        beforeFindingChildCodeBlocks: 0
        all: 0
        alignAtAmpersandTakesPriority: 1
        environments:
            quotation: 0
        ifElseFi: 0
        optionalArguments: 0
        mandatoryArguments: 0
        items: 0
        specialBeginEnd: 0
        afterHeading: 0
        preamble: 0
        filecontents: 0
        mainDocument: 0

actual/given output

image

desired or expected output

image

Please put any comments or anything else here :)

According to the documentation, latexindent.pl turns multiple spaces into a single space by default. However, it does not do so for me, even when I add multipleSpacesToSingle: 1 to the config file. Above, I pasted the parts of my config file that concern spaces and tabs.

Thanks for help!

cmhughes commented 2 years ago

I need text files, not screen shots. Minimal yaml, please.

xyecoding commented 2 years ago

> I need text files, not screen shots. Minimal yaml, please.

I directly modified the defaultSettings.yaml file in the home path of latexindent.pl. My version of defaultSettings.yaml can be viewed here.

The tex file only contains one sentence, which is "This is the        second sentence".

cmhughes commented 2 years ago

So you want to convert the multiple spaces into a single space...?

xyecoding commented 2 years ago

> So you want to convert the multiple spaces into a single space...?

Yes, I set multipleSpacesToSingle: 1 in my config file. However, it does not work.

cmhughes commented 2 years ago

Ok, thanks.

Note the following:

So, if we use the following:

xyegithub.yaml

modifyLineBreaks:
    textWrapOptions:
        columns: 80

and

myfile.tex

This is the        second sentence

and then call

latexindent.pl -l xyegithub.yaml -m myfile.tex

then the output is

This is the second sentence
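
As an illustration of the effect shown above, here is a minimal Python sketch that collapses runs of horizontal whitespace into a single space. It is not latexindent.pl's implementation; the function name `collapse_spaces` is hypothetical, and the sketch deliberately ignores indentation and text wrapping.

```python
import re


def collapse_spaces(line: str) -> str:
    """Replace each run of two or more spaces or tabs with a single space.

    Hypothetical helper: it only mimics the visible effect of
    multipleSpacesToSingle on one line of text.
    """
    return re.sub(r"[ \t]{2,}", " ", line)


print(collapse_spaces("This is the        second sentence"))
```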

cmhughes commented 2 years ago

Let me know if the above needs anything further :)

xyecoding commented 2 years ago

> Let me know if the above needs anything further :)

Yes. Sorry, I have been busy with something else these days. I added -m. However, the issue still remains.

My tex file, My Config file

I use the command latexindent.pl -l xy.yaml -m my_file.tex. This is what I get: image. The log file

cmhughes commented 2 years ago

Are you sure your distribution is up to date? You'll need V3.16. Also ensure that you have saved the YAML file above appropriately.

xyecoding commented 2 years ago

> Are you sure your distribution is up to date? You'll need V3.16. Also ensure that you have saved the YAML file above appropriately.

Thank you! My version was V3.15. I updated to V3.16, and it works correctly.

cmhughes commented 2 years ago

Great :)


xyecoding commented 2 years ago

> Let me know if the above needs anything further :)

I find that when manipulateSentences: 1 is set (enabling sentence manipulation), it fails to turn multiple spaces into one.

my_file:

This is the      second sentence

xy_mps0.yaml

modifyLineBreaks:
  oneSentencePerLine:
    manipulateSentences: 0
  textWrapOptions:
    columns: 80
    multipleSpacesToSingle: 1

xy_mps1.yaml

modifyLineBreaks:
  oneSentencePerLine:
    manipulateSentences: 1
  textWrapOptions:
    columns: 80
    multipleSpacesToSingle: 1

cmhughes commented 2 years ago

As of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17, the multipleSpacesToSingle also applies to the oneSentencePerLine routine, so you now receive the same output regardless of which file you use :)

xyecoding commented 2 years ago

> As of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17, the multipleSpacesToSingle also applies to the oneSentencePerLine routine, so you now receive the same output regardless of which file you use :)

Thank you! Great to hear that!

xyecoding commented 2 years ago

> As of https://github.com/cmhughes/latexindent.pl/releases/tag/V3.17, the multipleSpacesToSingle also applies to the oneSentencePerLine routine, so you now receive the same output regardless of which file you use :)

It seems that manipulateSentences: 1 also does not work together with textWrapOptions: columns: 80.

cmhughes commented 2 years ago

Specific example please. Have you seen

https://github.com/cmhughes/latexindent.pl/issues/355


xyecoding commented 2 years ago

> Specific example please. Have you seen #355

The tex file

Dan \textit{et al.} proposed multi-column networks composed of multiple branches of the same topology and trained them on the ``up-right'' and rotated images to obtain the rotation invariant features.
Three slightly different architectures were tested for OACNNs, OACNN (tsf), OACNN (tsf, A.S.), and OACNN (A.S.).

The config file

modifyLineBreaks:
  oneSentencePerLine:
    manipulateSentences: 1
    removeSentenceLineBreaks: 0
    textWrapSentences: 1 # setting to 1 disables main textWrap routine
  textWrapOptions:
    columns: 80
    multipleSpacesToSingle: 1

The output

Dan \textit{et al.
} proposed multi-column networks composed of multiple branches of the same topology and trained them on the ``up-right'' and rotated images to obtain the rotation invariant features.
Three slightly different architectures were tested for OACNNs, OACNN (tsf),
OACNN (tsf, A.S.)
, and OACNN (A.
S.)
.

I think this is an issue with betterFullStop, which treats some things that do not end a sentence as sentence endings, such as .}.

I guess a . followed by one or more spaces and an upper-case letter would be a good definition of the end of a sentence?
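
The heuristic proposed above can be sketched in Python as follows. This only illustrates the suggestion and is not how latexindent.pl detects sentences; the names `SENTENCE_END` and `split_sentences` are hypothetical, and the extra end-of-text case is an assumption added so that the last sentence is also recognised. The heuristic would still split wrongly after abbreviations such as "e.g. The".

```python
import re

# Hypothetical heuristic: a full stop ends a sentence only when it is
# followed by whitespace and an upper-case letter, or by the end of the text.
SENTENCE_END = re.compile(r"\.(?=\s+[A-Z]|\s*\Z)")


def split_sentences(text):
    """Split text after every full stop accepted by SENTENCE_END."""
    sentences = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:  # keep trailing text that has no closing full stop
        sentences.append(tail)
    return sentences


sample = (
    r"Dan \textit{et al.} proposed multi-column networks. "
    "Three architectures were tested: OACNN (tsf, A.S.), and OACNN (A.S.)."
)

for sentence in split_sentences(sample):
    print(sentence)
```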

cmhughes commented 2 years ago

Try the following

modifyLineBreaks:
  oneSentencePerLine:
    manipulateSentences: 1
    textWrapSentences: 1
  textWrapOptions:
    columns: 80

fineTuning:
    modifyLineBreaks:
      betterFullStop: |-
        (?x)
        (?:\.\)(?!\h*(?:,|\.|[a-z])))           # .) not followed by a-z
             |                                  # OR
            (?:                                 #
              (?<!                              # Not *preceded by*
                (?:                             #
                  (?:[eE]\.g)                   # e.g
                  |                             #
                  (?:[iI]\.e)                   # i.e
                  |                             #
                  (?:etc)                       # etc
                )                               #
              )                                 #
            )                                   # 
            \.                                  # .
            (?!                                 # Not *followed by*
              (?:                               #
                [a-zA-Z0-9]                     #
                |                               #
                \-                              #
                |                               #
                ~                               #
                |                               #
                \,                              #
                |                               #
                \}                              #  <!---- NEW BIT
                |                               #
                \),                             #  <!---- NEW BIT
                |                               #
                \)\.                            #  <!---- NEW BIT
              )                                 #
            )                                   #

which gives

Dan \textit{et al.} proposed multi-column networks composed of multiple
branches of the same topology and trained them on the ``up-right'' and rotated
images to obtain the rotation invariant features.
Three slightly different architectures were tested for OACNNs, OACNN (tsf),
OACNN (tsf, A.S.), and OACNN (A.S.).
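
To see which full stops the pattern above accepts, here is a rough Python port of it. This is a sketch under assumptions, not part of latexindent.pl: Perl's `\h` is approximated by `[ \t]` because Python's `re` module has no `\h`, the pattern is condensed onto fewer lines, and the names `BETTER_FULL_STOP` and `split_sentences` are hypothetical.

```python
import re

# Condensed Python port of the betterFullStop pattern, for illustration only.
BETTER_FULL_STOP = re.compile(
    r"""
    (?:\.\)(?![ \t]*(?:,|\.|[a-z])))     # .) not followed by , or . or a-z
    |                                    # OR
    (?<!(?:[eE]\.g|[iI]\.e|etc))         # not preceded by e.g, i.e, etc
    \.                                   # a full stop
    (?![a-zA-Z0-9]|-|~|,|\}|\),|\)\.)    # not followed by any of these
    """,
    re.VERBOSE,
)


def split_sentences(text):
    """Split text after every full stop accepted by BETTER_FULL_STOP."""
    sentences = []
    start = 0
    for match in BETTER_FULL_STOP.finditer(text):
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:  # keep trailing text that has no accepted full stop
        sentences.append(tail)
    return sentences


sample = (
    r"Dan \textit{et al.} proposed multi-column networks to obtain "
    "rotation invariant features. "
    "Three architectures were tested: OACNN (tsf), OACNN (tsf, A.S.), "
    "and OACNN (A.S.)."
)

for sentence in split_sentences(sample):
    print(sentence)
```

The three alternatives marked NEW BIT are what stop the split after `al.}`, `A.S.),` and `A.S.).`; removing them from the final lookahead reproduces the kind of unwanted breaks reported earlier in the thread.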

In future, if you have different issues, please study the documentation, and then post an issue using the template, and plain text, please (no screenshots!). Thanks!

xyecoding commented 2 years ago

OK, it works well! Thank you!