google-research / arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Apache License 2.0

Bug fix 48: Add optional argument functionality for --commands_to_delete #89

Closed: dylduhamel closed this pull request 8 months ago

dylduhamel commented 8 months ago

This change resolves #48.

I have modified the base_pattern regex to include (?:\[(?:.*?)\])*, which matches zero or more bracketed optional arguments in LaTeX. The .*? inside each pair of brackets is non-greedy, so each optional argument is matched as short as possible.
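To illustrate the optional-argument part on its own, here is a minimal sketch using Python's built-in re module. The helper name optional_arg_pattern is hypothetical, and unlike the PR's pattern it accepts only a single non-nested {...} argument, so it isolates the effect of (?:\[(?:.*?)\])*.

```python
import re


def optional_arg_pattern(command: str) -> str:
    """Hypothetical helper: pattern for \\<command>, optional [..] args, one {..} arg.

    Braces inside the argument are NOT supported (no nesting).
    """
    return (
        r"\\" + re.escape(command)  # literal backslash plus the command name
        + r"(?:\[(?:.*?)\])*"       # zero or more [...] groups, each matched lazily
        + r"\{([^{}]*)\}"           # one non-nested {...} argument, captured
    )


text = r"Intro \todo[inline][color=red]{fix this} end"
m = re.search(optional_arg_pattern("todo"), text)
print(m.group(0))  # \todo[inline][color=red]{fix this}
print(m.group(1))  # fix this

# Without optional arguments the (...)* group simply matches zero times.
print(re.search(optional_arg_pattern("todo"), r"\todo{plain}").group(1))  # plain

# Nested braces defeat this simplified pattern; that is what recursion is for.
print(re.search(optional_arg_pattern("todo"), r"\todo{a {b}}"))  # None
```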

I have also added the following helper:

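import regex  # third-party module; the built-in re does not support (?R) recursion


def extract_text_inside_curly_braces(text):
    """Extract the text inside the first balanced {...} group of a command string."""
    # (?R) recurses into the whole pattern, so nested {...} groups are matched too.
    pattern = r"\{((?:[^{}]|(?R))*)\}"

    match = regex.search(pattern, text)

    if match:
        return match.group(1)
    return ''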

It extracts the argument text of both nested and non-nested commands when keep_text is set to True.
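If you would rather avoid the third-party regex dependency, the same extraction can be sketched with an explicit brace-depth counter instead of the recursive (?R) pattern. This is an alternative technique, not the code in this PR. Like the regex version, it returns the contents of the leftmost balanced {...} group (or '' if there is none), and it does not treat escaped braces such as \{ specially.

```python
def extract_text_inside_curly_braces(text: str) -> str:
    """Return the contents of the leftmost balanced {...} group, or ''.

    Sketch: counts brace depth instead of using a recursive regex.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    # Found the brace that closes the one at `start`.
                    return text[start + 1 : i]
        # This '{' is never closed; retry from the next opening brace,
        # mirroring how regex.search moves on to later start positions.
        start = text.find("{", start + 1)
    return ""


print(extract_text_inside_curly_braces(r"\todo[inline]{see \cite{x}}"))  # see \cite{x}
print(extract_text_inside_curly_braces("no braces") == "")  # True
```

Retrying from every opening brace makes the worst case quadratic in the input length, which is usually acceptable for single command strings but worth keeping in mind for very large inputs.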

Tests to ensure proper functionality have also been added!

jponttuset commented 8 months ago

Thank you very much @dylduhamel! When I saw the pattern r'(?:\[(?:.*?)\])*\{((?:[^{}]+|\{(?1)\})*)\}(?:\[(?:.*?)\])*' I couldn't resist thinking: Isn't regex a lovely intuitive language? 😝
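For readers decoding that pattern, the sketch below spells out its three parts procedurally: leading optional arguments, one brace argument that may contain nested braces, and trailing optional arguments. It applies them to delete a command, optionally keeping its text. It uses brace-depth counting rather than the recursive regex, and remove_command and its helper functions are hypothetical names, not arxiv_latex_cleaner's actual implementation.

```python
def _skip_optional_args(text: str, pos: int) -> int:
    """Skip zero or more [...] groups starting at pos; return the new position."""
    while pos < len(text) and text[pos] == "[":
        close = text.find("]", pos + 1)  # shortest group, like the lazy .*?
        if close == -1:
            break
        pos = close + 1
    return pos


def _matching_brace(text: str, open_pos: int) -> int:
    """Index of the '}' closing the '{' at open_pos, or -1 if unbalanced."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return -1


def remove_command(text: str, command: str, keep_text: bool = False) -> str:
    """Hypothetical: delete \\command[..]{..}[..], keeping the {..} text if asked."""
    token = "\\" + command
    out = []
    pos = 0
    while True:
        hit = text.find(token, pos)
        if hit == -1:
            out.append(text[pos:])
            return "".join(out)
        after = hit + len(token)
        open_pos = _skip_optional_args(text, after)  # leading [...] groups
        close = -1
        if open_pos < len(text) and text[open_pos] == "{":
            close = _matching_brace(text, open_pos)
        if close == -1:
            # Not a usable match (e.g. \todos, or no balanced {...}); keep it as-is.
            out.append(text[pos:after])
            pos = after
            continue
        out.append(text[pos:hit])
        if keep_text:
            # Recurse so nested occurrences inside the argument are handled too.
            out.append(remove_command(text[open_pos + 1 : close], command, keep_text))
        pos = _skip_optional_args(text, close + 1)  # trailing [...] groups


print(remove_command(r"A \todo[inline]{fix \emph{this}} B", "todo", keep_text=True))
# A fix \emph{this} B
```

Like the trailing (?:\[(?:.*?)\])* in the pattern, this sketch also consumes a [...] group that directly follows the closing brace, even when that group is ordinary text.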