alvinwan / TexSoup

fault-tolerant Python3 package for searching, navigating, and modifying LaTeX documents
https://texsoup.alvinwan.com
BSD 2-Clause "Simplified" License

Mismatched brackets causing parsing error and Percentage Signs '%' in links cause Malformed Argument (+Solution) #144

Open AlexAWyatt opened 1 year ago

AlexAWyatt commented 1 year ago

Hi Alvin,

I've written an external solution to both of the problems named in the title (details below).

Right now, my solution is contained here: Repo with Fix

I'm not sure of the best way to contribute this. I didn't want to insert the solution into your package, since it works outside of TexSoup and is unnecessary in many use cases. If there's a way you'd like it included, let me know and I can implement it.

Problem 1 (mismatched brackets):

I think you're familiar with mismatched brackets (e.g. [x, y)) causing errors, so I won't explain the problem in depth. The linked solution produces a new LaTeX document that renders the same as the input document but can be parsed correctly by TexSoup.

Example: Input Text: $(0, 32]$

This causes an error because TexSoup sees the '(' bracket as never closing. To fix it, we wrap the expression in braces, using the properties of 'BraceGroups'. The result renders the same in LaTeX and is parsed successfully by TexSoup:

Output Text: ${(0,32]}$
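
For anyone who wants to see the idea without opening the repo, here is a minimal sketch of the wrapping step. It is not the code from the linked repo: the function names, the regex, and the restriction to inline `$...$` math are all assumptions made for illustration, and it only checks round and square brackets.

```python
import re

# Inline math delimited by single, unescaped dollar signs.
# Assumption of this sketch: no nested or escaped '$' inside the math.
INLINE_MATH = re.compile(r"(?<!\\)\$([^$]+)\$")
CLOSERS = {")": "(", "]": "["}


def brackets_balanced(text: str) -> bool:
    """Return True if every '(' and '[' is closed by its own partner."""
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    return not stack


def wrap_mismatched_math(tex: str) -> str:
    """Wrap inline math whose brackets do not pair up in a brace group."""
    def _wrap(match: re.Match) -> str:
        body = match.group(1)
        if brackets_balanced(body):
            return match.group(0)  # leave well-formed math untouched
        return "${" + body + "}$"
    return INLINE_MATH.sub(_wrap, tex)


print(wrap_mismatched_math("$(0, 32]$"))
```

Math with properly paired brackets is returned unchanged. Note that the sketch does not detect an existing wrapper, so running it twice on the same text would add a second pair of braces.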

Problem 2 (links with percentage signs):

Links containing percentage signs are not parsed correctly, because '%' is almost always treated as the start of a comment and therefore of a new token.

Example: Input Text: \url{en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory}

This causes a 'Malformed Argument' error because the link is tokenized as follows: \url '{' 'en.wikipedia.org/wiki/Zermelo', '%E2%80%93Fraenkel_set_theory}'

With the fix, the LaTeX document again renders the same but is now parsed successfully by TexSoup:

Output Text: \url{en.wikipedia.org/wiki/Zermelo\%E2\%80\%93Fraenkel_set_theory}
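
The escaping step can be sketched as a small preprocessing pass that runs before the text is handed to TexSoup. This is an illustration, not the code from the linked repo: the function name and regexes are assumptions, and it only handles `\url{...}` arguments without nested braces.

```python
import re

# \url{...} whose argument contains no nested braces (assumption of this sketch).
URL_COMMAND = re.compile(r"\\url\{([^{}]*)\}")
# A '%' that is not already escaped with a backslash.
UNESCAPED_PERCENT = re.compile(r"(?<!\\)%")


def escape_percent_in_urls(tex: str) -> str:
    """Replace each bare '%' inside a \\url{...} argument with '\\%'."""
    def _escape(match: re.Match) -> str:
        escaped = UNESCAPED_PERCENT.sub(r"\\%", match.group(1))
        return r"\url{" + escaped + "}"
    return URL_COMMAND.sub(_escape, tex)


source = r"\url{en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory}"
print(escape_percent_in_urls(source))
```

Because the lookbehind skips percent signs that are already escaped, running the pass twice leaves the text unchanged. Percent signs outside `\url{...}` are not touched, so real comments survive.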

Note: these solutions make minor changes to the original LaTeX document, but the changes are in line with how LaTeX interprets the document when rendering it.

I've also included tests with the same directories and files as you've used in your tests, and fully documented the files. I hope it's to your liking when you have a chance to take a look.

Extra redundant repo link: Repo with Fix

harzer99 commented 6 months ago

Ran into the same issue with the percentage signs. Thanks for documenting the fix :)