brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/
Other
957 stars 101 forks source link

allow () paren chars in defensive path check #2295

Closed dginev closed 10 months ago

dginev commented 10 months ago

Fixes #2293 .

This PR is an alternative - or supplemental - approach to #2294. Here, I still keep the defensive checks as they are, but generally allow the () characters to be present. There may be a good case to merge both PRs.

I also started wondering how many other subtle cases may be silently prevented due to the pathname safety check, so I added a warning. Would be curious to see what variations we see in the next arXiv rerun.

xworld21 commented 10 months ago

() are special characters for POSIX shells, so pathname_is_nasty is correct in rejecting those before calling pathname_kpsewhich (in fact, it may be time to move the defensive checks inside pathname_kpsewhich – or even better, call kpsewhich in a safe way instead of using backticks).

xworld21 commented 10 months ago

call kpsewhich in a safe way instead of using backticks

PS: I know how to do this (requires Perl >=v5.8, which I am sure is fine, plus a small extra dependency on Windows because... Windows). I'll send a PR soon.

xworld21 commented 10 months ago

...

PS: I know how to do this (requires Perl >=v5.8, which I am sure is fine, plus a small extra dependency on Windows because... Windows). I'll send a PR soon.

Scratch that: I have already done that! See https://github.com/brucemiller/LaTeXML/blob/169d04c9743c5deac5309bef496029f172dd0afe/lib/LaTeXML/Util/Pathname.pm#L403 The open call uses the list form, which does not spawn a shell, and is thus safe in the face of arbitrary file names. Well, except for Windows – let me send a Windows-specific PR.

So it's actually safe to remove pathname_is_nasty altogether, modulo the Windows change.

dginev commented 10 months ago

Right, the nasty check has been trying to reject both malformed and suspect paths historically. I don't think we want to accept say a path

$(dirname $(kpsewhich latex.ltx))/..

but your issue rightfully pointed out that parens on their own are acceptable literal characters, as with article/figures (1200x1800px). The parens would have to be properly escaped if any system calls are made, agreed. But I think that was already the case for paths with spaces in the name, which are also allowed.

dginev commented 10 months ago

Removing reliance on pathname_is_nasty entirely could be seen as code cleanup (since regex checks are always fishy), so that sounds appealing to me.

If we have a consensus way of doing that, might as well?

xworld21 commented 10 months ago

But I think that was already the case for paths with spaces in the name, which are also allowed

I think spaces didn't work before #1592. Before the PR, kpsewhich was called with backticks and no attempt at escaping: arguments with spaces were safe, but the behaviour was broken.

If we have a consensus way of doing that, might as well?

Let me add: after adding proper escaping on Windows.