Open akubot opened 3 years ago
That's certainly quite odd — does it reproduce if you simply run python -c 'import os.path; os.path.expanduser("data/blah/~$_-expect_fail.doc")'
? Looking at the stdlib implementation, my first reaction is surprise that this isn't failing the getpwnam()
lookup:
https://github.com/python/cpython/blob/3.8/Lib/posixpath.py#L228-L271
I modified your test slightly, it confirms the problem.
I changed it to expandvars
, which is the one failing.
and also wrapped it with a print()
The original command you posted does not throw an error but also doesn't print anything.
$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/~$_-expect_fail.doc"))'
data/blah/~/home/users/jdoe-expect_fail.doc
I don't have an explanation for the issue, but somehow these 4 characters together expand on RHEL anyway.
Ah, I missed that and it explains the problem: $_
is a shell variable (last argument of the previous command, if memory serves), and that would explain why you're seeing this depending on the next character since it has to be something which stops parsing at the next character.
I'm wondering whether we want to remove that part of #88 on the theory that this is being overzealous. I do not want to be in the business of having to try to maintain a separate implementation of expandvars
.
It looks like it's slightly more subtle -- the sequence "$_-" also causes the problem
$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/$_-expect_fail.doc"))'
data/blah/~/home/users/jdoe-expect_fail.doc
while just "$_" does not
$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/$_expect_success.doc"))'
data/blah/$_expect_success.doc
and similarly for just "~$_"
python3 -c 'import os.path; print(os.path.expandvars("data/blah/~$_expect_success.doc"))'
data/blah/~$_expect_success.doc
So it appears the hyphen helps trigger this. But that makes me think there are other cases lurking.
Regarding the bagit.py source code, I'm not sure what the correct choice would be. I suppose you could put some guard around the expandvars
code, to check for the character sequences that mess it up on Linux?
But I'm not sure specifically what you would do, because you are currently matching a possibly expanded filename against the real one.
Probably better to discuss with regular contributors, and ask whether to back out the expandvars
snippet completely? Then it should always pass on Linux.
BTW, a Windows user verified this issue does not occur there, which makes sense because this is a Linux expansion.
I was thinking that it might be good to discuss this with @runderwood to see whether we have a compelling reason to keep expandvars
in here.
This variable is set by at least the popular Bourne shells and ones like Fish which try to be compatible so I suppose we could purge $_
at startup but that seems like it's just asking for some unexpected dependency to show up later.
I don't really recall why the expandvars
call is there, to be honest.
While I can see the reasoning that it's safer to disallow potentially dangerous variable expansion, I believe it's system-dependent so you don't really enforce a meaningful universal policy doing it this way.
I don't have a compelling argument for keeping it there if it's causing problems.
Observed on Linux, a failure at line 931 of bagit.py in git commit hash code
bff20d2
RHEL 7.5 (Maipo), Linux 3.10.0-1160.15.2.el7.x86_64 GNU/Linux
Python 3.7.4
using miniconda3File names containing this 4-character sequence cause an "is unsafe" false error due to expansion of the tilde:
~$_-
To reproduce, create a file name such as
data/blah/~$_-expect_fail.doc
In Python v3, do this
A similar test with filename that does not have the hyphen after the underscore
data/blah/~$_expect_success.doc
works OK, i.e. without unwanted expansion of the tilde.