JuliaIO / Tar.jl

TAR files: create, list, extract them in pure Julia
MIT License
79 stars 19 forks source link

Check for bad Windows paths prior to extraction; allow check on creation #146

Closed StefanKarpinski closed 1 year ago

StefanKarpinski commented 1 year ago

Check for bad Windows paths prior to extraction; allow check on creation Previously, we didn't bat an eye at extracting paths that are considered bad or illegal on Windows. This explicitly checks for such paths before attempting extraction on Windows. On all platforms it introduces the portable flag for tarball creating operations, which does the same checks and errors if you would have created a tarball that would error upon extraction on Windows.

codecov[bot] commented 1 year ago

Codecov Report

Base: 97.31% // Head: 97.37% // Increases project coverage by +0.06% :tada:

Coverage data is based on head (d7f8b3d) compared to base (951955b). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #146 +/- ## ========================================== + Coverage 97.31% 97.37% +0.06% ========================================== Files 4 4 Lines 783 801 +18 ========================================== + Hits 762 780 +18 Misses 21 21 ``` | [Impacted Files](https://codecov.io/gh/JuliaIO/Tar.jl/pull/146?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO) | Coverage Δ | | |---|---|---| | [src/Tar.jl](https://codecov.io/gh/JuliaIO/Tar.jl/pull/146/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL1Rhci5qbA==) | `91.91% <100.00%> (+1.11%)` | :arrow_up: | | [src/create.jl](https://codecov.io/gh/JuliaIO/Tar.jl/pull/146/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2NyZWF0ZS5qbA==) | `100.00% <100.00%> (ø)` | | | [src/extract.jl](https://codecov.io/gh/JuliaIO/Tar.jl/pull/146/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO#diff-c3JjL2V4dHJhY3Quamw=) | `98.14% <100.00%> (+<0.01%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaIO)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

StefanKarpinski commented 1 year ago

The rules are based on https://stackoverflow.com/a/31976060/659248. @vtjnash, do you have an opinion about whether we should actively prevent people form making paths with these names on Windows?

StefanKarpinski commented 1 year ago

Some of them seem pretty much like we have to: if you have a file named foo\bar in a tarball, if you try to extract it on Windows you will not get a file with that name, you'll get a directory named foo with a file inside of it named bar, which is not what should happen. There's simply no way to get a file named foo\bar since \ is a path separator on Windows. With some of the other ones, I'm not sure—the docs say it's a bad idea to create files with some of these names but also says it can be done programmatically. Not sure where to draw the line.

vtjnash commented 1 year ago

I think it makes sense to check. It is a bit of a risk to have files that could actually control physical hardware and cause the printer to operate. Similar to the (hypothetically legal) operation to specify an absolute path for the extraction destination (or equivalently to create a symlink to somewhere else like /usr, then extract files through that symlink, or through the .. symlink–all of which are normally rejected by bsdtar and gnutar on extract, even though they are legal)

It may also make sense for portable to check for case-clashes within a set of paths? But that can be a separate issue opened after merging this.

StefanKarpinski commented 1 year ago

It may also make sense for portable to check for case-clashes within a set of paths? But that can be a separate issue opened after merging this.

Oh, that's a good idea! We have had issues with people generating artifacts on Linux with case clashes that then causes problems on MacOS and Windows.

StefanKarpinski commented 1 year ago

I think it makes sense to check. It is a bit of a risk to have files that could actually control physical hardware and cause the printer to operate. Similar to the (hypothetically legal) operation to specify an absolute path for the extraction destination (or equivalently to create a symlink to somewhere else like /usr, then extract files through that symlink, or through the .. symlink–all of which are normally rejected by bsdtar and gnutar on extract, even though they are legal)

That makes sense. We do already prevent attacks that can overwrite files outside of the extraction root and preventing this kind of thing seems like a good idea as well. I think we'd want to turn the portable option on when we uses Tar.rewrite to normalize package and artifact tarballs (cc @staticfloat), which would prevent people from creating tarballs that could cause problems on Windows machines. We could potentially not check for portability for platform-specific tarballs that won't be installed on Windows machines.

StefanKarpinski commented 1 year ago

A couple more things I'm left wondering about...

  1. Is a name like LPT. ok? Previously the trailing dot rule excluded it even if the LPT.* exclusion didn't apply. If we're ok with names ending in dots though I'm not sure if LPT. is a problem or not. Do you know if Windows parses that as an LPT file with an empty extension?
  2. Which of the invalid characters in "*:<>?\\| are actually that bad? Clearly \ can't be allowed as discussed before and : seems like it has similar issues, at least in the first part of a path (is it ok in later parts where it can't be interpreted a part of a drive name?). But * and ? seem to just be excluded because Windows made the poor choice of handling wildcard expansion in the kernel instead of the shell, but it seems like you can programmatically create and work with files with those characters in their names, it just might be hard to interact with them from a shell. Not sure about the remaining <>|.
vtjnash commented 1 year ago

I think those are all disallowed at the Win32 library layer and require \? paths to bypass it. I didn’t check though

vtjnash commented 1 year ago

Canonical reference for these rules: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not.

StefanKarpinski commented 1 year ago

That page doesn't really spell out what bad stuff (if any) happens if you try to name files using the "reserved characters". If it's not a security hazard to have files with this characters in their name, it might be better to just let the operating/file system object if it really wants to and otherwise let the files be created. The main use case for this, after all, is to create artifacts and package trees and it's unlikely that a user will directly interact with the contents. On the other hand, they are weird characters to use in file names.

StefanKarpinski commented 1 year ago

We could just use \?\\ paths for all access here (arguably we could do that everywhere to better match how UNIX systems behave, or just do it when a path has weird characters in it).

StefanKarpinski commented 1 year ago

@staticfloat What do you think of this? Can we scan existing packages and artifacts to see how many would be affected if we started erroring on these things on Windows? Hopefully none, but you next know. Some of these names are awfully basic...

vtjnash commented 1 year ago

The documented list of malicious paths is apparently missing many things, according to https://googleprojectzero.blogspot.com/2016/02/the-definitive-guide-on-win32-to-nt.html