dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.98k stars 4.66k forks source link

ZipFile.ExtractToDirectory failing with "The filename, directory name, or volume label syntax is incorrect." (works in SharpZipLib) #67201

Closed smoothdeveloper closed 2 years ago

smoothdeveloper commented 2 years ago

Description

I have some code using System.IO.Compression:

ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly"))

Reproduction Steps

putting this in a .fsx file and running it with dotnet fsi:

#r "nuget: System.IO.Compression"
open System
open System.IO
open System.IO.Compression
open System.Net.Http
let directory = DirectoryInfo __SOURCE_DIRECTORY__
let archive = Path.Combine(__SOURCE_DIRECTORY__, "groove-v1.0.0-midionly.zip")
do
    let uri = "https://storage.googleapis.com/magentadata/datasets/groove/groove-v1.0.0-midionly.zip"
    use client = new HttpClient(Timeout = TimeSpan.FromMinutes 30)
    use resp = client.Send(new HttpRequestMessage(RequestUri=Uri uri))
    use stream = resp.Content.ReadAsStream()
    use file = File.OpenWrite(archive)
    stream.CopyTo(file)
//let fz = ICSharpCode.SharpZipLib.Zip.FastZip()
//fz.ExtractZip(archive, Path.Combine(directory.FullName, filesetName), "")
// dotnet one fails one the groove dataset.
ZipFile.ExtractToDirectory(archive, Path.Combine(directory.FullName, "groove-v1.0.0-midionly"))

I tried on macos, and it doesn't seem to fail, so maybe it is platform dependent.

Expected behavior

I wish it would work, extracting without an exception.

Actual behavior

System.IO.IOException: The filename, directory name, or volume label syntax is incorrect. : 'C:\dev\src\github.com\smoothdeveloper\zcore-midi-fs\demo\groove-v1.0.0-midionly\groove\drummer8\session2\Icon ' at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options) at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize) at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize) at System.IO.Compression.ZipFileExtensions.ExtractToFile(ZipArchiveEntry source, String destinationFileName, Boolean overwrite) at System.IO.Compression.ZipFileExtensions.ExtractToDirectory(ZipArchive source, String destinationDirectoryName, Boolean overwriteFiles) at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName, Encoding entryNameEncoding, Boolean overwriteFiles) at <StartupCode$FSI_0003>.$FSI_0003.main@() in

Regression?

No response

Known Workarounds

I am using FastZip class from ICSharpCode.SharpZipLib.Zip to work around the issue.

Configuration

Welcome to F# Interactive for .NET Core in Visual Studio. To execute code, either
  1. Use 'Send to Interactive' (Alt-Enter or right-click) from an F# script. The F# Interactive process will
     use any global.json settings associated with that script.
  2. Press 'Enter' to start. The F# Interactive process will use default settings.
> 

Microsoft (R) F# Interactive version 12.0.2.0 for F# 6.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

Other information

No response

Danyy427 commented 2 years ago

@danmoseley That didn't work. image

Danyy427 commented 2 years ago

@danmoseley Are we sure that #67189 includes the test data from runtime assets? the thread doesn't mention anything but Roslyn analyzers

danmoseley commented 2 years ago

Doh - you're right. It was https://github.com/dotnet/runtime/pull/67289, which just merged.

Hopefully you can build from the root again, and it will be much faster (incremental) and get you these. I'll do the same myself.

danmoseley commented 2 years ago

Confirmed that it is now available. You're unblocked 😄

Danyy427 commented 2 years ago

Okay thank you, let me see if I can get these tests to fail.

Danyy427 commented 2 years ago

@danmoseley Do I add both the tests and sanitization in one go or do I first add the tests, open a pull request, see that it fails and then add the sanitization to make them pass?

The tests are ready and they fail quite miserably as expected.

<test name="System.IO.Compression.Tests.ZipFile_Extract.Windows_ZipWithInvalidFileNames(zipFile: \&quot;InvalidWindowsFileNameChars.zip\&quot;, path: \&quot;Test______________________________________.txt\&quot;)" type="System.IO.Compression.Tests.ZipFile_Extract" method="Windows_ZipWithInvalidFileNames" time="0.0382547" result="Fail">
        <failure exception-type="System.IO.IOException">
          <message><![CDATA[System.IO.IOException : The filename, directory name, or volume label syntax is incorrect. : 'C:\\Users\\Asus\\AppData\\Local\\Temp\\ZipFile_Extract_w0e3npsy.frp\\Windows_ZipWithInvalidFileNames_114_debeea9f\\Test\"<>|\x01\x02\x03\x04\x05\x06\a\b\t\n\v\f\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f*?:\\']]></message>
          <stack-trace><![CDATA[   at System.IO.FileSystem.CreateDirectory(String fullPath, Byte[] securityDescriptor)
   at System.IO.Directory.CreateDirectory(String path)
   at System.IO.Compression.ZipFileExtensions.ExtractRelativeToDirectory(ZipArchiveEntry source, String destinationDirectoryName, Boolean overwrite) in D:\Runtime\runtime\src\libraries\System.IO.Compression.ZipFile\src\System\IO\Compression\ZipFileExtensions.ZipArchiveEntry.Extract.cs:line 111
   at System.IO.Compression.ZipFileExtensions.ExtractToDirectory(ZipArchive source, String destinationDirectoryName, Boolean overwriteFiles) in D:\Runtime\runtime\src\libraries\System.IO.Compression.ZipFile\src\System\IO\Compression\ZipFileExtensions.ZipArchive.Extract.cs:line 72
   at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName, Encoding entryNameEncoding, Boolean overwriteFiles) in D:\Runtime\runtime\src\libraries\System.IO.Compression.ZipFile\src\System\IO\Compression\ZipFile.Extract.cs:line 188
   at System.IO.Compression.ZipFile.ExtractToDirectory(String sourceArchiveFileName, String destinationDirectoryName) in D:\Runtime\runtime\src\libraries\System.IO.Compression.ZipFile\src\System\IO\Compression\ZipFile.Extract.cs:line 43
   at System.IO.Compression.Tests.ZipFile_Extract.Windows_ZipWithInvalidFileNames(String zipFile, String path) in D:\Runtime\runtime\src\libraries\System.IO.Compression.ZipFile\tests\ZipFile.Extract.cs:line 115]]></stack-trace>
        </failure>
      </test>
Danyy427 commented 2 years ago

@danmoseley I have just finished the sanitization to GetFileName_Windows but I forgot to do it for directories. Is there a method like GetFileName_Windows for directories that you know of? I have been searching for it but I am not familiar with the project structure.

danmoseley commented 2 years ago

@danmoseley Do I add both the tests and sanitization in one go or do I first add the tests, open a pull request, see that it fails and then add the sanitization to make them pass?

It is always best to write the test(s) and verify it fails before attempting to fix anything. Primarily because that confirms you have written a useful test.

I have just finished the sanitization to GetFileName_Windows

I didn't look at the code very hard 😄 It seems the purpose of that method is to find the "file part" of the entry. It's then exposed eg in ZipArchiveEntry.Name. We probably shouldn't "lie" there.

My suggestion is to add the sanitization at the point the file has to be written. It looks like here, FullName is the whole relative path. https://github.com/dotnet/runtime/blob/dbef8db7c4e1d20dab771f234441c91770f66a58/src/libraries/System.IO.Compression.ZipFile/src/System/IO/Compression/ZipFileExtensions.ZipArchiveEntry.Extract.cs#L99

One way to do this might be to add something like an "internal string SantizedFullName { get { .. }}" property (or maybe FileSystemSafeFullName?)to both ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs. On Unix it just returns FullName. On Windows it sanitizes it. Then on the line above, use SanitizedFullName instead of FullName.

Danyy427 commented 2 years ago

@danmoseley I have created a pull requests for the test. #67332

Danyy427 commented 2 years ago

@danmoseley checks are done, some tests failed as expected

Danyy427 commented 2 years ago

@danmoseley In your last post, you said

One way to do this might be to add something like an "internal string SantizedFullName { get { .. }}" property (or maybe FileSystemSafeFullName?)to both ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs. On Unix it just returns FullName. On Windows it sanitizes it. Then on the line above, use SanitizedFullName instead of FullName.

I do not understand why ValidFullName shouldn't be public. Aren't we trying to reach it outside of its assembly? One is from System.IO.Compression.Zipfile the other System.IO.Compression, am I misinterpreting how internal works.

Danyy427 commented 2 years ago

@danmoseley Also should I build the System.IO.Compression again before using ValidFullName from ZipFileExtensions.ZipArchiveEntry.Extract.cs?

When ValidFullName is internal and I build the project, I still cannot use it from ZipArchiveEntry.cs, I am getting a CS1061

On the other hand if I declare ValidFullName as public the library won't build at all,

C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : Compat issues with assembly System.IO.Compression: [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : MembersMustExist : Member 'public System.String System.IO.Compression.ZipArchiveEntry.ValidFullName.get()' does not exist in the reference but it does exist in the implementation. [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : Compat issues with assembly System.IO.Compression: [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : MembersMustExist : Member 'public System.String System.IO.Compression.ZipArchiveEntry.ValidFullName.get()' does not exist in the reference but it does exist in the implementation. [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : Compat issues with assembly System.IO.Compression: [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
C:\Users\Asus\.nuget\packages\microsoft.dotnet.apicompat\7.0.0-beta.22171.2\build\Microsoft.DotNet.ApiCompat.targets(94,5): error : MembersMustExist : Member 'public System.String System.IO.Compression.ZipArchiveEntry.ValidFullName.get()' does not exist in the reference but it does exist in the implementation. [D:\Runtime\runtime\src\libraries\System.IO.Compression\src\System.IO.Compression.csproj]
danmoseley commented 2 years ago

Ah. It looks like you can't piggy back in ZipArchiveEntry.Windows.cs (which is in System.IO.Compression) because you need it in ZipFileExtensions.ZipArchiveEntry.Extract.cs (which is in System.IO.Compression.ZipFile)

Instead I suggest creating something like ZipFileExtensions.ZipArchiveEntry.Extract.Windows.cs and putting it next to ZipFileExtensions.ZipArchiveEntry.Extract.cs and editing System.IO.Compression.ZipFile.csproj so it's only used on Windows. Hmm, but that csproj does not have a Windows-specific build. It would need $(NetCoreAppCurrent)-windows added to <TargetFrameworks> so you can use Condition="'$(TargetPlatformIdentifier)' == 'windows'" to include this new file.

I'm not familiar with all these arrangements and why bits of zip are in S.IO.C and bits are in S.IO.C.Z.. @carlossanlop do you have a suggestion?

Danyy427 commented 2 years ago

@danmoseley Instead of adding ValidFileName to ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs separately, could I add it to ZipArchiveEntry.cs directly and check for the platform inside of get. If it is windows I sanitize the characters, if not I sanitize \0 or just return FileName as is?

danmoseley commented 2 years ago

Ideally we don't check OS at runtime.

@ericstj what is our base for adding a windows target to this library?

Danyy427 commented 2 years ago

@danmoseley I don't know if it would work (I cannot test it right now) but we could maybe add _ValidFullName to ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs, declare them as internal, then declare a public property ValidFullName which just returns _ValidFullNamein ZipArchiveEntry.cs

carlossanlop commented 2 years ago

Ah. It looks like you can't piggy back in ZipArchiveEntry.Windows.cs (which is in System.IO.Compression) because you need it in ZipFileExtensions.ZipArchiveEntry.Extract.cs (which is in System.IO.Compression.ZipFile)

If I'm understanding this correctly, you're trying to consume code from System.IO.Compression.ZipFile in System.IO.Compression. Correct? One option is to put the shared code in a new file in src/libraries/Common/System/IO/Compression, and then include that file in both S.IO.Compression and S.IO.Compression.ZipFile csproj projects. As long as the code is internal, it should work for both.

A great example of this practice are the src/libraries/Common/src/System/IO/PathInternal*.cs files, which are consumed in a few different assemblies.

Danyy427 commented 2 years ago

@carlossanlop We are also trying to make some of the code work specifically on Windows and some of it on Unix, like ZipArchiveEntry.Windows.cs and ZipArchiveEntry.Unix.cs which should also be a common code in Compression and Compression.Zipfile

ericstj commented 2 years ago

@ericstj what is our base for adding a windows target to this library?

Add $(NetCoreAppCurrent)-windows to this https://github.com/dotnet/runtime/blob/631389fb941eb618fef883975ca5f4f5b03a93cf/src/libraries/System.IO.Compression.ZipFile/src/System.IO.Compression.ZipFile.csproj#L4

and make sure you condition your new files on TargetPlatformIdentifier like this: https://github.com/dotnet/runtime/blob/20e8f905d05f2d8f390d0e2ef10f1a4a1289967c/src/libraries/System.IO.Compression/src/System.IO.Compression.csproj#L47-L50

Danyy427 commented 2 years ago

@danmoseley I do not know how I should be using this knowledge. I tried to do something but it's just errors everywhere now :)

I added two files ZipFileValidName_Windows.cs and ZipFileValidName_Unix.cs to src/libraries/Common/System/IO/Compression, added both of the files to their relative target platforms in the csproj files. Like this:

System.IO.Compression.sln:

<!-- Windows specific files -->
...
<Compile Include="$(CommonPath)System\IO\Compression\ZipFileValidName_Windows.cs"
             Link="Common\System\IO\Compression\ZipFileValidName_Windows.cs" />

<!-- Unix specific files -->
...
<Compile Include="$(CommonPath)System\IO\Compression\ZipFileValidName_Unix.cs"
             Link="Common\System\IO\Compression\ZipFileValidName_Unix.cs" />

and to System.IO.Compression.ZipFile I first added $(NetCoreAppCurrent)-windows; to TargetFrameworks then added the files:

  <ItemGroup Condition="'$(TargetPlatformIdentifier)' == 'windows'">
    <Compile Include="System\IO\Compression\ZipArchiveEntry.Windows.cs" />
    <Compile Include="$(CommonPath)Interop\Windows\Interop.Libraries.cs"
             Link="Common\Interop\Windows\Interop.Libraries.cs" />
    <Compile Include="$(CommonPath)System\IO\Compression\ZipFileValidName_Windows.cs"
             Link="Common\System\IO\Compression\ZipFileValidName_Windows.cs" />
  </ItemGroup>

<!-- Unix specific files -->
...
<Compile Include="$(CommonPath)System\IO\Compression\ZipFileValidName_Unix.cs"
             Link="Common\System\IO\Compression\ZipFileValidName_Unix.cs" />

Finally changed the FullName to ValidFullName in ZipExtensions.ZipArchiveEntry.Extract.cs but now every ZipArchiveEntry in the file is giving a CS0436 error.

danmoseley commented 2 years ago

@Danyy427 can you push your changes to your branch, and one of us can take a look?

Danyy427 commented 2 years ago

@danmoseley I pushed my changes, note that some weird character was written at the start of the csproj files, probably something wrong with vs or notepad++, the last commit was to remove those

Danyy427 commented 2 years ago

@danmoseley I have also added sanitization.

Danyy427 commented 2 years ago

@danmoseley did you have a chance to look at the changes?

Danyy427 commented 2 years ago

@danmoseley Actually instead of trying to add a ValidFullName to S.IO.C and trying to use it from S.IO.C.Z, why don’t we just write platform dependent methods to S.IO.C.Z and avoid the shared code part altogether. Is there a particular reason why we are trying to avoid writing methods to sanitize the filename?

danmoseley commented 2 years ago

@carlossanlop do you have a preference?

Danyy427 commented 2 years ago

@danmoseley I was going to try out writing sanitization as a method and I did do it for windows and Unix but there seem to be 4 target platforms. It seems that Unix and Browser go together but what about net7.0 without any target platform specified. Is that supposed to have sanitization or not?

Danyy427 commented 2 years ago

@danmoseley I made the test that I have written pass locally using a method to sanitize the filename. Every other test fails miserably tho

Danyy427 commented 2 years ago

@danmoseley I have pushed the changes that made the local test pass, how should I deal with tests that expect the extraction to fail when there are invalid characters in the filename?

danmoseley commented 2 years ago

how should I deal with tests that expect the extraction to fail when there are invalid characters in the filename?

You'd update those tests so they expect the new behavior.

Danyy427 commented 2 years ago

@danmoseley there are some tests that are made specifically to check for errors when they encounter invalid chars. Should I delete those tests as they are no longer needed or should I make sure all of the zips in their test cases are extracted correctly with Assert.True's. Like for example Windows_ZipWithInvalidFileNames_ThrowsException in Zipfile.Create tests

Danyy427 commented 2 years ago

@danmoseley I fixed all of the tests and they all passed on my local system. Instead of deleting the ThrowsException tests, I merged the entirety of them into my own test function that I have written initially so we can test all of the invalid and null zip file test cases. I have pushed the code to #67332 and if they pass here as well they should be good to go.

danmoseley commented 2 years ago

fixed by https://github.com/dotnet/runtime/pull/67332