dotnet / sdk-container-builds

Libraries and build tooling to create container images from .NET projects using MSBuild
https://learn.microsoft.com/en-us/dotnet/core/docker/publish-as-container
MIT License

Which tar format is best? #159

Open · rainersigwald opened this issue 2 years ago

rainersigwald commented 2 years ago

> Question semi-unrelated to this change: @rainersigwald, is there any reason why the GNU format was used?
>
> My assumption is that PAX was not used in order to reduce file size, since it always creates an extended attributes entry that precedes each actual entry. That's fine, but I just wanted to mention that PAX is the most flexible format, so if you encounter any issues, you can always try that format.

_Originally posted by @carlossanlop in https://github.com/dotnet/sdk-container-builds/pull/153#discussion_r960847413_
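A minimal C# sketch of the size trade-off described in the quote, assuming .NET 7 or later (where `System.Formats.Tar` ships). It writes the same one-file archive once as GNU and once as PAX into memory and prints both lengths. The helper `ArchiveSize`, the entry name `app/hello.txt`, and its contents are hypothetical choices for illustration; no specific byte counts are claimed, since they may vary between runtime versions.

```csharp
using System;
using System.Formats.Tar;
using System.IO;
using System.Text;

// Hypothetical helper: writes one small regular-file entry in the requested
// format to an in-memory archive and returns the archive's total length.
static long ArchiveSize(TarEntryFormat format)
{
    using var archive = new MemoryStream();
    using (var writer = new TarWriter(archive, format, leaveOpen: true))
    {
        TarEntry entry = format switch
        {
            TarEntryFormat.Gnu => new GnuTarEntry(TarEntryType.RegularFile, "app/hello.txt"),
            TarEntryFormat.Pax => new PaxTarEntry(TarEntryType.RegularFile, "app/hello.txt"),
            _ => throw new ArgumentOutOfRangeException(nameof(format)),
        };
        entry.DataStream = new MemoryStream(Encoding.UTF8.GetBytes("hello"));
        writer.WriteEntry(entry);
    } // disposing the writer appends the end-of-archive records

    return archive.Length;
}

long gnuSize = ArchiveSize(TarEntryFormat.Gnu);
long paxSize = ArchiveSize(TarEntryFormat.Pax);

// The PAX archive also carries the extended attributes entry mentioned above.
Console.WriteLine($"GNU: {gnuSize} bytes, PAX: {paxSize} bytes");
```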

rainersigwald commented 2 years ago

It's because I was just guessing here! @carlossanlop, is there a way to tell, given a .tar file, what format it uses? I'd like to match whatever Docker does, but I couldn't figure that out when I tried, and I forgot to file a follow-up issue.

carlossanlop commented 2 years ago

> is there a way to tell, given a .tar file, what format it uses?

An archive can contain entries of different formats intermixed, so the format belongs to each individual entry, not to the archive as a whole.

> but I couldn't figure that out when I tried

For reading: after retrieving the next entry of an archive with `GetNextEntry` or `GetNextEntryAsync`, you can check that entry's format by reading the `Format` property of the returned `TarEntry`.
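A minimal sketch of that reading loop, assuming .NET 7 or later. `ListEntryFormats` is a hypothetical helper name; because formats belong to entries, it reports one format per entry rather than a single format for the archive. Running it against a tarball produced by another tool would be one way to see which formats that tool emits.

```csharp
using System;
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper: returns every entry's name together with the format
// TarReader detected for it. Formats are per entry, so the list may mix them.
static List<(string Name, TarEntryFormat Format)> ListEntryFormats(Stream archive)
{
    var result = new List<(string Name, TarEntryFormat Format)>();
    using var reader = new TarReader(archive, leaveOpen: true);

    TarEntry? entry;
    while ((entry = reader.GetNextEntry()) != null)
    {
        result.Add((entry.Name, entry.Format));
    }

    return result;
}

// Usage: dotnet run -- path/to/archive.tar
if (args.Length > 0)
{
    using FileStream file = File.OpenRead(args[0]);
    foreach (var (name, format) in ListEntryFormats(file))
    {
        Console.WriteLine($"{format,-5} {name}");
    }
}
```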

For writing: the `TarWriter` class has a constructor overload that takes a `TarEntryFormat` argument specifying the default format to use when writing entries with the `WriteEntry(string fileName, string entryName)` or `WriteEntryAsync(string fileName, string entryName, CancellationToken cancellationToken)` methods.
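A sketch of the writing side, assuming .NET 7 or later. `WriteDirectory` is a hypothetical helper that archives only the top-level files of a directory; passing `TarEntryFormat.Gnu` to the constructor makes each entry created by `WriteEntry(fileName, entryName)` a GNU entry.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical helper: archives the top-level files of `sourceDir`.
// The format passed to the TarWriter constructor is the default for every
// entry that WriteEntry(fileName, entryName) creates from a file on disk.
static void WriteDirectory(string sourceDir, Stream destination, TarEntryFormat format)
{
    using var writer = new TarWriter(destination, format, leaveOpen: true);
    foreach (string path in Directory.GetFiles(sourceDir))
    {
        // Store each file at the archive root under its own file name.
        writer.WriteEntry(fileName: path, entryName: Path.GetFileName(path));
    }
}

// Usage: dotnet run -- <source-directory> <output.tar>
if (args.Length == 2)
{
    using FileStream output = File.Create(args[1]);
    WriteDirectory(args[0], output, TarEntryFormat.Gnu);
}
```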

> It's because I was just guessing here!

The GNU format is used as the default by some tar tools, but it has the following limitations:

I prefer PAX because:

And since I mentioned that you can intermix formats, you could add logic that stores entries in GNU format when they fit within its size limits, and stores the rest, which exceed those limits, in PAX.
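One way to express that per-entry choice, as a sketch assuming .NET 7 or later. `ChooseFormat`, `WriteWithPolicy`, and the `paxThreshold` parameter are hypothetical; the threshold is a caller-supplied placeholder, not one of GNU's actual limits. The sketch relies on `WriteEntry(TarEntry)` writing each entry in the entry's own format, which is what lets GNU and PAX entries share one archive.

```csharp
using System;
using System.Formats.Tar;
using System.IO;

// Hypothetical policy: GNU for entries below `paxThreshold` bytes, PAX otherwise.
// The threshold is a caller-chosen placeholder, not a limit from any tar spec.
static TarEntryFormat ChooseFormat(long length, long paxThreshold) =>
    length < paxThreshold ? TarEntryFormat.Gnu : TarEntryFormat.Pax;

// Hypothetical helper: writes in-memory data as a regular file, picking the
// entry type (and therefore its format) per entry via ChooseFormat.
static void WriteWithPolicy(TarWriter writer, string entryName, byte[] data, long paxThreshold)
{
    TarEntry entry = ChooseFormat(data.LongLength, paxThreshold) == TarEntryFormat.Gnu
        ? new GnuTarEntry(TarEntryType.RegularFile, entryName)
        : new PaxTarEntry(TarEntryType.RegularFile, entryName);
    entry.DataStream = new MemoryStream(data, writable: false);

    // WriteEntry(TarEntry) serializes the entry in its own format,
    // independent of the default format the writer was constructed with.
    writer.WriteEntry(entry);
}

using var demo = new MemoryStream();
using (var writer = new TarWriter(demo, TarEntryFormat.Pax, leaveOpen: true))
{
    WriteWithPolicy(writer, "small.json", new byte[16], paxThreshold: 1024);
    WriteWithPolicy(writer, "large.bin", new byte[4096], paxThreshold: 1024);
}
Console.WriteLine($"Mixed-format archive: {demo.Length} bytes");
```

A size cutoff is only one possible criterion; the same shape works for any per-entry rule that decides when GNU is no longer sufficient.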

Hope that helps.