If interested, here are a couple of Stack Exchange questions/answers that used
The Unarchiver and The Archive Browser for testing.
* How to add filename to archive if compressing using Gzip class?, http://stackoverflow.com/q/27739140/608639
* Is Gzip supposed to honor original filename during decompress?, http://superuser.com/q/859785/173513
The incorrect results muddied the waters while investigating the issue on the
Stack Exchange questions, and caused us to ask "... am I seeing a bug in three
different programs?".
Original comment by noloa...@gmail.com
on 3 Jan 2015 at 11:08
This is intentional. Too many gzip filenames apparently have invalid or
incorrect filenames stored, which breaks decompression of embedded tar
archives. There used to be support, but it caused too many issues.
Original comment by paracel...@gmail.com
on 4 Jan 2015 at 8:32
> Too many gzip filenames apparently have invalid or incorrect filenames...
Thanks for that. I was not aware the problem was that widespread.
Related: would the following be sufficient in my software? According to RFC 1952,
the character set is ISO/IEC 8859-1 (Latin-1). The collection of 191 valid
characters came from
http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout.
+void Gzip::SetFilename(const std::string& filename, bool throwOnEncodingError)
+{
+ if(throwOnEncodingError)
+ {
+ for(size_t i = 0; i < filename.length(); i++) {
+ // Compare as unsigned char; a signed char can never be >= 160.
+ const unsigned char c = static_cast<unsigned char>(filename[i]);
+ if( !(c >= 32 && c <= 126) && !(c >= 160))
+ throw InvalidDataFormat("The filename is not ISO 8859-1 encoded");
+ }
+ }
+
+ m_filename = filename;
+}
And:
+const std::string& Gunzip::GetFilename(bool throwOnEncodingError) const
+{
+ if(throwOnEncodingError)
+ {
+ for(size_t i = 0; i < m_filename.length(); i++) {
+ // Compare as unsigned char; a signed char can never be >= 160.
+ const unsigned char c = static_cast<unsigned char>(m_filename[i]);
+ if( !(c >= 32 && c <= 126) && !(c >= 160))
+ throw InvalidDataFormat("The filename is not ISO 8859-1 encoded");
+ }
+ }
+
+ return m_filename;
+}
If I get far enough to call Gunzip::GetFilename, then the archive is good (and
decompressed) but the original filename could be bad.
The obvious strategy is (1) attempt to use the original filename, and (2)
fall back to something else on failure.
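For illustration, the two-step strategy could be sketched roughly as below. This is only a sketch under assumptions: IsPrintableLatin1 and ChooseOutputName are hypothetical helper names (not part of any library), the fallback of stripping a trailing ".gz" from the archive name is one possible choice, and the path-separator check is an extra precaution that was not discussed above.

```cpp
#include <string>

// Hypothetical helper: true if every byte is a printable ISO 8859-1
// character (0x20-0x7E or 0xA0-0xFF), the same ranges as in the patch.
bool IsPrintableLatin1(const std::string& name)
{
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        // Work on unsigned values so bytes >= 0xA0 compare correctly.
        const unsigned char c = static_cast<unsigned char>(name[i]);
        const bool ascii = (c >= 0x20 && c <= 0x7E);
        const bool high = (c >= 0xA0);
        if (!ascii && !high)
            return false;
    }
    return true;
}

// Hypothetical helper: choose an output name for the decompressed data.
// 'stored' is the filename from the gzip header (may be empty);
// 'archiveName' is the name of the .gz file itself.
std::string ChooseOutputName(const std::string& stored,
                             const std::string& archiveName)
{
    // (1) Use the stored name if it is present and looks sane.
    // Rejecting path separators is an assumption added for safety.
    if (!stored.empty() && IsPrintableLatin1(stored)
        && stored.find('/') == std::string::npos
        && stored.find('\\') == std::string::npos)
        return stored;

    // (2) Fall back to the archive name with a trailing ".gz" removed.
    const std::string suffix = ".gz";
    if (archiveName.size() > suffix.size()
        && archiveName.compare(archiveName.size() - suffix.size(),
                               suffix.size(), suffix) == 0)
        return archiveName.substr(0, archiveName.size() - suffix.size());

    // Last resort: the archive had no ".gz" suffix to strip.
    return archiveName + ".out";
}
```

With this shape the caller never sees an exception for a bad stored name; it just gets the fallback name instead.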
Original comment by noloa...@gmail.com
on 5 Jan 2015 at 7:03
Original issue reported on code.google.com by
noloa...@gmail.com
on 3 Jan 2015 at 11:03