dscharrer / innoextract

A tool to unpack installers created by Inno Setup
https://constexpr.org/innoextract/

Add options to extract only some files/components/... #12

Open dscharrer opened 11 years ago

dscharrer commented 11 years ago

Currently we only have --language (which should be extended to accept a list!).

Users should be able to limit the files to be extracted by

  1. Filenames(+path) (--include/--exclude chains like with e.g. rsync)
  2. Raw, unprocessed filenames as they appear in innoextract --list --dump output
  3. Components defined in the setup file
  4. Languages defined in the setup file
  5. Tasks specified in the setup file
  6. Files can have "checks" - scripts that determine if the file gets extracted. For now we completely ignore them.

Additionally:

This is all probably over-engineering the problem, but I want whatever is implemented now to at least allow adding all these points in the future without breaking the command-line interface.

ssokolow commented 11 years ago

At the very least, it'd be nice to be able to do --include app/* for excluding what's obviously not needed when using a GOG installer to feed an engine re-implementation like GemRB.

dscharrer commented 11 years ago

Depends on issue #25 for proper --include/--exclude support (keeping order between different arguments without hacks).

a-detiste commented 9 years ago

Instead of silencing all the warnings in game-data-packager with a slew of distinctive_name: false entries whenever a file of the same name exists in several directories, I'll have a look at how to fix that here.

This would also give a ~5x speedup for bundles like Space Quest 1,2,3 that now get extracted 3 times.

http://anonscm.debian.org/cgit/pkg-games/game-data-packager.git/tree/data/spacequest1.yaml

@smcv

a-detiste commented 9 years ago

I now have my own ultra-fast innoextract :-)

https://github.com/a-detiste/innoextract/commit/8cd04fbcda7b167707e533e7b4b1a8f3786cdf60

Would you like a P.R. that only adds the bare minimum I need, as long as it can "allow adding all these points in the future without breaking the command-line interface"?

That means accepting only one --include argument for now, no exclude argument, no globbing, but one can run innoextract several times if needed.

pi@raspberrypi ~/innoextract/build $ time ./innoextract -s -d /tmp/tmp/ /home/pi/setup_zork_anthology.exe

real    0m3.751s
user    0m3.570s
sys     0m0.170s
pi@raspberrypi ~/innoextract/build $ time ./innoextract --include Zork/DATA -d /tmp/tmp/ /home/pi/setup_zork_anthology.exe
Extracting "Zork Anthology" - setup data version 5.2.3
 - "app/Zork/DATA/ZORK1.DAT" (90 KiB)
Done.

real    0m0.206s
user    0m0.190s
sys     0m0.010s

dscharrer commented 9 years ago

Sure, a patch for just --include would be appreciated. However, here are a few minor nitpicks about your implementation:

a-detiste commented 9 years ago

Ok, here's a much better patch.

https://github.com/a-detiste/innoextract/commit/136ee527557d72746135ab22504f6c47fbfe2047?diff=unified

~but it won't work on _WIN32, first path_sep definition needs to be moved to setup/filename.hpp first~

a-detiste commented 9 years ago

This fixes the "match full path components" behaviour for "patterns starting with a slash".

It also works with WIN32 and moves one-time checks out of the loop.

https://github.com/a-detiste/innoextract/commit/5ae45213385ce962b3d9361113e5559e50332828

dscharrer commented 9 years ago

A couple of points:

  1. The second commit breaks inc_root: You compare against inc_string with offset 1, but then use the whole inc_string.size()
  2. A pattern with a trailing slash matches paths that don't end in a trailing slash. I'm not sure if that is desirable or not.
  3. Can you change the if/else style to match the rest of the code?
    • Both preceding and following braces should always be on the same line as else
    • Use braces even for single-line ifs and elses
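For illustration, the matching rules under review in points 1 and 2 could be sketched roughly as below. This is not the code from the patch: matches_include is a hypothetical helper, and it assumes paths are already normalized to '/' separators without a leading slash, that matching is case-sensitive, and that a pattern with a trailing slash never matches.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from the innoextract sources.
// True if `needle` occupies whole path components of `path` starting at `pos`.
static bool match_at(const std::string & path, std::size_t pos, const std::string & needle) {
    if(path.compare(pos, needle.size(), needle) != 0) {
        return false;
    }
    std::size_t end = pos + needle.size();
    return end == path.size() || path[end] == '/';
}

// Assumed semantics: a leading '/' anchors the pattern at the root,
// otherwise it may start at any component boundary.
bool matches_include(const std::string & path, const std::string & pattern) {
    if(pattern.empty() || pattern[pattern.size() - 1] == '/') {
        return false; // trailing-slash patterns never match in this sketch
    }
    bool rooted = (pattern[0] == '/');
    // Strip the leading slash once, outside the search loop.
    std::string needle = rooted ? pattern.substr(1) : pattern;
    if(rooted) {
        return match_at(path, 0, needle);
    }
    for(std::size_t pos = path.find(needle); pos != std::string::npos;
        pos = path.find(needle, pos + 1)) {
        bool starts_component = (pos == 0 || path[pos - 1] == '/');
        if(starts_component && match_at(path, pos, needle)) {
            return true;
        }
    }
    return false;
}
```

With these assumptions, "Zork/DATA" matches "app/Zork/DATA/ZORK1.DAT" while "/Zork" and "ork/DATA" do not, because the first is anchored at the root and the second starts in the middle of a component.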

Sorry for not getting back to you sooner.

a-detiste commented 9 years ago

  1. OK, thanks for reviewing this!
  2. I changed it so that a supplied path ending with a slash never matches anything. It may feel less user-friendly, but it's more predictable (and the "users" are mostly scripts anyway).
  3. I can smash the commit together and create a P.R. if you want. I'd rather not keep my messy history in the main tree forever for no purpose.

From this experience, implementing support for several include & exclude doesn't seem so hard:

The include & exclude lists would be vectors of "typedef std::pair<bool, std::string> filter;" (bool = inc_root , string = inc_string) that get populated in main.cpp.

Then extract.cpp reads those in two BOOST_FOREACH() loops that set a filtered = true flag when there is a match, followed by an if (filtered) continue;
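A minimal sketch of how that could look, as one plausible reading of the description rather than the actual patch. The names filter_matches and is_filtered are hypothetical, C++11 range-for stands in for BOOST_FOREACH, the string in each pair is assumed to be stored without its leading slash, and the matcher is deliberately simplified to whole-component matching.

```cpp
#include <string>
#include <utility>
#include <vector>

// (inc_root, inc_string), as in the typedef proposed above.
typedef std::pair<bool, std::string> filter;

// Simplified matcher (an assumption): the pattern must cover whole path
// components. Wrapping both sides in '/' turns that into a substring test.
bool filter_matches(const filter & f, const std::string & path) {
    if(f.second.empty()) {
        return false;
    }
    std::string haystack = "/" + path + "/";
    std::string needle = "/" + f.second + "/";
    if(f.first) {
        // Rooted pattern: must match at the very start of the path.
        return haystack.compare(0, needle.size(), needle) == 0;
    }
    return haystack.find(needle) != std::string::npos;
}

// Returns true when the file should be skipped by the extraction loop.
bool is_filtered(const std::string & path,
                 const std::vector<filter> & includes,
                 const std::vector<filter> & excludes) {
    // Assumption: with no include list, everything is kept by default.
    bool filtered = !includes.empty();
    for(const filter & f : includes) {
        if(filter_matches(f, path)) {
            filtered = false;
            break;
        }
    }
    for(const filter & f : excludes) {
        if(filter_matches(f, path)) {
            filtered = true;
            break;
        }
    }
    return filtered;
}
```

In this sketch an exclude always overrides an include, whatever order the options were given in, so the result does not depend on argument order the way an rsync-style chain does.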

They wouldn't work the rsync way as stated above:

For example:

dscharrer commented 9 years ago

I can smash the commit together and create a P.R. if you want.

Yes, that would be great.

From this experience, implementing support for several include & exclude doesn't seem so hard:

The hard part would be that boost::program_options makes it hard to keep the relative order between two different option types. Multiple --includes should be easy though. Adding the two options while ignoring their relative order would mean that respecting the order in the future could break scripts. And I do want them to respect the order, especially once there are wildcards.
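To make the ordering concern concrete, here is a hand-rolled sketch that walks the argument list once and records include and exclude rules in the order they appear. It deliberately does not use boost::program_options and is not how innoextract parses its command line. collect_rules is a hypothetical name, --exclude is the option proposed in this issue rather than an existing one, and only the "--option value" form is handled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (is_include, pattern), kept in command-line order.
typedef std::pair<bool, std::string> ordered_rule;

// Hypothetical helper: collects --include/--exclude values in the order given.
std::vector<ordered_rule> collect_rules(const std::vector<std::string> & args) {
    std::vector<ordered_rule> rules;
    for(std::size_t i = 0; i + 1 < args.size(); i++) {
        if(args[i] == "--include" || args[i] == "--exclude") {
            rules.push_back(ordered_rule(args[i] == "--include", args[i + 1]));
            i++; // skip the value that was just consumed
        }
    }
    return rules;
}
```

Because both option types land in a single list, a later matching pass can apply "first match wins" without losing track of which rule was given first.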

ssokolow commented 9 years ago

Have you considered using "most specific match wins" semantics like in CSS to control the interaction between include and exclude rules?

(eg. if you include "/", exclude "/app", and include "/app/manual.pdf", then the most intuitive interpretation is unambiguous regardless of order.)

dscharrer commented 9 years ago

That requires a notion of "more specific" though. At least with wildcards that won't be so intuitive anymore. And even CSS uses the order of definition when the specificity is the same.

More importantly though, "most specific match wins" can be easily implemented on top of "first match wins" by ordering the arguments by how specific they are. The other way around is not so easy, if at all possible.
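As a rough sketch of that layering: sort the rules by an assumed specificity measure, then apply plain "first match wins". Everything here is hypothetical (rule, more_specific, is_included). Specificity is taken to be the number of path components, then the pattern length. Patterns are assumed to be rooted, wildcard-free prefixes like "/app", and C++11 is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct rule {
    bool include;        // true for an include rule, false for an exclude rule
    std::string pattern; // rooted prefix such as "/app" (no wildcards)
};

// Prefix match on whole path components; "/" matches every path.
bool rule_matches(const rule & r, const std::string & path) {
    if(r.pattern == "/") {
        return true;
    }
    if(path.compare(0, r.pattern.size(), r.pattern) != 0) {
        return false;
    }
    return path.size() == r.pattern.size() || path[r.pattern.size()] == '/';
}

// Number of non-empty components: "/" -> 0, "/app" -> 1, "/app/x" -> 2.
std::size_t component_count(const std::string & pattern) {
    std::size_t n = 0;
    for(std::size_t i = 0; i < pattern.size(); i++) {
        if(pattern[i] != '/' && (i == 0 || pattern[i - 1] == '/')) {
            n++;
        }
    }
    return n;
}

// Assumed specificity order: more components first, then longer patterns.
bool more_specific(const rule & a, const rule & b) {
    std::size_t ca = component_count(a.pattern);
    std::size_t cb = component_count(b.pattern);
    if(ca != cb) {
        return ca > cb;
    }
    return a.pattern.size() > b.pattern.size();
}

// "Most specific match wins", built on top of "first match wins".
bool is_included(std::vector<rule> rules, const std::string & path, bool fallback) {
    // stable_sort keeps the given order for rules of equal specificity.
    std::stable_sort(rules.begin(), rules.end(), more_specific);
    for(const rule & r : rules) {
        if(rule_matches(r, path)) {
            return r.include;
        }
    }
    return fallback;
}
```

With the rules include "/", exclude "/app" and include "/app/manual.pdf", given in any order, "/app/manual.pdf" is kept and the rest of "/app" is skipped. Using std::stable_sort means ties fall back to the order in which the rules were given, similar to the CSS behaviour mentioned above.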

ssokolow commented 9 years ago

Good point. I wrote that at the end of a jet-lagged day and I didn't think about what would happen when "most path components, then most characters" specificity calculation runs into two rules with the same number of path components and same length.

(eg. Specifying the same path both as an include and an exclude may be silly, but it shouldn't result in undefined behaviour)

bam80 commented 5 years ago

This is a highly welcome feature: https://framagit.org/vv221/play.it/issues/139#note_404696

vv221 commented 5 years ago

@bam80 There is already --include (-I) available in the current stable version of innoextract, but I think it can only be called once. Being able to specify multiple paths and support for globbing would indeed be very nice improvements to have.