CommunityDragon / CDTB

A library containing everything needed to extract files from the client files.
GNU Lesser General Public License v3.0

Extract pattern matching functionality is bad #104

Status: Open. Morilli opened this issue 2 weeks ago.

Morilli commented 2 weeks ago

The description of the --pattern option of wad-extract reads: "extract only files matching pattern with shell-like wildcards". One problem is that, while the option can be repeated to combine several patterns (e.g. -p "*.bin" -p "*.dds"), this is not clear from the option's description alone. Furthermore, it is not possible to filter only extensionless files. Perhaps the extracted paths should also be sanitized before being passed to the pattern-matching function, so that files can be filtered by their "guessed" extensions, such as .cdtb.bin.
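A minimal sketch of how repeated -p patterns combine, using Python's standard fnmatch.fnmatchcase as the existing filter line does (any(fnmatch.fnmatchcase(wf.path, p) for p in args.pattern)): a path is kept if it matches at least one pattern. The sample paths are made up. The second part shows why extensionless files are hard to target: "*" matches any characters, including "." and "/", and there is no negation, so a glob cannot say "no dot in the file name".

```python
from fnmatch import fnmatchcase
from typing import Iterable


def matches_any(path: str, patterns: Iterable[str]) -> bool:
    """Keep a path if it matches at least one shell-like pattern (OR semantics)."""
    return any(fnmatchcase(path, p) for p in patterns)


# Made-up sample paths, for illustration only.
paths = [
    "data/characters/annie/annie.bin",
    "assets/characters/annie/skins/base/annie.dds",
    "data/menu/fontconfig_en_us",
]

# Equivalent of passing -p "*.bin" -p "*.dds"; note that "*" also crosses "/".
selected = [p for p in paths if matches_any(p, ["*.bin", "*.dds"])]
print(selected)

# A naive "no extension" attempt: "*[!.]*" only requires one non-dot character
# somewhere, so it matches every path above, including the .bin and .dds ones.
naive = [p for p in paths if fnmatchcase(p, "*[!.]*")]
print(naive)
```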

Maybe simply making the pattern a regular expression would solve all of this, since a regex can express practically any filter.
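A rough sketch of what a regex-based filter could look like. This is not an existing CDTB option; filter_regex and the two compiled patterns are hypothetical names chosen for illustration, and the paths are made up. One regex selects extensionless files (last path component contains no "."), another replaces the two-glob case.

```python
import re
from typing import Iterable, List

# Matches when the last path component contains no "." (an extensionless name).
EXTENSIONLESS = re.compile(r"(?:^|/)[^/.]+$")
# The two-glob case -p "*.bin" -p "*.dds" written as a single regex.
BIN_OR_DDS = re.compile(r"\.(?:bin|dds)$")


def filter_regex(paths: Iterable[str], pattern: "re.Pattern[str]") -> List[str]:
    """Keep paths where the regex matches anywhere (re.search semantics)."""
    return [p for p in paths if pattern.search(p)]


# Made-up sample paths, for illustration only.
paths = [
    "data/characters/annie/annie.bin",
    "assets/characters/annie/skins/base/annie.dds",
    "data/menu/fontconfig_en_us",
]

print(filter_regex(paths, EXTENSIONLESS))  # ['data/menu/fontconfig_en_us']
print(filter_regex(paths, BIN_OR_DDS))
```

One design point: fnmatch always matches the whole path, while re.search matches anywhere, so regex users have to remember the $ (and sometimes ^) anchors. That is part of the convenience cost for simple cases.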

benoitryder commented 2 weeks ago

> Furthermore, it is not possible to filter only extensionless files.

Shell-like patterns are fairly limited. But more advanced regexps are less well known and less convenient for simple cases.

We could add an option to extract only the files listed in a file (or read from stdin). Users could then build that list with any kind of regexp or script, and one-liners would still be possible, something like this:

cdtb wad-list some.wad | grep -v '\.[a-z0-9]\+$' | cdtb wad-extract --from-list - some.wad
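In that one-liner, grep -v '\.[a-z0-9]\+$' drops every listed path ending in a dot followed by lowercase letters or digits, leaving the extensionless ones. Since --from-list is only a proposal in this thread and not an existing CDTB option, the sketch below uses hypothetical helper names to show the logic such an option might need: open the list ("-" meaning stdin), read one path per line, and keep only the archive entries that appear in it. The paths are made up.

```python
import io
import sys
from typing import Iterable, List, Set, TextIO


def open_path_list(name: str) -> TextIO:
    """Hypothetical helper: "-" means stdin (callers should not close it)."""
    return sys.stdin if name == "-" else open(name, encoding="utf-8")


def read_path_list(stream: TextIO) -> Set[str]:
    """Read one path per line, stripping whitespace and skipping blank lines."""
    return {line.strip() for line in stream if line.strip()}


def select_listed(paths: Iterable[str], wanted: Set[str]) -> List[str]:
    """Keep only the listed paths, preserving the archive's order."""
    return [p for p in paths if p in wanted]


# Stand-in for what the wad-list | grep pipeline might print (made-up paths).
listing = io.StringIO("data/menu/fontconfig_en_us\n\n")
wanted = read_path_list(listing)

archive_paths = [
    "data/characters/annie/annie.bin",
    "assets/characters/annie/skins/base/annie.dds",
    "data/menu/fontconfig_en_us",
]
print(select_listed(archive_paths, wanted))  # ['data/menu/fontconfig_en_us']
```

Using a set for the list keeps lookups constant-time even when the list holds thousands of paths.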
Morilli commented 1 day ago

What about this?

diff --git a/cdtb/__main__.py b/cdtb/__main__.py
index 2981da8..4629922 100644
--- a/cdtb/__main__.py
+++ b/cdtb/__main__.py
@@ -133,10 +133,11 @@ def command_wad_extract(parser, args):
     elif args.unknown == 'no':
         wad.files = [wf for wf in wad.files if wf.path is not None]

+    wad.guess_extensions()
+    wad.sanitize_paths()
     if args.pattern:
         wad.files = [wf for wf in wad.files if any(wf.path is not None and fnmatch.fnmatchcase(wf.path, p) for p in args.pattern)]

-    wad.guess_extensions()
     wad.extract(args.output, overwrite=not args.lazy)

This would guess extensions and sanitize the file paths before the pattern is applied, so that patterns like *.cdtb.bin can match.
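To illustrate why the order matters, here is a small sketch with stand-in objects. StubWadFile and stub_guess_extensions are hypothetical and do not reproduce CDTB's actual WAD classes or its extension-guessing rules; the stub simply appends a pre-set "guessed" extension to names that lack one. Filtering with "*.bin" before guessing misses the extensionless entry, while guessing first (the order in the patch) lets it match.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class StubWadFile:
    """Hypothetical stand-in for a WAD entry; not CDTB's actual class."""
    path: Optional[str]
    guessed_ext: Optional[str] = None  # what content sniffing would report


def stub_guess_extensions(files: List[StubWadFile]) -> None:
    """Hypothetical stand-in: append the guessed extension to names lacking one."""
    for wf in files:
        if wf.path is None or not wf.guessed_ext:
            continue
        if "." not in wf.path.rsplit("/", 1)[-1]:
            wf.path = f"{wf.path}.{wf.guessed_ext}"


def filter_paths(files: List[StubWadFile], patterns: List[str]) -> List[StubWadFile]:
    """Same shape as the filter line in the diff."""
    return [wf for wf in files
            if wf.path is not None and any(fnmatchcase(wf.path, p) for p in patterns)]


def make_files() -> List[StubWadFile]:
    return [
        StubWadFile("data/characters/annie/annie.bin"),
        StubWadFile("data/unnamed_entry", guessed_ext="bin"),
        StubWadFile(None),  # unknown path: dropped by the filter
    ]


# Old order: filter first, guess afterwards -> the extensionless entry is missed.
old = filter_paths(make_files(), ["*.bin"])

# Order in the patch: guess first, then filter.
files = make_files()
stub_guess_extensions(files)
new = filter_paths(files, ["*.bin"])

print([wf.path for wf in old])  # ['data/characters/annie/annie.bin']
print([wf.path for wf in new])  # ['data/characters/annie/annie.bin', 'data/unnamed_entry.bin']
```

A possible trade-off: if extension guessing inspects file contents, as its name suggests, running it before the filter means it processes every entry instead of only the selected ones.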