Sude- / lgogdownloader

LGOGDownloader is an unofficial GOG.com downloader for Linux users. It uses the same API as the official GOG Galaxy.
https://sites.google.com/site/gogdownloader/
Do What The F*ck You Want To Public License

Check-Orphans default regex suggestion: mp4 files #257

Open danhallock opened 5 months ago

danhallock commented 5 months ago

The Witcher Goodies collection has four versions of the same video, so I have the following in my blacklist:

Rp the_witcher_goodies_collection/extras/dvd.mp4
Rp the_witcher_goodies_collection/extras/1080p.mp4
Rp the_witcher_goodies_collection/extras/720p.mp4

By default --check-orphans won't match these. I've set my orphans regex to '.*\.(zip|exe|bin|dmg|old|deb|tar\.gz|pkg|sh|mp4)$' and that's working great, but this seems like a common enough title that it might make sense to include mp4 in the default regex.
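As a rough illustration of the point above, the following sketch checks the three blacklisted paths against the regex with and without `mp4`. It uses Python's `re` module, which may differ in detail from the regex engine lgogdownloader uses, and the "without mp4" pattern is an assumption: it is simply the quoted regex with `mp4` removed, standing in for the earlier default.

```python
import re

# Regex quoted in the comment above (the reporter's custom setting).
WITH_MP4 = r".*\.(zip|exe|bin|dmg|old|deb|tar\.gz|pkg|sh|mp4)$"
# Assumption: the same pattern minus mp4, standing in for the old default.
WITHOUT_MP4 = r".*\.(zip|exe|bin|dmg|old|deb|tar\.gz|pkg|sh)$"

# The three blacklisted paths from the comment above.
BLACKLISTED = [
    "the_witcher_goodies_collection/extras/dvd.mp4",
    "the_witcher_goodies_collection/extras/1080p.mp4",
    "the_witcher_goodies_collection/extras/720p.mp4",
]


def matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the pattern."""
    return re.fullmatch(pattern, path) is not None


for path in BLACKLISTED:
    print(path, matches(WITHOUT_MP4, path), matches(WITH_MP4, path))
```

Under these assumptions, none of the paths match until `mp4` is part of the alternation, which is consistent with the behavior described.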

Sude- commented 5 months ago

f483d2e adds mp4 to --check-orphans default regex

danhallock commented 5 months ago

FWIW I checked against my library (820 titles), and the only other extensions that exist in my data that are not covered by this regex are:

I don't know how much you're trying to cover every real-world case, but '.*\.([0-9][0-9][0-9]|bin|deb|dmg|exe|mp4|old|pdf|pkg|png|rar|sh|tar\.gz|zip|ZIP)$' will do it for my library at least. I put them in alphabetical order as I figured that might make it easier to handle future changes.
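For anyone who wants to repeat this kind of check on their own library, here is a minimal sketch that walks a directory and counts the extensions of files a given regex does not cover. The function name `uncovered_extensions` is hypothetical, the default pattern is the mp4 variant quoted earlier in this thread, and Python's `re` is only an approximation of whatever engine lgogdownloader uses.

```python
import re
from collections import Counter
from pathlib import Path

# Pattern quoted earlier in the thread (previous regex plus mp4).
PATTERN_WITH_MP4 = r".*\.(zip|exe|bin|dmg|old|deb|tar\.gz|pkg|sh|mp4)$"


def uncovered_extensions(root, pattern=PATTERN_WITH_MP4):
    """Count the final extensions of files under root that the pattern misses."""
    regex = re.compile(pattern)
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and not regex.fullmatch(path.as_posix()):
            # Path.suffix is only the last extension, e.g. ".001" for "x.zip.001".
            counts[path.suffix or "(none)"] += 1
    return counts


if __name__ == "__main__":
    for ext, n in sorted(uncovered_extensions(".").items()):
        print(ext, n)
```

Run from the download directory, this prints one line per uncovered extension with a file count, which makes it easy to decide what to add to the alternation.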

(There might be a more elegant way to do [0-9][0-9][0-9], I am not a regex-expert.)

ssokolow commented 5 months ago

> (There might be a more elegant way to do [0-9][0-9][0-9], I am not a regex-expert.)

[0-9]{3} should do it if I remember my anything-Perl-or-newer syntax correctly.
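A quick way to convince yourself the two spellings are interchangeable is to run both over a few names. This sketch uses Python's `re`; the first two sample names come from this thread, the rest are hypothetical and only there to exercise the non-matching cases.

```python
import re

LONG_FORM = re.compile(r".*\.([0-9][0-9][0-9])$")
SHORT_FORM = re.compile(r".*\.([0-9]{3})$")

SAMPLES = [
    "The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009",  # from this thread
    "The_Bards_Tale_IV_Directors_Cut_Update_3.009",      # from this thread
    "archive.12",    # hypothetical: too few digits
    "archive.1234",  # hypothetical: too many digits
    "manual.pdf",    # hypothetical: not numeric
]


def agree(name: str) -> bool:
    """True if both spellings give the same match/no-match answer."""
    return bool(LONG_FORM.fullmatch(name)) == bool(SHORT_FORM.fullmatch(name))


for name in SAMPLES:
    print(name, bool(SHORT_FORM.fullmatch(name)), agree(name))
```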

danhallock commented 5 months ago

That does work (and is more elegant!). Thanks!

There is something happening when I add the numeric extensions to the regex that I don't quite understand: they seem to be matching twice.

Test conditions:

Then I ran --check-orphans. It matched everything I expected, but the numeric extensions are printed twice:

with .*\.([0-9]{3}|bin|deb|dmg|exe|mp4|old|pdf|pkg|png|rar|sh|tar\.gz|zip|ZIP)$

Checking for orphaned files 839 / 839
./age_of_wonders_3/dlc/age_of_wonders_3_deluxe_edition_upgrade/extras/aow3_ost_deluxe_mp3.zip
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.001
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.011
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.001
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.011
./the_witcher_goodies_collection/extras/1080p.mp4
./ultima_2/extras/Ultima_123_manuals.zip

A very simplified test case still does the same thing:

with .*\.(009)$

Checking for orphaned files 839 / 839
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009

I'll try to figure this out.

danhallock commented 5 months ago

Ah, the above behavior isn't about the extension, it's about those orphans being in a platform folder and not the extras folder.

Created a few more orphans (by copying existing files and adding .x.tmp into the filenames).
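The orphan-creation step could be scripted roughly as follows. This is a sketch, not the exact procedure used here: the helper names `orphan_name` and `make_orphan` are hypothetical, and it assumes ".x.tmp" is inserted just before the final extension, which matches the orphan names that appear in the output in this thread.

```python
import shutil
from pathlib import Path


def orphan_name(filename: str) -> str:
    """Insert '.x.tmp' before the final extension of a filename."""
    p = Path(filename)
    # Path.suffix is only the last extension, so "a.tar.gz" would
    # become "a.tar.x.tmp.gz"; fine for single-extension installers.
    return p.stem + ".x.tmp" + p.suffix


def make_orphan(source) -> Path:
    """Copy source next to itself under the orphan name and return the copy."""
    source = Path(source)
    target = source.with_name(orphan_name(source.name))
    shutil.copy2(source, target)
    return target
```

For example, calling `make_orphan` on a file named `trine_1.0.0.1.dmg` would leave the original in place and add `trine_1.0.0.1.x.tmp.dmg` beside it, so the copy is not part of the known file list but still matches the regex.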

with .*\.([0-9]{3}|bin|deb|dmg|exe|mp4|old|pdf|pkg|png|rar|sh|tar\.gz|zip|ZIP)$

> lgogdownloader --use-cache --subdir-game %gamename%/%platform% --directory . --platform all --language en --check-orphans $ORPHAN_REGEX

Checking for orphaned files 839 / 839
./age_of_wonders_3/dlc/age_of_wonders_3_deluxe_edition_upgrade/extras/aow3_ost_deluxe_mp3.zip
./hand_of_fate/mac/hand_of_fate_enUS_1_3_20_25350.x.tmp.pkg
./hand_of_fate/mac/hand_of_fate_enUS_1_3_20_25350.x.tmp.pkg
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.001
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.011
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.001
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.009
./the_bards_tale_iv_directors_cut/mac/The_Bards_Tale_IV_Directors_Cut_Update_3.zip.011
./the_witcher_goodies_collection/extras/1080p.mp4
./trine_enchanted_edition/mac/patches/trine_1.0.0.1.x.tmp.dmg
./trine_enchanted_edition/windows/setup_trine_enhanced_edition_2.12(a)_(50506).x.tmp.exe
./trine_enchanted_edition/linux/gog_trine_enchanted_edition_2.0.0.2.x.tmp.sh
./trine_enchanted_edition/windows/setup_trine_enhanced_edition_2.12(a)_(50506).x.tmp.exe
./trine_enchanted_edition/mac/patches/trine_1.0.0.1.x.tmp.dmg
./trine_enchanted_edition/linux/gog_trine_enchanted_edition_2.0.0.2.x.tmp.sh
./ultima_2/extras/Ultima_123_manuals.009
./ultima_2/extras/Ultima_123_manuals.zip

Platform folders: orphans printed twice. Extras folders: orphans printed once. Let me know if you'd like me to open an issue for that; I'm not sure whether it's expected behavior. The regex itself seems fine.
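The platform-versus-extras pattern can be confirmed mechanically from the printed list. The sketch below counts how often each path appears and groups the counts by folder type. The names `classify` and `summarize` are hypothetical, and the set of platform folder names is an assumption based on the `windows`, `mac`, and `linux` directories visible in the output above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumption: these directory names mark a platform folder.
PLATFORM_DIRS = {"windows", "mac", "linux"}


def classify(path: str) -> str:
    """Label a path 'platform' if any parent directory is a platform folder."""
    parents = PurePosixPath(path).parts[:-1]
    return "platform" if PLATFORM_DIRS.intersection(parents) else "other"


def summarize(lines):
    """Map each folder type to the set of repeat counts seen for its paths."""
    counts = Counter(
        line.strip() for line in lines if line.strip().startswith("./")
    )
    summary = {}
    for path, n in counts.items():
        summary.setdefault(classify(path), set()).add(n)
    return summary


# A few lines taken from the output in this thread.
SAMPLE = [
    "./hand_of_fate/mac/hand_of_fate_enUS_1_3_20_25350.x.tmp.pkg",
    "./hand_of_fate/mac/hand_of_fate_enUS_1_3_20_25350.x.tmp.pkg",
    "./the_witcher_goodies_collection/extras/1080p.mp4",
    "./trine_enchanted_edition/mac/patches/trine_1.0.0.1.x.tmp.dmg",
    "./trine_enchanted_edition/mac/patches/trine_1.0.0.1.x.tmp.dmg",
    "./ultima_2/extras/Ultima_123_manuals.zip",
]

print(summarize(SAMPLE))
```

Piping the full --check-orphans output through `summarize` should show whether every platform-folder path is repeated and every other path appears once, which would be useful detail for a separate issue.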