DanEngelbrecht / golongtail

Command line front end for longtail synchronization tool
MIT License

Exclude filter regex option does nothing for get command #212

Closed: AOOSG closed this issue 2 years ago

AOOSG commented 2 years ago

I'd like to filter out PDB files when executing the get command, to reduce the download size when getting the latest UE5 build. I assumed this would be possible using the parameter --exclude-filter-regex ^.*\.[Pp][Dd][Bb]$.

The full command looks something like this:

longtail.exe get --source-path gs://bucket/engine-1-win64.json --target-path ue5 --exclude-filter-regex ^.*\.[Pp][Dd][Bb]$

But all PDB files are still downloaded. I've simplified the regex to just filter a specific file in the root folder and I've also tried various versions of --include-filter-regex. Neither of the two options seems to do anything.

Am I missing something, or are these options only meant to work for the put command? (I've verified that --exclude-filter-regex ^.*\.[Pp][Dd][Bb]$ works for the put command.)

DanEngelbrecht commented 2 years ago

Yeah, I think it currently only applies when scanning the source or target folder, not when reading an uploaded version. This needs some work to get done. One option is to upload two versions, one with PDBs and one without; the deduplication should keep the redundant data in the store small.

AOOSG commented 2 years ago

OK, so I should be able to upload two versions, one of which includes the PDBs, and as you say it avoids redundant data pretty well:

put command using: --target-path gs://bucket/engine-1-win64.json
put command using: --target-path gs://bucket/engine-1-pdb-win64.json

Neat!

While messing around with this earlier I found it quite hard to write an exclusion filter for anything that isn't a PDB (if I wanted to upload a PDB store, for example), because the regex is also tested against folders. Folders should never be tested against the regex; otherwise we won't find PDB files in subfolders.

So I ended up writing this include regex, which kinda works but has some edge cases for folder names with dots or files without dots: ^[^.]*$|.*\.[Pp][Dd][Bb]$ I.e.: include files/folders without a dot, or files/folders with a .pdb extension.
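The edge cases mentioned above can be checked with a small Go program. This is only a sketch using the standard regexp package: the example paths are made up, and it assumes the filter is tested against plain relative paths, which may not be exactly how longtail passes folder names to the regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// includeRe is the include filter quoted above: anything without a dot,
// or anything ending in .pdb (any letter case).
var includeRe = regexp.MustCompile(`^[^.]*$|.*\.[Pp][Dd][Bb]$`)

// included reports whether a path passes the include filter.
// Hypothetical helper for illustration, not part of longtail.
func included(path string) bool {
	return includeRe.MatchString(path)
}

func main() {
	// Made-up example paths covering the cases described in the comment.
	paths := []string{
		"Engine/Binaries",         // folder without a dot: included, so it gets scanned
		"Engine/Binaries/UE5.pdb", // PDB file: included
		"Engine/Binaries/UE5.exe", // other file with a dot: not included
		"Engine/Plugins.Extra",    // folder with a dot: not included, PDBs below it are missed
		"Engine/LICENSE",          // file without a dot: included even though it is not a PDB
	}
	for _, p := range paths {
		fmt.Printf("%-26s included=%v\n", p, included(p))
	}
}
```

The last two paths are the edge cases: a dotted folder name is rejected along with everything under it, and a dotless file slips through.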

Perhaps a flag to exclude folders in include/exclude regex searches would be useful? I don't think I need this PDB store anymore though.

DanEngelbrecht commented 2 years ago

I think using .*\.[Pp][Dd][Bb]$ as an exclude filter should capture everything besides pdbs? Try it out in one of the online golang regex sites. ~Maybe need to tweak for case sensitivity but I think it is not case sensitive by default~. Not at computer atm so can’t check.
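For anyone without an online regex tester at hand, the same check can be run locally with Go's standard regexp package. A minimal sketch, with made-up file names: Go regular expressions are case sensitive unless the (?i) flag is set, and the pattern above matches only paths ending in .pdb, so whether those paths are kept or dropped depends on whether it is used as an include or an exclude filter.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern and reports whether it matches path.
// Hypothetical helper for illustration only.
func matches(pattern, path string) bool {
	return regexp.MustCompile(pattern).MatchString(path)
}

func main() {
	// The character classes handle upper and lower case explicitly.
	fmt.Println(matches(`.*\.[Pp][Dd][Bb]$`, "Engine/UE5.PDB")) // true

	// A plain lowercase pattern is case sensitive by default.
	fmt.Println(matches(`.*\.pdb$`, "Engine/UE5.PDB")) // false

	// The (?i) flag makes the whole pattern case insensitive.
	fmt.Println(matches(`(?i).*\.pdb$`, "Engine/UE5.PDB")) // true

	// Files that are not PDBs do not match the pattern.
	fmt.Println(matches(`.*\.[Pp][Dd][Bb]$`, "Engine/UE5.exe")) // false
}
```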