dsuni / svndumpsanitizer

A program aspiring to be a more advanced version of svndumpfilter
https://miria.homelinuxserver.org/svndumpsanitizer/
GNU General Public License v3.0

Could svndumpsanitizer support the --pattern feature that is in svndumpfilter #17

Closed davidcallen closed 6 years ago

davidcallen commented 6 years ago

I need the --pattern feature of svndumpfilter to omit any file (or directory) whose name relates to a particular customer, who no longer wants their client-specific code stored in our repository for boring legal reasons. I then found the issue(s) that led me to svndumpsanitizer. It looks great and very professional. However, after looking at the source code and a bit of experimentation, I don't think any "exclude --pattern" behaviour is supported. It would be great to know 1) if --pattern is definitely not supported? 2) could --pattern be added? 3) would it be relatively easy to add --pattern (even if with only limited matching capability)? If so, I may have the need and the available time to have a go at coding it.

Thanks for the great tool.

dsuni commented 6 years ago

On September 4, 2018 6:44 PM, davidcallen wrote:

  • if --pattern is definitely not supported ?

No, it isn't supported. There are no hidden features. What you get with --help is all there is.

  • could --pattern be added ?

It's not the first time I've gotten this feature request, and I actually got started on it, but backed out when I realized the task would be far more daunting than I had first anticipated.

  • would it be relatively easy to add --pattern (even if with only limited matching capability), because if so I may have the need and available time to have a go at coding it ?

Not really, unfortunately. What I personally would do if I had to deal with a repo like this, would be something like:

$ ./svndumpsanitizer --infile dumpfile --outfile new_dumpfile --exclude $(cat dumpfile | grep ^Node-path: | grep <pattern> | tr '\n' ' ')

Thanks for the great tool.

Glad to hear that it's helpful. :-)

-Daniel

dsuni commented 6 years ago

Not really, unfortunately. What I personally would do if I had to deal with a repo like this, would be something like: $ ./svndumpsanitizer --infile dumpfile --outfile new_dumpfile --exclude $(cat dumpfile | grep ^Node-path: | grep <pattern> | tr '\n' ' ')

Oops. I was actually a bit hasty with that command. It would contain a lot of redundancy, and the grep output might be too long for the shell to handle. A better way would be to use sort and sed to get rid of that redundancy. Basically I would fiddle with the following commands until I got a list I thought looked reasonable, and then use that for svndumpsanitizer:

$ cat dumpfile | grep -a ^Node-path: | sed 's/^Node-path: //' | grep pattern | sort -u

This should provide a non-redundant list of every file and directory in the repo containing pattern. From svndumpsanitizer's perspective that list is still redundant: you only need to exclude trunk/pattern, not trunk/pattern/foo, trunk/pattern/bar and trunk/pattern/baz separately. To reduce it further you could play around with sed a bit more, e.g.:

$ cat dumpfile | grep -a ^Node-path: | sed 's/^Node-path: //' | grep pattern | sort -u | sed 's/pattern.*/pattern/' | sort -u

I don't know the exact pattern or how it shows up in the repository, but I'm hard pressed to think of a situation that couldn't be handled with a grep/sed/sort combo, and fiddling around with those will most certainly be a lot easier than adding this feature to svndumpsanitizer.
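
The grep/sed/sort pipeline above could be wrapped in a small function, as a rough sketch. The function name list_excludes and the customer name acme are made up for illustration. One deliberate difference from the sed expression above: this variant keeps the rest of the matching path component, so that trunk/acme_tools/x.c collapses to trunk/acme_tools rather than to trunk/acme, which would not be a real path in the repository. It assumes the pattern is a plain string with no regex metacharacters or slashes.

```shell
# Hypothetical helper (not part of svndumpsanitizer): print the shortest
# paths in a dump whose names contain PATTERN, one per line.
# Usage: list_excludes DUMPFILE PATTERN
# Assumes PATTERN is plain text without regex metacharacters or slashes.
list_excludes() {
    # -a makes grep treat the (partly binary) dump file as text.
    grep -a '^Node-path: ' "$1" |
        sed 's/^Node-path: //' |
        grep -- "$2" |
        # Cut everything after the path component that contains PATTERN.
        sed "s/\($2[^/]*\).*/\1/" |
        sort -u
}
```

For a dump containing the paths trunk/acme, trunk/acme/foo, trunk/acme/bar, trunk/other and branches/b1/acme_tools/x.c, running list_excludes dumpfile acme should print branches/b1/acme_tools and trunk/acme. It is worth reviewing the output by eye before using it as an exclude list.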

davidcallen commented 6 years ago

Thanks very much Daniel for the great info. I hadn't thought of building a large exclude list from grepping and sort-unique... great idea. It might be a long list (55GB dumpfile) but hopefully it will do the job.

I'll close this issue then - many thanks for the tool and for your fast and helpful response.
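
For anyone landing here with the same problem: once the exclude list has been saved to a file and reviewed, feeding it to svndumpsanitizer might look like the sketch below. The function name run_with_excludes and the SANITIZER override variable are made up for illustration. The only svndumpsanitizer options used are the ones that appear in this thread (--infile, --outfile, --exclude), and passing several paths after --exclude follows the command suggested above. Paths containing spaces are kept intact; paths containing newlines are not handled.

```shell
# Hypothetical wrapper (not part of svndumpsanitizer): run the tool with
# every path listed in LISTFILE (one path per line) passed to --exclude.
# Usage: run_with_excludes INFILE OUTFILE LISTFILE
# Set SANITIZER to use a different binary, or a stub for a dry run.
run_with_excludes() {
    infile=$1
    outfile=$2
    list=$3
    if [ ! -s "$list" ]; then
        echo "exclude list '$list' is missing or empty" >&2
        return 1
    fi
    # Split the list on newlines only, with globbing disabled, so that
    # paths containing spaces or wildcard characters stay intact.
    old_ifs=$IFS
    IFS='
'
    set -f
    set -- $(cat "$list")
    set +f
    IFS=$old_ifs
    "${SANITIZER:-./svndumpsanitizer}" --infile "$infile" \
        --outfile "$outfile" --exclude "$@"
}
```

A very long list can still exceed the operating system's limit on command-line length, which is one more reason to collapse the list to top-level paths first.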