Closed davidcallen closed 6 years ago
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On September 4, 2018 6:44 PM, davidcallen notifications@github.com wrote:
- if --pattern is definitely not supported ?
No. There are no hidden features. What you get with --help is all there is.
- could --pattern be added ?
It's not the first time I've gotten this feature request, and I actually got started on it, but backed out when I realized the task would be far more daunting than I had first anticipated.
- would it be relatively easy to add --pattern (even if with only limited matching capability), because if so I may have the need and available time to have a go at coding it ?
Not really, unfortunately. What I personally would do if I had to deal with a repo like this would be something like:
$ ./svndumpsanitizer --infile dumpfile --outfile new_dumpfile --exclude $(cat dumpfile | grep ^Node-path: | grep <pattern> | tr '\n' ' ')
Thanks for the great tool.
Glad to hear that it's helpful. :-)
-Daniel
Oops. I was actually a bit hasty with that command. It would contain a lot of redundancy, and the grep output might be too long for the shell to handle. A better way would be to use sort and sed to get rid of that redundancy. Basically I would fiddle with the following commands until I got a list I thought looked reasonable, and then use that for svndumpsanitizer:
$ cat dumpfile | grep -a ^Node-path: | sed 's/^Node-path: //' | grep pattern | sort -u
This should provide a non-redundant list of every file and directory in the repo containing pattern. To further reduce the stuff that is redundant from svndumpsanitizer's perspective (you only need to exclude trunk/pattern, not trunk/pattern/foo, trunk/pattern/bar and trunk/pattern/baz separately), you could play around with sed a bit more, e.g.:
$ cat dumpfile | grep -a ^Node-path: | sed 's/^Node-path: //' | grep pattern | sort -u | sed 's/pattern.*/pattern/' | sort -u
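For anyone wanting to reuse the pipeline above, it could be wrapped in a small shell function along these lines. This is only a sketch: the function name `list_excludes` and the example pattern `acme` are made up for illustration, and it assumes the pattern is a plain string with no regex metacharacters and no `|` in it. It also makes one change to the sed step: instead of cutting the path right after the pattern, it cuts at the end of the path component that contains the pattern, so that `trunk/acme_client/foo.c` becomes `trunk/acme_client` rather than `trunk/acme`.

```shell
# Hypothetical helper (not part of svndumpsanitizer): print the shortest
# unique Node-path entries whose name contains PATTERN.
# Assumption: PATTERN is a plain string (no regex metacharacters, no '|').
list_excludes() {
    dumpfile=$1
    pattern=$2
    # 1. keep only the Node-path header lines (-a: treat the dump as text)
    # 2. strip the "Node-path: " prefix
    # 3. keep the paths that contain the pattern
    # 4. cut each path at the end of the component containing the pattern
    # 5. drop duplicates
    grep -a '^Node-path: ' "$dumpfile" |
        sed 's/^Node-path: //' |
        grep -- "$pattern" |
        sed "s|\($pattern[^/]*\).*|\1|" |
        sort -u
}

# Usage sketch: review the list by hand before using it.
# list_excludes dumpfile acme > excludes.txt
```

Writing the list to a file first makes it easy to eyeball and hand-edit before anything is actually excluded.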
I don't know the exact pattern or how it shows up in the repository, but I'm hard pressed to think of a situation where it couldn't be handled with a grep/sed/sort combo, and fiddling around with those will most certainly be a lot easier than adding this feature to svndumpsanitizer.
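Since the list may still end up too long for the shell to pass as arguments, a quick size check before handing it to svndumpsanitizer might look like the sketch below. The function name `fits_on_command_line`, the file name `excludes.txt` and the 50% safety margin are all assumptions made for illustration; `getconf ARG_MAX` reports the system limit on the combined size of arguments and environment.

```shell
# Hypothetical helper: succeed if the exclude list in LIST_FILE is small
# enough to be passed as command-line arguments.
# Optional second argument: byte limit. The default of half of ARG_MAX is
# an arbitrary margin that leaves room for the environment and other args.
fits_on_command_line() {
    list_file=$1
    limit=${2:-$(( $(getconf ARG_MAX) / 2 ))}
    # Arithmetic expansion strips the padding some wc implementations add.
    list_bytes=$(( $(wc -c < "$list_file") ))
    [ "$list_bytes" -lt "$limit" ]
}

# Usage sketch (assumes no path contains whitespace, because the list is
# split on whitespace when it is expanded):
# if fits_on_command_line excludes.txt; then
#     ./svndumpsanitizer --infile dumpfile --outfile new_dumpfile \
#         --exclude $(cat excludes.txt)
# else
#     echo "exclude list too long, reduce it further with sed/sort" >&2
# fi
```

If the check fails, collapsing the list to fewer, higher-level directories is the obvious next step.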
Thanks very much Daniel for the great info. I hadn't thought of building a large exclude list by grepping and sort-unique... great idea. It might be a long list (55GB dumpfile) but hopefully it will do the job.
I'll close this issue then - many thanks for the tool and for your fast and helpful response.
I needed to use the --pattern feature that is in svndumpfilter to omit any file (or directory) whose name relates to a customer who no longer wants their client-specific code stored in our repository, for boring legal reasons. I then found the issue(s) that led me to svndumpsanitizer. It looks great and very professional; however, after looking at the source code and a bit of experimentation, I don't think any "exclude --pattern " behaviour is supported. Would be great to know 1) if --pattern is definitely not supported ? 2) could --pattern be added ? 3) would it be relatively easy to add --pattern (even if with only limited matching capability), because if so I may have the need and available time to have a go at coding it ?
Thanks for the great tool.