dsuni / svndumpsanitizer

A program aspiring to be a more advanced version of svndumpfilter
https://miria.homelinuxserver.org/svndumpsanitizer/
GNU General Public License v3.0
47 stars 15 forks source link

Add Wildcard search functionality. #2

Closed electrawn closed 10 years ago

electrawn commented 11 years ago

Add wildcard search functionality on includes. Could probably be expanded to work on excludes. Also, changed rewind function as that was breaking on windows 7 64bit for some reason (used for test). Works fine on Linux.

dsuni commented 11 years ago

This is probably not a bad feature to have for some people. Thanks for taking the time to implement it.

Since it makes several changes to the analyzing logic, I will have to study it in detail before incorporating it, though. This logic is the part that has been the source of the most sinister svndumpsanitizer bugs in the past, and it's really easy to make a mistake that will only show up with some repositories, under some circumstances.

electrawn commented 11 years ago

I used it on a repository that was built like :

/branches/release/Clientx-Timestamp /branches/release/Clienty-Timestamp /branches/archive/Clientx-Timestamp /branches/archive/Clienty-Timestamp

/trunk/Clientx /trunk/Clienty

I was able to do --include /branches/release/Clientx* /branches/archive/Clientx* /trunk/Clientx. This filtered 9 "Clients" and load them into separate repositories successfully. I did run into errors, look down commit history to see thought processes involved.

Theoretically, the snippet should be able to do ? wildcarding as well.

-Jason Potkanski


From: dsuni notifications@github.com To: dsuni/svndumpsanitizer svndumpsanitizer@noreply.github.com Cc: electrawn electrawn@yahoo.com Sent: Tuesday, July 30, 2013 7:52 AM Subject: Re: [svndumpsanitizer] Add Wildcard search functionality. (#2)

This is probably not a bad feature to have for some people. Thanks for taking the time to implement it. Since it makes several changes to the analyzing logic, I will have to study it in detail before incorporating it, though. This logic is the part that has been the source of the most sinister svndumpsanitizer bugs in the past, and it's really easy to make a mistake that will only show up with some repositories, under some circumstances. — Reply to this email directly or view it on GitHub.

dsuni commented 11 years ago

Ok. I've looked at the code and ran a few tests. Given how complicated the analyzing logic is, and given how many mistakes and oversights I've made myself writing it, I would have been somewhat depressed if someone else had picked up the code and made a flawless implementation of a new feature that changed this logic.

Fortunately I found a bug. ;-) The problem shows when building the deleting revision with some repositories. E.g. this repository: http://miria.linuxmaniac.net/svndumpsanitizer/foobar.dump when filtered using the command... svndumpsanitizer -i foobar.dump -o somefile.dump -n trunk/dowant ...should look exactly like this file... http://miria.linuxmaniac.net/svndumpsanitizer/foobar_sanitized.dump ...the last timestamp being the only exception.

When filtered with the new version using the command svndumpsanitizer -i foobar.dump -o somefile.dump -n trunk/dow* ...the following lines are also added to the final revision

Node-path: trunk/dowant Node-action: delete

This is of course a bug, since we don't want to delete that directory.

This can be fairly easily fixed. Add a wildcmp(include[j], temp_str) check to line 648, and the above command will work beautifully. What still works less beautifully is if you go extreme, and do something like this: svndumpsanitizer -i foobar.dump -o somefile.dump -n unk/dow

Now the last revision will "kindly" delete the entire trunk for us. The key line here is 668, where we check that the no_longer_relevant-variable doesn't contain anything specifically included. Making this check work correctly with wildcards, however turned out to be a non-trivial exercise. :-(

I'll have to come back to this when I have some more time to spare. Just posted this so that you know I haven't just been twiddling my thumbs.

dsuni commented 11 years ago

Ok... I've spent some time on analyzing this issue, and as I feared it turns out that the analyzing logic goes bonkers with the wildcards. Relatively simple cases, like the one you had, where the user is satisfied with adding an asterisk to the end of his includes are OK, but even slightly more complicated stuff makes the analysis fail. :-(

There doesn't seem to be an easy fix either - one would have to rewrite large portions of the core logic in order to implement this feature in a clean manner. (Clean in this case meaning something that works with all valid repositories and all valid input.)

I'll keep this feature in mind, but it's so big that I think I'll have to leave it for svndumpsanitizer-2.0, if I ever get that far. :-P