dsuni / svndumpsanitizer

A program aspiring to be a more advanced version of svndumpfilter
https://miria.homelinuxserver.org/svndumpsanitizer/
GNU General Public License v3.0

Output file got zero bytes for large dump file #4

Closed thomas-tran closed 10 years ago

thomas-tran commented 10 years ago

Hi dsuni,

I tried to use svndumpsanitizer on my full dump file, filtering about 10 paths. The original dump file is about 220 GB. The output file is always generated with 0 bytes, no matter how many times I rerun it.

dsuni commented 10 years ago

The dump size shouldn't be a problem. (The biggest dump I've heard of someone using svndumpsanitizer on was >700GB)

As for the rest, the report unfortunately doesn't contain enough information to diagnose the problem :-( Getting an outfile of 0 bytes sounds really odd, though... Even if you somehow excluded everything in the repository, at least revision 0 should be written. What happens if you don't specify an outfile, and let it write to stdout instead?

Knowing the actual full command you're using would also be useful. Having the dump file would be nice as well, but I understand that such a behemoth is quite unwieldy, and probably contains sensitive data...

dsuni commented 10 years ago

That looks ok... What happens if you omit the outfile parameter?

thomas-tran commented 10 years ago

I haven't tried that. What will happen if I omit the output file? Will it overwrite the input?

dsuni commented 10 years ago

No. It should just write everything to stdout.

thomas-tran commented 10 years ago

Thank you very much for such great support. I omitted the output file and let it write to stdout; here is the result:

Revision-number: 88147
Prop-content-length: 133
Content-length: 133

K 7
svn:log
V 22
Deleted unwanted nodes
K 10
svn:author
V 16
svndumpsanitizer
K 8
svn:date
V 27
2013-10-20T09:25:28.000000Z
PROPS-END

Node-path: tags
Node-action: delete

Node-path: DBScripts/To be archived
Node-action: delete

dsuni commented 10 years ago

Ok... So it does appear to do something. And given that you specified the --drop-empty parameter, it would seem that it has indeed kept almost 90000 revisions. Now what happens if you run the exact same command and redirect the output to a file? I.e. svndumpsanitizer --infile [...] --drop-empty > filtered.dump

thomas-tran commented 10 years ago

I tried redirecting the output and still got the same result. However, when I tried another dump of 120 GB, it was OK. Every dump > 160 GB gave me a 1 KB file.

dsuni commented 10 years ago

That's really odd... But looking at the output again, if it only outputs that revision and nothing else it's almost like the rewind-operation (line 747 in version 1.2.1) isn't properly performed.

I've never had any problems with that myself, no matter how big the file, but someone mentioned that for some reason it didn't work with some versions of Windows, despite it being supported according to the documentation. You could try replacing that line with this one (which should do the same thing), and recompile:

fseeko(infile, 0, SEEK_SET);
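For readers who want to see the suggested replacement in context, here is a minimal sketch in C. It is not taken from the svndumpsanitizer source: the helper names rewind_checked and two_pass_matches are hypothetical, and the code assumes a POSIX-style toolchain where fseeko and off_t are available and where defining _FILE_OFFSET_BITS as 64 selects a 64-bit off_t. The one thing it illustrates with certainty is that rewind() returns void, so a failed seek goes unnoticed, whereas fseeko() returns a status that can be checked. Whether that explains the behaviour reported in this thread is not confirmed here.

```c
/* Ask for a 64-bit off_t on 32-bit POSIX-style builds.
 * This must appear before any #include. (Assumption: the toolchain honours it.) */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper, not part of svndumpsanitizer: seek back to the start
 * of the stream and report failure, which rewind() cannot do.
 * Returns 0 on success, -1 on failure (errno is set by fseeko). */
static int rewind_checked(FILE *stream)
{
    assert(stream != NULL);
    if (fseeko(stream, (off_t)0, SEEK_SET) != 0) {
        return -1;
    }
    clearerr(stream); /* rewind() also clears the error indicator */
    return 0;
}

/* Hypothetical two-pass reader: count the bytes, go back to the start,
 * count again. Returns 1 if both passes saw the same number of bytes. */
static int two_pass_matches(FILE *stream)
{
    long long first = 0, second = 0;

    while (fgetc(stream) != EOF) {
        first++;
    }
    if (rewind_checked(stream) != 0) {
        return 0; /* the second pass would start from the wrong position */
    }
    while (fgetc(stream) != EOF) {
        second++;
    }
    return first == second;
}
```

A tool that reads its input in more than one pass could call rewind_checked(infile) between passes and abort with an error message when it returns -1, rather than continuing from wherever the file position happens to be.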

thomas-tran commented 10 years ago

Actually, I was using your old version 1.01, compiled on Windows 7, and it did not run successfully for dumps over 150 GB. I tried getting the latest from GitHub and compiled it using gcc on a 32-bit version, but got an incompatibility error when I executed it.

Which version is the best one to compile on Windows?

Thanks

dsuni commented 10 years ago

I don't know. :-P I kicked my last Windows installation to the curb back in 2005, and haven't looked back since. The Windows patch is an external contribution that I haven't tested myself.

The Windows patch has never changed, though, so I think all versions should be equally easy/problematic to compile. For that reason you should use the latest version, which contains some bug fixes. (I think the guy who submitted the patch was using Visual Studio to compile it...)

thomas-tran commented 10 years ago

After replacing rewind with fseeko(infile, 0, SEEK_SET), it worked perfectly for a dump of more than 300 GB.

Thank you very much