alard closed this 13 years ago.
The maximum size should be configurable as well.
I just pushed a change that adds an option for the maximum size, defaulting to 1GB. I see you've already implemented the failover to a new file, but it has a bug: the record that would have pushed it over the maximum doesn't get written to the new file.
Actually, it's only the records that are larger than the maximum size of the WARC file that don't get written. Testing with a 1 KB limit hits that fairly often :)
I changed the way the file size is handled. The file size is now checked after the record has been written. This means that files can become (somewhat) larger than the limit, but I think it's better this way (and Heritrix does this too):
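A minimal sketch of the write-then-check policy, assuming an in-memory model where each "file" is only a byte counter. This is not the change referred to above and makes no libwarc calls; `struct warc_rotator`, `rotator_init`, `rotator_write` and `MAX_FILES` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model: no real I/O and no libwarc calls.
   Each "file" is only a count of the bytes written to it. */
#define MAX_FILES 64

struct warc_rotator {
    size_t max_size;          /* soft per-file limit, e.g. 1 GB */
    size_t sizes[MAX_FILES];  /* bytes written to each file so far */
    size_t file_count;        /* files opened so far */
};

static void rotator_init(struct warc_rotator *r, size_t max_size)
{
    assert(max_size > 0);
    r->max_size = max_size;
    r->sizes[0] = 0;
    r->file_count = 1;        /* the first file is opened up front */
}

/* Write-then-check: a record is never refused for being too big.
   The check looks only at bytes already written: once a write has taken
   the current file to or past the limit, the next record goes to a new file.
   Returns 0 on success, -1 (record not written) if MAX_FILES is used up. */
static int rotator_write(struct warc_rotator *r, size_t record_size)
{
    if (r->sizes[r->file_count - 1] >= r->max_size) {
        if (r->file_count == MAX_FILES)
            return -1;
        r->sizes[r->file_count++] = 0;   /* open the next file */
    }
    r->sizes[r->file_count - 1] += record_size;
    return 0;
}
```

With a 1000-byte limit, writing records of 600, 600, 300, 5000 and 10 bytes gives three files of 1200, 5300 and 10 bytes. A file overshoots the limit by less than the size of its last record, and a record larger than the limit is simply written rather than lost, which avoids the problem with oversized records.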
Ah, that looks good. I think that's about it for this one.
If the WARC file reaches its maximum size (which is set to 1GB at the moment), WFile_storeRecord returns an error. wget-warc should open a new WARC file if this happens.
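For comparison, the failover described here (the store call refuses a record, so a new file is opened) could be sketched as below, including the fix for the bugs discussed above: the refused record is retried in the new file, and a record larger than the limit is written on its own instead of being dropped. This is a hypothetical in-memory model; `store_record` only mimics the refusal behaviour described for `WFile_storeRecord` and is not its real signature, and `struct warc_set`, `write_record` and `STORE_FULL` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILES 64

enum store_result { STORE_OK, STORE_FULL };

/* Hypothetical in-memory model: each "file" is only a byte count. */
struct warc_set {
    size_t max_size;          /* per-file limit, e.g. 1 GB */
    size_t sizes[MAX_FILES];  /* bytes stored in each file */
    size_t count;             /* files opened so far */
};

static void set_init(struct warc_set *s, size_t max_size)
{
    assert(max_size > 0);
    s->max_size = max_size;
    s->sizes[0] = 0;
    s->count = 1;
}

/* Stand-in for the pre-write check: refuse a record that would push the
   current file past max_size. Not the libwarc API. */
static enum store_result store_record(struct warc_set *s, size_t record_size)
{
    size_t *cur = &s->sizes[s->count - 1];
    if (*cur + record_size > s->max_size)
        return STORE_FULL;
    *cur += record_size;
    return STORE_OK;
}

/* Returns 0 once the record is stored, -1 if no new file can be opened. */
static int write_record(struct warc_set *s, size_t record_size)
{
    if (store_record(s, record_size) == STORE_OK)
        return 0;
    /* Refused. Rotate only if the current file already holds data;
       moving away from an empty file would not help. */
    if (s->sizes[s->count - 1] > 0) {
        if (s->count == MAX_FILES)
            return -1;
        s->sizes[s->count++] = 0;
        if (store_record(s, record_size) == STORE_OK)
            return 0;   /* the same record, retried in the new file */
    }
    /* The record is larger than max_size, so even an empty file refuses
       it: write it on its own instead of dropping it. */
    s->sizes[s->count - 1] += record_size;
    return 0;
}
```

Unlike write-then-check, this keeps every file within the limit except a file that holds a single oversized record, at the cost of a retry path and a special case that write-then-check does not need.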