Split logging between stderr and stdout

I've been using stdout for #34 and am in the camp of "stdout is for output, not UI" so that I can pipe things together. Using stdout for messages to the user would make that impossible.

If you want to split verbose logging from error messages, I propose adding a command-line option, e.g. --log-file, that writes the verbose messages (a record was filtered by the URL filter, that kind of stuff) to a separate file. This would also keep it optional, so bitextor wouldn't need any changes. Edit: and if you really want the log messages to go to stdout, you can use --log-file=/dev/stdout or --log-file=-.
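Here is one way the proposed option could pick its destination. This is a minimal sketch, assuming a hypothetical `VerboseLog` class that receives the raw value of the (not yet existing) --log-file flag. No value keeps today's behaviour (stderr), "-" means stdout, and anything else is opened as a file:

```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <ostream>
#include <stdexcept>
#include <string>

// Hypothetical sink for verbose messages, driven by a proposed --log-file value.
class VerboseLog {
public:
    explicit VerboseLog(const std::string& path) {
        if (path.empty()) {
            stream_ = &std::cerr;  // option not given: unchanged behaviour
        } else if (path == "-") {
            stream_ = &std::cout;  // --log-file=-
        } else {
            // Any other value is treated as a file path (truncated on open).
            file_ = std::make_unique<std::ofstream>(path);
            if (!*file_) {
                throw std::runtime_error("cannot open log file: " + path);
            }
            stream_ = file_.get();
        }
    }

    std::ostream& stream() { return *stream_; }

private:
    std::unique_ptr<std::ofstream> file_;  // owns the stream only in the file case
    std::ostream* stream_ = nullptr;
};
```

On Linux, --log-file=/dev/stdout would simply take the file branch. The "-" special case mainly helps where /dev/stdout isn't available.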

That being said, the only error message that doesn't terminate warc2text is the one for a warc archive containing broken gzip records (which could indicate file corruption). All the others are the last message printed before warc2text dies with a non-zero exit code, which seems pretty reasonable to me.

A different annoyance I've had: if you're running multiple warc2text processes through parallel, warc2text doesn't prefix the log messages with the name (and maybe the offset?) of the warc the message is about. Right now you need to collect all messages from a single warc2text process in order and then read through them from top to bottom to figure out which warc each message came from. Running warc2text on just a single warc archive and letting parallel do the log grouping isn't an option either, since then you can't easily combine the output of multiple warcs and you end up with many more files on disk.
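The prefixing asked for here could look roughly like the following. This is a minimal sketch, assuming a hypothetical `RecordLogger` that the reader loop updates with the current warc path and record offset. The class, its methods, and the `path:offset: message` format are illustrative choices, not existing warc2text code:

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical logger that remembers which warc (and record offset) is being processed.
class RecordLogger {
public:
    explicit RecordLogger(std::ostream& sink) : sink_(sink) {}

    // Called by the reader whenever it moves to a new record or a new warc.
    void set_position(std::string warc, std::uint64_t offset) {
        warc_ = std::move(warc);
        offset_ = offset;
    }

    // Format the whole line first, then insert it into the sink in one operation.
    void message(const std::string& text) {
        std::ostringstream line;
        line << warc_ << ':' << offset_ << ": " << text << '\n';
        sink_ << line.str();
    }

private:
    std::ostream& sink_;
    std::string warc_;
    std::uint64_t offset_ = 0;
};
```

With a prefix like this, a single warc2text process can still handle many warcs, and grep on the log can recover every message about one archive.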

bitextor / warc2text

Split logging between stderr and stdout #37