Closed sebastian-nagel closed 7 years ago
The problem was that output was compressed by default, but seqfileutils.py cannot convert compressed output into a sequence file resp. dosample.py fails while reading the output. See comments in run_index_hadoop.sh how to configure the job not to compress the output or how to decompress the text file and generate a sequence file from command-line in case the sample job fails with an encoding error.
When writing the sequence file (at the very end of the job) dosamply.py sometimes fails with an encoding error:
It's hard to reproduce, it takes the long, and with the same input the error may not appear, or come up at another codepoint: