Docsplit.extract_text generates a String with a null byte

Hello,

First of all, thank you for the gem.

Second, I have a PDF that, when run through Docsplit.extract_text, produces a text file containing a null byte character. Shouldn't this be handled by TextCleaner#clean? Or do you think the issue lies in pdftotext/tesseract?

Unfortunately, the PDF I am using came from a client, so I can't share it. I also haven't been able to manually create one that reproduces the problem.
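
For anyone hitting the same thing, here is a rough sketch of a post-processing workaround that strips null bytes from the extracted text files. The helper names `strip_null_bytes` and `clean_text_files` are hypothetical and not part of Docsplit, and the sketch assumes the extracted output is a set of `.txt` files in a single directory. It only uses the Ruby standard library.

```ruby
# Hypothetical helper, not part of Docsplit: returns a copy of the
# given string with every NUL byte removed.
def strip_null_bytes(text)
  text.delete("\0")
end

# Hypothetical post-processing step: rewrites each .txt file in `dir`
# that contains a NUL byte. Files are read and written in binary mode
# so that no other bytes are altered or re-encoded.
def clean_text_files(dir)
  Dir.glob(File.join(dir, "*.txt")).each do |path|
    raw = File.binread(path)
    next unless raw.include?("\0")
    File.binwrite(path, strip_null_bytes(raw))
  end
end
```

Calling `clean_text_files` on the output directory after extraction would then leave files without null bytes untouched. This only hides the symptom; it doesn't say whether the byte comes from pdftotext, tesseract, or the cleaner.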

documentcloud / docsplit

Docsplit.extract_text generates a String with a null byte #152