Previously the output filename passed to the tesseract command was not shell-escaped. As a result, the filename was truncated and did not match the filename expected by Docsplit::TextExtractor#clean_text, which raised the following exception:
Errno::ENOENT: No such file or directory @ rb_sysopen - test/output/PDF file with spaces 'single' and "double quotes".txt
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `initialize'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `open'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:90:in `clean_text'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:80:in `extract_from_ocr'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:36:in `block in extract'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:32:in `each'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit/text_extractor.rb:32:in `extract'
/Users/jamesmead/Code/freerange/docsplit/lib/docsplit.rb:52:in `extract_text'
test/unit/test_extract_text.rb:58:in `test_name_escaping_while_extracting_text_using_ocr'
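
As a rough illustration of the failure mode (not the actual Docsplit code), the sketch below uses Ruby's standard Shellwords module to show how a POSIX-style shell would split an unescaped path like the one in the trace, and how escaping keeps it as a single argument. The variable names are hypothetical, and Shellwords.split is used here only to approximate shell word splitting.

```ruby
require 'shellwords'

# Hypothetical output path, modelled on the filename in the trace above.
output_path = %q(test/output/PDF file with spaces 'single' and "double quotes".txt)

# Interpolated as-is into a command line, the path is split into several
# words, so the command only sees a truncated first argument.
unescaped_words = Shellwords.split(output_path)

# Escaped first, the path survives word splitting as one argument.
escaped_path  = Shellwords.escape(output_path)
escaped_words = Shellwords.split(escaped_path)

puts unescaped_words.first
puts escaped_words == [output_path]
```

Escaping at the point where the command string is built keeps the name written by the external command in step with the name that is later opened for cleaning.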