documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.com/docsplit/

Docsplit.extract_text auto orientation detection 'detect_orientation: true' param does not work. #143

Open michaeltranlong opened 7 years ago

michaeltranlong commented 7 years ago

When I run 'tesseract --list-langs' on my Mac, 'osd' is in the list.

However, Docsplit::DEPENDENCIES[:tesseract] is true while Docsplit::DEPENDENCIES[:osd] is false.

I tracked it down to lines 33-37 of 'lib/docsplit.rb':

    if DEPENDENCIES[:tesseract]
      # osd will be listed in tesseract --listlangs
      val = %x[ #{'tesseract --list-langs'} 2>&1 >/dev/null ]
      DEPENDENCIES[:osd] = true if val =~ /\bosd\b/
    end

I'm not sure why the '>/dev/null' is there. It discards stdout, so 'val' ends up empty and DEPENDENCIES[:osd] stays false.
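
To illustrate the redirection order, here is a minimal sketch that does not need Tesseract installed. The shell applies redirections left to right, so '2>&1 >/dev/null' points stderr at the captured pipe and then sends stdout to /dev/null. The check can therefore only succeed if the language list is written to stderr. FAKE_LIST_LANGS and osd_listed? are hypothetical names used for illustration only, not part of Docsplit, and the 'printf' command is just a stand-in for a tool that prints its list to stdout.

```ruby
# Hypothetical stand-in for a command that prints its language list to stdout.
# On a real system this would be 'tesseract --list-langs'.
FAKE_LIST_LANGS = %q(printf 'eng\nosd\n')

# Same redirection as in lib/docsplit.rb: stderr goes to the captured pipe,
# then stdout goes to /dev/null, so only stderr is captured.
stderr_only = %x[ #{FAKE_LIST_LANGS} 2>&1 >/dev/null ]

# Merging stderr into stdout (and not discarding stdout) captures both streams.
both_streams = %x[ #{FAKE_LIST_LANGS} 2>&1 ]

# Hypothetical helper mirroring the regex used in the original check.
def osd_listed?(output)
  !!(output =~ /\bosd\b/)
end

puts osd_listed?(stderr_only)   # false
puts osd_listed?(both_streams)  # true
```

If this reading is right, dropping the '>/dev/null' part would be one possible fix, since the regex would then match regardless of which stream the list is printed on.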