documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.io/docsplit/

No such file or directory @ rb_sysopen - example.doc (Errno::ENOENT) #115

Open jhonc33 opened 10 years ago

jhonc33 commented 10 years ago

Hi, I'm trying to extract images from a Microsoft Word document, and it returns this error:

/home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:22:in `initialize': No such file or directory @ rb_sysopen - example.doc (Errno::ENOENT)
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:22:in `open'
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:22:in `is_pdf?'
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:11:in `block in ensure_pdfs'
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:10:in `map'
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit/transparent_pdfs.rb:10:in `ensure_pdfs'
from /home/deploy/.rvm/gems/ruby-2.1.2/gems/docsplit-0.7.5/lib/docsplit.rb:50:in `extract_images'
from test.rb:4:in `<main>'

This is the script:

require "docsplit"
Docsplit.extract_images('example.doc', :size => '1000x', :format => [:png, :jpg])

I'm using CentOS 6 with all libraries installed. On Mac OS X the same script works fine. Converting a PDF also works; it only fails with Office documents.
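
Since the trace ends in `File.open` raising `Errno::ENOENT` for the relative path `example.doc`, one thing worth ruling out is the working directory the script runs from. Below is a minimal sketch, not a confirmed fix: `resolve_document` is a hypothetical helper (it is not part of Docsplit) that expands the path to an absolute one and fails early with the full path in the message.

```ruby
# Hypothetical helper, not part of Docsplit: turn a possibly relative path
# into an absolute one and fail early if no regular file exists there.
def resolve_document(path, base_dir = Dir.pwd)
  full_path = File.expand_path(path, base_dir)
  unless File.file?(full_path)
    raise ArgumentError, "document not found: #{full_path}"
  end
  full_path
end

# Usage with the script above (assumes the docsplit gem is installed):
#   require "docsplit"
#   doc = resolve_document("example.doc")
#   Docsplit.extract_images(doc, :size => '1000x', :format => [:png, :jpg])
```

If the helper raises, the message shows exactly which absolute path was checked, which makes a difference between the two machines easier to spot.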

Any ideas?

Thanks,

jhonc33 commented 10 years ago

This is line 22 of transparent_pdfs.rb:

File.extname(doc).downcase == '.pdf' || File.open(doc, 'rb', &:readline) =~ /\A\%PDF-\d+(\.\d+)?/
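
To see how that expression behaves on its own, here is a small standalone sketch. The method name `pdf_like?` and the file names are made up for illustration; the body is the line quoted above. It shows that a `.pdf` extension short-circuits the `||`, so the file is never opened at this point, while any other extension falls through to `File.open` and raises if the path cannot be found.

```ruby
# Same expression as the line quoted above, wrapped in a standalone method
# so it can be tried without Docsplit. The name `pdf_like?` is made up here.
def pdf_like?(doc)
  File.extname(doc).downcase == '.pdf' || File.open(doc, 'rb', &:readline) =~ /\A\%PDF-\d+(\.\d+)?/
end

# A ".pdf" extension short-circuits the ||, so File.open is never reached,
# even though this file does not exist.
puts pdf_like?('does-not-exist.pdf')

# Any other extension falls through to File.open, which raises Errno::ENOENT
# when the path cannot be found relative to the current working directory.
begin
  pdf_like?('does-not-exist.doc')
rescue Errno::ENOENT => e
  puts "#{e.class}: #{e.message}"
end
```

This may be part of why a PDF input never fails on this line: the existence of the file is only tested here for non-PDF extensions.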

Any suggestions?

crusadergo commented 4 years ago

up

thanhtoan1196 commented 2 years ago

up