Closed userrails closed 1 year ago
Using PDFs stored in S3 is orthogonal to this library. HexaPDF has a general mechanism for reading a PDF from any IO object. So what you need to do is get an IO object for the S3 URL and pass it to HexaPDF via HexaPDF::Document.new(io: s3url_io).
It doesn't matter to HexaPDF whether the PDF is stored as a file, in a database, in S3, ... or just in a StringIO. All it cares about is that it gets an IO object.
I don't know what an AWS::S3::Object is (I have never used S3). You either have to get an IO object, or get a binary string with the data and then wrap that in a StringIO.
It's resolved now. Thank you.
@userrails Great - could you write a comment describing how you solved it so that others may find the solution? Thank you!
@gettalong Hope this feedback will be helpful to everyone.
require 'net/http'
require 'stringio'
require 'hexapdf'
binary_string = Net::HTTP.get(URI.parse("https://example.s3.amazonaws.com/uploads/document.pdf?155463860"))
file = StringIO.new(binary_string)
doc = HexaPDF::Document.new(io: file)
doc.write('document.pdf', optimize: true)
@gettalong
I'm able to combine multiple PDFs using the following code.
@target = HexaPDF::Document.new
bills.each do |inv|
  path = inv.file.url
  t = Tempfile.new([inv.id, '.pdf'])
  t.write(open(path).read.force_encoding(Encoding::UTF_8))
  t.close
  pdf = HexaPDF::Document.open(t.path)
  pdf.pages.each { |page| @target.pages << @target.import(page) }
end
@target.write('combined.pdf', optimize: true)
Without saving the PDF to a temp location as shown in the code above, is it possible for the HexaPDF::Document.open() method to accept a StringIO object or maybe binary string data?
I have multiple PDFs on the object bills which I want to merge into one PDF. I'm just curious to know whether there are other ways in HexaPDF to merge multiple PDFs.
@gettalong Hope this feedback will be helpful to everyone.
Thanks, yes!
The arbitrary string.
binary_string = Net::HTTP.get(URI.parse("https://example.s3.amazonaws.com/uploads/document.pdf?155463860"))
Ah, so you are just using the URI without going through the S3 client gem. I think there should be a way to do this with the client gem since it has to be possible to get the contents of some blob stored in there.
Without saving the PDF to a temp location as shown in the code above, is it possible for the HexaPDF::Document.open() method to accept a StringIO object or maybe binary string data?
The HexaPDF::Document.open method only accepts a file name. If you already have an IO object, just use HexaPDF::Document.new(io: io_object).
What you should be able to do is the following:
@target = HexaPDF::Document.new
bills.each do |inv|
  open(inv.file_url) do |io|
    pdf = HexaPDF::Document.new(io: io)
    pdf.pages.each {|page| @target.pages << @target.import(page) }
  end
end
@target.write('combined.pdf', optimize: true)
So if you can use open to read inv.file_url, you should be able to pass the created IO object directly to HexaPDF.
@gettalong
In the combine_pdf gem there was a mechanism to read a PDF from an object without downloading it, using the method to_pdf(options = {}): https://github.com/boazsegev/combine_pdf/blob/master/lib/combine_pdf/pdf_public.rb. Do we have any such feature?
I want to allow the user to download the PDF directly from the browser.
I'm not really sure what you mean. My guess is that you mean rendering to a string instead of a file? If so, then yes, this is possible: just supply a StringIO object when using HexaPDF::Document#write.
As for showing it in the browser without downloading: this has nothing to do with HexaPDF; you need to do this in the web framework.
In my previous comment, I meant that I want to allow the user to download the PDF directly from the browser.
@gettalong But write() writes to a file or an IO object, which is fine and works perfectly for me. However, this way I need to write the PDF to a file or string first and then read it again using other tools. This is working for me:
@target = HexaPDF::Document.new
bills.each do |inv|
  open(inv.file_url) do |io|
    pdf = HexaPDF::Document.new(io: io)
    pdf.pages.each {|page| @target.pages << @target.import(page) }
  end
end
I want to know whether the @target object holds all the combined PDFs in some form. If so, is it possible to read this object and convert it to a binary string with the PDF content, without first writing it to a file or an IO string?
When I inspect it, I see the following object:
> @target
=> <HexaPDF::Document:2256675220>
@target is a HexaPDF::Document, which holds the internal representation of a PDF file. You are adding pages to it from other PDF files, so yes, it contains those pages.
To get the on-disk representation you need to invoke @target.write. If you want a binary string with the contents, you need to do io = StringIO.new(''.b); @target.write(io); result = io.string.
Okay, so the document write() step is mandatory. Thanks for your feedback.
Hi @gettalong,
I want to use the PDF merge feature to merge PDFs by reading them from S3 URLs. Do we have a mechanism for that?
I'm expecting something like this:
I've read this issue: https://github.com/gettalong/hexapdf/issues/136. I wanted to confirm whether any such feature has been developed recently.
Thank you.