documents4j / documents4j.github.io

Website to the documents4j project.
https://documents4j.com

Example to convert from PDF to DOCX with doc4j API #1

Closed rpinquie closed 8 years ago

rpinquie commented 8 years ago

Hi,

I cannot find any example that shows how to convert a PDF whose native format is DOCX back into the original DOCX. I guess I should use the MicrosoftWordBridge converter, but I can't figure out how...

Cheers,

raphw commented 8 years ago

You would use the IConverter API for this, as for any conversion. The bridge implementations are meant to be used by the converter API under the covers, i.e. make sure that the MS Word conversion bridge is on the class path; the LocalConverter then discovers it automatically:

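import com.documents4j.api.DocumentType;
import com.documents4j.api.IConverter;
import com.documents4j.job.LocalConverter;
import java.io.File;
import java.util.concurrent.Future;

File pdfFile = new File( ... ), wordFile = new File( ... );
IConverter converter = LocalConverter.make();
Future<Boolean> conversion = converter
  .convert(pdfFile).as(DocumentType.PDF)
  .to(wordFile).as(DocumentType.DOCX)
  .schedule();

suri-pkl commented 6 years ago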

Hi, I followed the above code to convert from PDF to DOCX on my local machine, but the formatting is not exactly the same. I can see some distortion in the converted file: the header alignment differs. Could you please help me get the exact format without any distortion? My code:

public class LocalConversion {
  static String sourcsFile = "C:/PDF2DOC/FirmOrderLetter_MYR.pdf";
  static String targetFile = "C:/PDF2DOC/FirmOrderLetter_MYR.docx";

  public static void main(String args[]) {
    File pdfFile = new File(sourcsFile), target = new File(targetFile);
    IConverter converter = LocalConverter.make();
    Future<Boolean> conversion = converter
      .convert(pdfFile).as(DocumentType.PDF)
      .to(target).as(DocumentType.DOCX)
      .schedule();
  }
}

raphw commented 6 years ago

Hi, what do you mean by distorted? The conversion is performed by the application running in the background. Don't you get the same results when importing the PDF directly in Word? Conversions from PDF are a bit tricky to begin with.

suri-pkl commented 6 years ago

Thanks for your response. After conversion to Word, the header alignment is not the same as in the PDF document. I have attached the files for your reference: in the converted version (DOCX), the left header is not aligned with the right header. FYI, the right header is an image. Attachments: pdf_version, docx_version

suri-pkl commented 6 years ago

I shaded out the wording for compliance reasons.

raphw commented 6 years ago

Have you tried converting the document using MS Word manually? If the same behavior occurs, there is not much that documents4j can do differently.

suri-pkl commented 6 years ago

Yes, converting manually shows the same problem: the alignment is off. How can I achieve this some other way?

raphw commented 6 years ago

This is a question for the MS Word team or a related group that I cannot answer.

suri-pkl commented 6 years ago

I mean, is there any other way of converting PDF to DOCX with exact alignment and formatting using documents4j?

raphw commented 6 years ago

documents4j is using MS Word underneath. It cannot do anything that Word is incapable of.

suri-pkl commented 6 years ago

Thank you. That means documents4j works fine for DOCX to PDF conversion but not the other way (PDF to DOCX). Thank you so much for your prompt responses, much appreciated.

ArunSutar1234 commented 6 years ago

Which JARs from documents4j need to be added?

raphw commented 6 years ago

Maven can list them via dependency:list.

rgaguedo commented 6 years ago

Hello, after converting the documents, I cannot delete the generated files because they are being used by a process. How can I close those processes so that I can delete the files? Thank you.

raphw commented 6 years ago

The locks should be released after conversion. Did you check what process is holding the locks?
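
One thing worth ruling out here: `schedule()` returns immediately, so a generated file may still be in use if it is deleted before the returned Future completes. The sketch below is not taken from documents4j. It waits for a `Future<Boolean>`, such as the one `schedule()` returns, and only then deletes the file. The class `ConversionCleanup`, the method `awaitThenDelete`, and the timeout value are hypothetical names chosen for illustration, and an already completed `CompletableFuture` stands in for a real conversion.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ConversionCleanup {

    // Hypothetical helper: blocks until the conversion finishes, then deletes the file.
    // Returns true only if the conversion reported success and the file was removed.
    static boolean awaitThenDelete(Future<Boolean> conversion, Path file, long timeoutSeconds)
            throws IOException, InterruptedException, ExecutionException, TimeoutException {
        boolean converted = Boolean.TRUE.equals(conversion.get(timeoutSeconds, TimeUnit.SECONDS));
        // deleteIfExists throws an IOException if the file cannot be removed,
        // for example because another process still holds it open.
        boolean deleted = Files.deleteIfExists(file);
        return converted && deleted;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("converted", ".docx");
        // Stand-in for the Future returned by schedule().
        Future<Boolean> done = CompletableFuture.completedFuture(true);
        System.out.println(awaitThenDelete(done, tmp, 30));
    }
}
```

If the delete still fails after the Future has completed, the exception message usually helps to narrow down which process is holding the file.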

Salmanghouri commented 3 years ago

https://medium.com/smidyo-codex/how-to-create-an-api-in-node-js-to-convert-vector-files-svg-pdf-dxf-eps-and-more-243452445ccf

rafaeljigau commented 2 years ago

Hello @raphw. I'm trying to use the API to transform PDF to DOCX. [image]

This is the implementation, pretty simple I think. Converting the file directly in Word also works as expected. I will attach a picture with the output of the conversion: [image]

raphw commented 2 years ago

PDF to DOCX is a strange process. Does the same happen if you open the PDF in Word directly? documents4j just delegates the job, so this might just be the outcome.

rafaeljigau commented 2 years ago

Yes, works perfectly. Do you know any other libraries that could get the job done if documents4j seems not to work on this one?

raphw commented 2 years ago

You can have a look at the conversion vbs file that you find in the word-bridge jar file. Maybe you need to adjust this file?

documents4j offers a system property to use your own VBS file instead of the one that ships with documents4j, so maybe this can solve your problem?

gally76 commented 2 years ago

@rafaeljigau Hello, I have the same issue. Have you solved this problem or found another library?

chandra-rama commented 1 year ago

Hi, I followed the example given and I'm getting the following error when converting PDF to DOCX. Is there any dependency missing?

com.documents4j.throwables.ConversionInputException: No converter for conversion of application/pdf to application/vnd.openxmlformats-officedocument.wordprocessingml.document available
    at com.documents4j.conversion.ConverterRegistry.lookup(ConverterRegistry.java:65)
    at com.documents4j.conversion.DefaultConversionManager.startConversion(DefaultConversionManager.java:30)
    at com.documents4j.job.LocalFutureWrappingPriorityFuture.startConversion(LocalFutureWrappingPriorityFuture.java:50)
    at com.documents4j.job.LocalFutureWrappingPriorityFuture.startConversion(LocalFutureWrappingPriorityFuture.java:11)
    at com.documents4j.job.AbstractFutureWrappingPriorityFuture.run(AbstractFutureWrappingPriorityFuture.java:70)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

raphw commented 1 year ago

No, I do not think Word is capable of transforming PDF to Word format, just the other way around.
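
The exception in the stack trace is raised when no registered bridge advertises the requested source and target pair. documents4j's `IConverter` appears to expose a `getSupportedConversions()` method that maps each source type to its possible target types, which could be consulted before scheduling a job; treat that as an assumption to verify against the library version in use. The sketch below shows only the lookup logic on plain strings so that it runs without documents4j. `ConversionCheck`, `isSupported`, and the sample map are hypothetical and say nothing about what a particular installation supports.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class ConversionCheck {

    // Hypothetical helper: true if 'source' can be converted to 'target'
    // according to a map from each source type to its supported target types.
    static <T> boolean isSupported(Map<T, Set<T>> supported, T source, T target) {
        Set<T> targets = supported.get(source);
        return targets != null && targets.contains(target);
    }

    public static void main(String[] args) {
        // Illustrative data only; a real map would come from the converter.
        Map<String, Set<String>> supported = new HashMap<>();
        supported.put("application/msword", Collections.singleton("application/pdf"));

        String pdf = "application/pdf";
        String docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        if (!isSupported(supported, pdf, docx)) {
            System.out.println("No converter registered for " + pdf + " -> " + docx);
        }
    }
}
```

Checking the pair up front turns the asynchronous `ConversionInputException` into an early, explicit failure at the call site.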

wlevene commented 1 month ago

pdf2document.com - No loss of PDF layout. 30 free page conversions daily for all users.