alxlo closed this issue 9 years ago.
Hi @alxlo, out of interest, what is the use-case for XHTML output?
Hi Matthew, the XHTML contains a separate div section for each page of a PDF file. I am currently experimenting with lunr.js to generate a search index on the server side, to be used by a (mobile) client application. I hope to deliver more precise search hits by indicating not only the PDF but also the page within the PDF for each hit. Best regards, Alexander
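A minimal sketch of the per-page idea, under stated assumptions: it assumes Tika's PDF XHTML output wraps each page in `<div class="page">`, and it uses a small in-memory inverted index as a stand-in for lunr.js (this is not lunr's API). The function names `createIndex`, `splitPages`, `addDocument`, and `search` are hypothetical and not part of Tika, the bridge, or lunr.

```typescript
// Hypothetical per-page search index built from Tika XHTML output.
// Assumption: each PDF page appears as <div class="page">...</div>.

interface PageHit {
  doc: string;  // file name of the PDF
  page: number; // 1-based page number
}

type PageIndex = { [term: string]: PageHit[] };

// Null-prototype object so terms like "constructor" are safe keys.
function createIndex(): PageIndex {
  return Object.create(null);
}

// Split XHTML into the plain text of each page. Splitting on the opening
// tag (rather than matching up to </div>) tolerates nested divs.
function splitPages(xhtml: string): string[] {
  return xhtml
    .split(/<div class="page">/)
    .slice(1) // drop <html>, <head>, metadata before the first page
    .map((chunk) =>
      chunk
        .replace(/<[^>]+>/g, " ") // strip remaining tags
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, '"')
        .replace(/&#39;/g, "'")
        .replace(/&amp;/g, "&") // decode &amp; last
        .replace(/\s+/g, " ")
        .trim()
    );
}

// Lowercase and split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Record, for every term, each (doc, page) it occurs on (once per page).
function addDocument(index: PageIndex, doc: string, xhtml: string): void {
  splitPages(xhtml).forEach((text, i) => {
    const seen: { [term: string]: boolean } = Object.create(null);
    tokenize(text).forEach((term) => {
      if (seen[term]) return;
      seen[term] = true;
      if (index[term] === undefined) index[term] = [];
      index[term].push({ doc, page: i + 1 });
    });
  });
}

// Return the pages that contain every query term (AND semantics).
function search(index: PageIndex, query: string): PageHit[] {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  const lists = terms.map((t) => index[t] || []);
  return lists[0].filter((hit) =>
    lists.every((list) =>
      list.some((h) => h.doc === hit.doc && h.page === hit.page)
    )
  );
}

const index = createIndex();
addDocument(
  index,
  "manual.pdf",
  '<html><body><div class="page"><p>Install the bridge</p></div>' +
    '<div class="page"><p>Search with lunr &amp; Tika</p></div></body></html>'
);
console.log(search(index, "lunr tika")); // one hit: manual.pdf, page 2
```

With lunr.js itself, each page could instead be added as its own document whose ref combines the file name and page number. The regex splitting and ASCII-only tokenizer keep the sketch short; a real XHTML parser and lunr's own tokenizer would be more robust.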
Try the `tika.xhtml` method available on the master branch. Be warned that there are breaking changes in the way options are specified.
Thank you so much, works like a charm!
Fantastic :smile_cat:
It would be awesome if the bridge could deliver not only plain text but also the XHTML that Tika generates with its default configuration :-)