Apache Tika (http://tika.apache.org/) is the leading open source text
extraction framework, written in Java. It extracts text from many formats,
including PDF, DOC, ODF and about 30 more.
Tika is modular: writing a parser wrapper for it takes only one Java class
plus one property file. I think java-axp could easily be exposed as a Tika
parser plugin with just a few hours' work, which would let all Tika users
parse the XPS format.
See this example of how the MS TNEF format is wrapped:
http://github.com/jukka/jtnef/blob/master/src/net/freeutils/tnef/tika/TNEFParser.java
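
To make the idea concrete, here is a rough sketch of what the core of such a wrapper does. A Tika parser's parse method receives an InputStream and a SAX ContentHandler (plus Tika's own Metadata and ParseContext objects) and reports the extracted text as XHTML SAX events. The sketch below uses only the JDK so it runs without the Tika jar, so it is not a real Tika Parser implementation. The class name XpsWrapperSketch and the extractText helper are hypothetical; extractText is a placeholder for whatever java-axp call actually pulls text out of an XPS document, and here it simply decodes the bytes as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class XpsWrapperSketch {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the java-axp extraction call (assumption:
    // the real library would decode the XPS package here).
    static String extractText(InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Mirrors the stream-in, SAX-events-out shape of a Tika parser:
    // the extracted text is wrapped in a minimal XHTML document.
    static void parse(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        char[] text = extractText(stream).toCharArray();
        AttributesImpl noAttributes = new AttributesImpl();

        handler.startDocument();
        handler.startElement(XHTML, "html", "html", noAttributes);
        handler.startElement(XHTML, "body", "body", noAttributes);
        handler.startElement(XHTML, "p", "p", noAttributes);
        handler.characters(text, 0, text.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    // Convenience caller: collects the character events into plain text.
    static String toPlainText(byte[] document) {
        StringBuilder out = new StringBuilder();
        try {
            parse(new ByteArrayInputStream(document), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

In a real plugin the parse logic would live in a class implementing org.apache.tika.parser.Parser, which would also declare the XPS media type it supports. Recent Tika releases typically discover such a class through a service file named META-INF/services/org.apache.tika.parser.Parser; whether that matches the property file used in the TNEF example is an assumption on my part.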
Original issue reported on code.google.com by cominv...@gmail.com on 4 Oct 2010 at 9:58