at15 / bform

Paper and data management web application for plastic forming
Apache License 2.0

Java PDF Libraries #18

Open at15 opened 8 years ago

at15 commented 8 years ago

No PHP libraries ....

Since we need fine-grained control over PDF files, see https://github.com/jaeksoft/opensearchserver/blob/master/pom.xml to find more.

at15 commented 8 years ago
<dependency>
    <groupId>org.icepdf</groupId>
    <artifactId>icepdf-core</artifactId>
    <version>5.0.7</version>
</dependency>

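It looks like this can be used to view PDFs in a web page, but I don't know where it is actually used. It would be nice if it could extract images /w\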

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox-ant</artifactId>
    <version>1.8.12</version>
</dependency>

It looks like this can only extract text.

at15 commented 8 years ago

http://stackoverflow.com/questions/6118635/what-is-the-best-pdf-open-source-library-for-java

at15 commented 8 years ago

https://github.com/modesty/pdf2json does support parsing PDF files, but it does not support links.

at15 commented 8 years ago

Also, we ran into Ruby .... hmm ....

at15 commented 8 years ago

Also a free parsing tool ....

at15 commented 8 years ago

btw: a DOI can be used to locate papers
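A minimal sketch of that idea, assuming the DOI appears somewhere in the text extracted from a paper. The class name `DoiLocator`, the regular expression, and the trailing-punctuation cleanup are illustrative assumptions, not part of bform. The only outside fact relied on is that `https://doi.org/<doi>` resolves a DOI to the paper's landing page.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the first DOI-like string in extracted text
// and turns it into a resolver URL.
class DoiLocator {
    // Assumed pattern: "10.", a 4-9 digit registrant code, "/", then a suffix
    // that runs until whitespace, a quote, or an angle bracket.
    private static final Pattern DOI =
            Pattern.compile("\\b10\\.\\d{4,9}/[^\\s\"<>]+");

    static Optional<String> findDoi(String text) {
        Matcher m = DOI.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Drop sentence punctuation that often trails a DOI in running text.
        return Optional.of(m.group().replaceAll("[.,;)]+$", ""));
    }

    static String resolverUrl(String doi) {
        // Suffixes with unusual characters would need URL encoding; omitted here.
        return "https://doi.org/" + doi;
    }

    public static void main(String[] args) {
        String sample = "The DOI Handbook, doi:10.1000/182.";
        findDoi(sample)
                .map(DoiLocator::resolverUrl)
                .ifPresent(System.out::println);
    }
}
```

This only gives a lookup key. Fetching metadata for the resolved paper would be a separate step.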

at15 commented 8 years ago

Well, PHP also has a library: https://github.com/smalot/pdfparser, though only text is supported.

at15 commented 8 years ago

Still, the most reliable option is https://github.com/coolwanglu/pdf2htmlEX: convert the PDF to HTML first .... then extract the information from that .....

Need to use a Docker mirror if I want to use this library ...
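A rough sketch of driving that conversion from Java, assuming a `pdf2htmlEX` binary is on the PATH (running it inside a Docker container would need a different command line). The class name `Pdf2HtmlRunner` and its methods are hypothetical. The only pdf2htmlEX features used are the input file argument and the `--dest-dir` option.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the pdf2htmlEX command line tool.
class Pdf2HtmlRunner {
    // Builds: pdf2htmlEX --dest-dir <outDir> <pdf>
    static List<String> buildCommand(Path pdf, Path outDir) {
        return Arrays.asList(
                "pdf2htmlEX", "--dest-dir", outDir.toString(), pdf.toString());
    }

    // Runs the converter and returns its exit code (0 normally means success).
    static int convert(Path pdf, Path outDir)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, outDir))
                .inheritIO() // show the tool's own output in this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: Pdf2HtmlRunner <input.pdf> <output-dir>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("pdf2htmlEX exited with code " + exit);
    }
}
```

The generated HTML could then be handed to whatever DOM parser ends up being used to pull out the information.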

at15 commented 8 years ago

https://github.com/paquettg/php-html-parser PHP can also be used to parse the DOM .... hmm

at15 commented 8 years ago
CLASSPATH=/home/at15/Downloads/PDFClown/java/pdfclown.lib/build/package
java -jar pdfclown-sample-cli.jar

The sample is up and running when the jar is on the classpath, but I don't know whether it will work well for parsing papers.