HazyResearch / deepdive

DeepDive
deepdive.stanford.edu

How do you process .pdf files? #572

Closed: ghost closed this issue 8 years ago

ghost commented 8 years ago

Hi, my name is Bruno Gallien and I am new to the ecosystem used by DeepDive. I followed the spouse tutorial and I think I am ready to start coding my own project. That project will receive .pdf documents, and I was wondering how you process documents in .pdf format, because in the spouse example the articles are plain text. Do I need to do some kind of conversion? If so, how can I keep the bounding-box information around the words in the original document?

Thanks in advance.
Best regards,
Bruno Gallien

philipperemy commented 8 years ago

Use https://github.com/euske/pdfminer
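pdfminer's layout analysis reports a bounding box for every character, which covers the second half of the question (keeping word positions). The following is a minimal sketch, assuming the maintained pdfminer.six fork and its `pdfminer.high_level.extract_pages` API rather than the original repository linked above. The helper names `chars_to_words` and `words_from_pdf` are hypothetical, and splitting words on whitespace is a simplifying assumption: text inside figures, rotated text, and hyphenation across lines are not handled. Boxes are `(x0, y0, x1, y1)` in PDF points, with the origin at the bottom-left corner of the page.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# (x0, y0, x1, y1) in PDF points; origin is the page's bottom-left corner.
BBox = Tuple[float, float, float, float]


def chars_to_words(
    chars: Iterable[Tuple[str, Optional[BBox]]],
) -> List[Tuple[str, BBox]]:
    """Group the characters of one text line into words (hypothetical helper).

    Whitespace characters, and virtual characters without a box (bbox=None,
    like pdfminer's LTAnno), end the current word. Each word's box is the
    union of its characters' boxes.
    """
    words: List[Tuple[str, BBox]] = []
    text = ""
    box: Optional[BBox] = None
    for ch, ch_box in chars:
        if ch_box is None or ch.isspace():
            # Word boundary: emit the word collected so far, if any.
            if text and box is not None:
                words.append((text, box))
            text, box = "", None
            continue
        text += ch
        if box is None:
            box = ch_box
        else:
            # Grow the word's box to also cover this character.
            box = (min(box[0], ch_box[0]), min(box[1], ch_box[1]),
                   max(box[2], ch_box[2]), max(box[3], ch_box[3]))
    if text and box is not None:
        words.append((text, box))
    return words


def words_from_pdf(path: str) -> Iterator[Tuple[int, str, BBox]]:
    """Yield (page_number, word, bbox) for each word (assumes pdfminer.six)."""
    # Imported lazily so chars_to_words works without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_no, page in enumerate(extract_pages(path), start=1):
        for element in page:
            # Only text boxes are visited; figures, images and rects are skipped.
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                # LTChar carries a bbox; LTAnno (inserted spaces/newlines) does not.
                chars = [
                    (obj.get_text(), obj.bbox if isinstance(obj, LTChar) else None)
                    for obj in line
                ]
                for word, bbox in chars_to_words(chars):
                    yield page_no, word, bbox
```

For example, `for page, word, bbox in words_from_pdf("paper.pdf"): print(page, word, bbox)` (with `paper.pdf` standing in for one of your documents) lists every word with its page and box. Keeping the page number and box next to each word lets you map extracted text back to its position in the original PDF later on.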

ghost commented 8 years ago

Thanks a lot.