Closed eRuaro closed 1 year ago
@eRuaro - As of 0.7.0, detectron2 is installed using the ONNX runtime to eliminate the need to install detectron2 from source. If you're using a version more recent than that, you shouldn't need the detectron2 installation step in your Dockerfile any longer. cc @qued @benjats07
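If it helps, here is a minimal sketch for checking whether the image you built is on the 0.7.0-or-newer side of that change. It assumes the package is published under the distribution name unstructured; the helper names (parse_version, detectron2_step_needed, installed_unstructured_version) are made up for this example and are not part of the library.

```python
import re
from importlib import metadata

# Per the comment above: as of 0.7.0 the from-source detectron2 install is no longer needed.
ONNX_CUTOFF = (0, 7, 0)


def parse_version(version):
    """Return the leading major.minor.patch of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def detectron2_step_needed(version):
    """True if this version predates 0.7.0 (hypothetical helper).

    Pre-release suffixes are ignored, so "0.7.0-dev1" is treated like "0.7.0".
    """
    return parse_version(version) < ONNX_CUTOFF


def installed_unstructured_version():
    """Return the installed version of the 'unstructured' distribution, or None."""
    try:
        return metadata.version("unstructured")  # distribution name is an assumption
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_unstructured_version()
    if found is None:
        print("unstructured is not installed in this environment")
    else:
        print(f"unstructured {found}: detectron2 source step needed = "
              f"{detectron2_step_needed(found)}")
```

Running this inside the rebuilt container would show which version pip actually resolved, which is worth confirming before removing the detectron2 step from the Dockerfile.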
Check out this comment on the other issue you posted. I think that's likely the root of the pdfminer behavior you're seeing.
Describe the bug A clear and concise description of what the bug is.
To Reproduce I was able to use it 2 months ago for parsing scanned PDFs, but since I rebuilt my Docker container, it keeps using pdfminer instead. Now whenever I try to parse a scanned PDF, loader.load_and_split() returns an empty array.
Here's my Dockerfile:
Here's the code segment that uses Unstructured:
Expected behavior Unstructured will use detectron and not pdfminer.
Screenshots If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
Additional context I'm using langchain which uses unstructured under the hood: https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/pdf.html#using-unstructured