aws-samples / amazon-textract-textractor

Analyze documents with Amazon Textract and generate output in multiple formats.
Apache License 2.0

pdf2image is required even though save_image=False #366

Closed vdefeo-caylent closed 4 months ago

vdefeo-caylent commented 6 months ago

Hi all, found another interesting scenario.

  document_analysis_response = extractor.analyze_document(
      file_source=f"s3://{PDF_INPUT_BUCKET_NAME}/{file_path}",
      features=[TextractFeatures.LAYOUT, TextractFeatures.TABLES],
      save_image=False
  )

Error:

textractor.exceptions.MissingDependencyException:
pdf2image is not installed. 
If you do not plan on using visualizations you can skip image generation 
using save_image=False in your function call.

Am I missing something here?

The same parameters work perfectly when I use start_document_analysis instead of analyze_document.
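
For context, the behaviour this report expects (no pdf2image needed when save_image=False) can be sketched as a guarded, lazy import. This is only an illustration, not Textractor's actual implementation: the helper load_page_images, its module_name parameter, and the locally defined MissingDependencyException are hypothetical stand-ins. The only real third-party call is pdf2image.convert_from_path.

```python
import importlib


class MissingDependencyException(Exception):
    """Local stand-in for the library's exception, so the sketch is self-contained."""


def load_page_images(path, save_image=True, module_name="pdf2image"):
    """Hypothetical helper: rasterize a PDF only when images were requested."""
    if not save_image:
        # No images requested, so the optional dependency is never imported.
        return []
    try:
        # Import lazily so a missing package only matters on this code path.
        pdf2image = importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingDependencyException(
            f"{module_name} is not installed. "
            "Pass save_image=False to skip image generation."
        ) from exc
    return pdf2image.convert_from_path(path)
```

With this shape, load_page_images("doc.pdf", save_image=False) returns an empty list whether or not pdf2image is installed, and the exception is raised only when images are actually requested.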

Belval commented 4 months ago

This is fixed in 1.8.0.
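
To check whether an environment already has the fix, something like the following might help. It is a rough sketch: it assumes the package is installed under the PyPI distribution name amazon-textract-textractor, the helper names are made up for illustration, and the version parsing is deliberately simplistic (it ignores pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.8.0' into (1, 8, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad to three components so '1.8' compares equal to '1.8.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_fix(installed, fixed_in="1.8.0"):
    """True if the installed version is at or above the release named in this thread."""
    return parse_version(installed) >= parse_version(fixed_in)


def installed_textractor_version(dist_name="amazon-textract-textractor"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_textractor_version()
    if found is None:
        print("amazon-textract-textractor is not installed")
    else:
        print(found, "has the fix" if has_fix(found) else "predates the fix")
```

If the reported version is older than 1.8.0, upgrading with pip install --upgrade amazon-textract-textractor should pick up the fix.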