d-v-dlee closed this 2 years ago
Thanks David!
I think the issue with wrong `ndims` should only have been happening when the thumbnailer endpoint returns an `image` variable instead of `images`, so I have pushed the fix in https://github.com/aws-samples/amazon-textract-transformer-pipeline/commit/de7ac69f820468af0305369b86355b898e60bafc rather than editing both sides of the `if page_num is None` condition.
From a quick test, it seems like this should fix the pipeline (up until A2I review, of course, which only supports PDFs for now), but let me know if there's a case I missed!
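To illustrate the `ndims` problem described above: a single image is typically a 3D array, while a batch of images is 4D, so code that expects a 4D batch breaks when it receives one image. The sketch below is illustrative only. It assumes NumPy arrays in `(height, width, channels)` layout, and `as_image_batch` is a hypothetical helper, not a function from the repository. It shows one way to normalize both cases to a 4D batch:

```python
import numpy as np


def as_image_batch(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a 4D (N, H, W, C) image batch.

    A single 3D (H, W, C) image gets a leading batch axis of size 1;
    an array that is already a 4D batch is returned unchanged.
    """
    if arr.ndim == 3:
        return np.expand_dims(arr, axis=0)
    if arr.ndim == 4:
        return arr
    raise ValueError(f"Expected a 3D image or 4D image batch, got ndim={arr.ndim}")


single = np.zeros((64, 48, 3), dtype=np.uint8)
batch = np.zeros((2, 64, 48, 3), dtype=np.uint8)
print(as_image_batch(single).shape)  # (1, 64, 48, 3)
print(as_image_batch(batch).shape)   # (2, 64, 48, 3)
```

Normalizing once at the boundary keeps downstream code from having to branch on the array's rank.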
Trying to run this solution (branch `lmv2`) on JPG inputs causes an error, and two files need to be updated. I'm submitting an issue instead of a PR since this is based on the `lmv2` branch rather than `main`.

Required changes:
1. `preprocess/inference.py`: update the `SINGLE_IMAGE_CONTENT_TYPES` dictionary on line 520 to include `"image/jpg": "JPG"`.
2. `src/code/inference.py`: update the thumbnail logic that triggers the logger message "Thumbnails expected either array of PNG bytestrings or 4D images array." After the logging message, add the following code:

The `not images` logic also needs to be updated on lines 428 and 445. Otherwise, evaluating the truth value of a NumPy array raises an error saying it is ambiguous. On line 428, change
`if processor and not images:`
to
`if processor and images is None:`

Similarly, on line 445, change
`**({"images": images} if images and processor else {}),`
to
`**({"images": images} if images is not None and processor else {}),`
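The reason `not images` fails is that it asks NumPy for the truth value of an array, and NumPy raises `ValueError` when the array has more than one element. An explicit `is None` check never does that. The sketch below illustrates both changes in isolation. It is not repository code. The dictionary is a simplified stand-in: only the `"image/jpg"` entry comes from this issue, and the `"image/png"` entry is an assumption. `processor_kwargs` is a hypothetical function that mirrors the line-445 expression:

```python
import numpy as np

# Simplified stand-in for the dictionary in preprocess/inference.py.
# Only the "image/jpg" entry comes from the issue; "image/png" is assumed.
SINGLE_IMAGE_CONTENT_TYPES = {
    "image/png": "PNG",
    "image/jpg": "JPG",  # the entry proposed in this issue
}


def processor_kwargs(processor, images):
    """Hypothetical mirror of the fixed line-445 expression.

    `images is not None` never evaluates an ndarray's truth value,
    so this works whether `images` is None, a list, or an ndarray.
    """
    return dict(**({"images": images} if images is not None and processor else {}))


images = np.zeros((2, 64, 48, 3), dtype=np.uint8)

try:
    if not images:  # the original check: raises for multi-element arrays
        pass
except ValueError as err:
    print("original check failed:", err)

print(SINGLE_IMAGE_CONTENT_TYPES["image/jpg"])          # JPG
print(list(processor_kwargs(object(), images)))         # ['images']
print(processor_kwargs(object(), None))                 # {}
```

Checking `is None` also expresses the intent more precisely: the code cares whether images were supplied at all, not whether their pixel values are nonzero.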