amazon-textract Search Results

373 results for amazon-textract

danswer-ai/danswer #1938

Improve PDF text extraction

Progress has been made on text extraction from PDF. It would be good to integrate a process like the one of https://github.com/VikParuchuri/marker and https://github.com/VikParuchuri/surya. That wo…

jeremi updated 2 months ago · 2 comments
aws-samples/amazon-textract-textractor #300

Not able to fetch Handwritten Text from Document

The library is unable to fetch the text under Manufacturer/Model; however, we are able to see it in the AWS Textract console.

naconcirrus updated 8 months ago · 1 comment
aws-samples/amazon-textract-textractor #78

Need an option to save output in UTF-8 encoding to avoid sav…

It looks like the only way to capture the output of amazon-textract is to redirect it into a file. Such as: amazon-textract --input-document "s3://somebucket/2022-04-16-0010.jpg" --pretty-print LI…

lihuib updated 2 years ago · 1 comment
aws-samples/amazon-textract-textractor #388

issue with ordering in extractions, markdown and gettext met…

The attached input document contains text, then a table, followed by more text; we want the text file to match the input PDF file. ![input_page](https://github.com/user-attachments/assets/fe…

red-sky17 updated 2 weeks ago · 7 comments
dadoonet/fscrawler #794

Use External API for OCR ie amazon textract or google vision

**Is your feature request related to a problem? Please describe.** Tesseract does not handle the PDFs I'd like to OCR well enough. **Describe the solution you'd like** I want to be able to…

Bowriverstudio updated 5 years ago · 2 comments
aws/aws-toolkit-jetbrains #3970

Exception in AWS Toolkit (1.86-EAP.2023.11.03-232).

An exception occurs while running code: Exception in thread "main" java.lang.IllegalArgumentException: Invalid option: software.amazon.awssdk.awscore.client.config.AwsClientOption@44e81672. Requi…

NehaAnthony updated 10 months ago · 2 comments
aws-samples/amazon-textract-textractor #389

Incorrect order of text layouts due to compare_bounding_box(…

When I send a PDF with the following paragraph (which is a bit tilted, part of [this PDF file](https://www.accessdata.fda.gov/cdrh_docs/pdf/P010032A.pdf)) and use `Document.get_text()`, I get the f…

keitaf updated 1 month ago · 3 comments
aws-samples/amazon-textract-multipage-tables-processing #1

Can't merge `pipeline_merge_tables` if 1st page is missing a…

Was trying to get `pipeline_merge_tables` working and ended up finding a small issue. The default validation function breaks when there are no tables in the current or next page, which means that the …

douglasqian updated 7 months ago · 1 comment
slub/textract2page #24

reading order: WORD-based vs. top-level based

The current implementation extracts the ReadingOrder from the top-level parents of all `WORD` blocks (in the order of these word blocks). This seems to be necessary for cases with `TABLE` results. …

bertsky updated 1 month ago · 5 comments
aws-samples/amazon-textract-textractor #294

heuristic_line_break_threshold, along with other heuristic c…

I noticed that even when testing extreme values of heuristic_line_break_threshold, heuristic_overlap_ratio, and heuristic_h_tolerance there was no change in the output. This led me to examine their us…

kostabasis updated 8 months ago · 4 comments
