aws-samples / amazon-textract-multipage-tables-processing

MIT No Attribution

Can't merge `pipeline_merge_tables` if 1st page is missing a table #1

Open douglasqian opened 1 year ago

douglasqian commented 1 year ago

I was trying to get `pipeline_merge_tables` working and found a small issue. The default validation function stops iterating as soon as the current or next page has no tables, so the pipeline never scans any later pages for tables to merge.

After poking around a bit, I noticed that it's caused by the `break` statements here:
https://github.com/aws-samples/amazon-textract-response-parser/blob/3ba9b666a7ae8ba849003512ccb0bb8f331e35bc/src-python/trp/t_tables.py#L102
https://github.com/aws-samples/amazon-textract-response-parser/blob/3ba9b666a7ae8ba849003512ccb0bb8f331e35bc/src-python/trp/t_tables.py#L107

I'm opening a PR to fix this, but if you need a workaround for now, just change these to `continue` locally.
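
To show why `break` versus `continue` matters here, below is a minimal sketch. It is not the actual `trp` code: the function `candidate_pairs`, its `skip_empty` flag, and the list-of-table-IDs page layout are all hypothetical stand-ins, chosen only to mimic a loop over adjacent pages.

```python
from typing import List, Tuple


def candidate_pairs(pages: List[List[str]], skip_empty: bool) -> List[Tuple[str, str]]:
    """Pair the last table on each page with the first table on the next page.

    `pages` is a hypothetical stand-in: one list of table IDs per page.
    skip_empty=False mimics a `break` on a table-less page;
    skip_empty=True mimics the `continue` workaround.
    """
    pairs = []
    for current, following in zip(pages, pages[1:]):
        if not current or not following:
            if skip_empty:
                continue  # skip this page pair but keep scanning later pages
            break  # stop scanning entirely; later pages are never checked
        pairs.append((current[-1], following[0]))
    return pairs


# The first page has no tables; pages 2 and 3 each have one.
pages = [[], ["t1"], ["t2"]]
print(candidate_pairs(pages, skip_empty=False))  # []
print(candidate_pairs(pages, skip_empty=True))   # [('t1', 't2')]
```

With `break`, the empty first page ends the loop before the second and third pages are ever compared. With `continue`, only the pair involving the empty page is skipped.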

kunalagarwala commented 7 months ago

Hi @douglasqian, I see you worked on this recently. Were you able to make this work using Textractor? -Kunal