Open fascani opened 1 year ago
Hello, I am having an issue with merged columns, and I noticed that the example in the documentation has the same problem. If you look at the "Consolidated Statement of Cash Flows" example @ https://aws-samples.github.io/amazon-textract-textractor/notebooks/table_data_to_various_formats.html#Calling-Textract you will see that the columns "Three Month Ended June 30", "Six Month Ended June 30" and "Twelve Month Ended June 30" are split in the Excel output, even though I believe the information needed to merge them is there. (I say this because, looking at the relationships in the JSON file, I think the cells can be linked and merged across columns.)
Is this a bug, or is there existing functionality to handle merged columns?
I ended up helping myself: I built a small Python helper package that merges columns correctly. See https://github.com/fascani/textract_json_to_df/tree/main
A follow-up on this: the JSON file from Textract itself is correct; the problem is in the textractor functionality that creates the CSV, which mishandles merged COLUMNS.
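To illustrate what I mean by the information being in the JSON, here is a minimal sketch of how the relationships could be used. It assumes the Textract block layout as I understand it: a `TABLE` block has a `CHILD` relationship listing its `CELL` blocks and a `MERGED_CELL` relationship listing `MERGED_CELL` blocks, each of which in turn lists the `CELL` blocks it spans. The sample blocks are hand-made for illustration (not real Textract output), the helper names are hypothetical, and this is not how textractor or my helper package is implemented.

```python
def cell_text(cell, by_id):
    """Join the WORD children of a CELL block into one string."""
    words = []
    for rel in cell.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def table_to_grid(table, by_id):
    """Build a row/column grid, repeating merged-cell text over its span."""
    cells, merged = [], []
    for rel in table.get("Relationships", []):
        for block_id in rel["Ids"]:
            block = by_id[block_id]
            if rel["Type"] == "CHILD" and block["BlockType"] == "CELL":
                cells.append(block)
            elif rel["Type"] == "MERGED_CELL":
                merged.append(block)

    n_rows = max(c["RowIndex"] for c in cells)
    n_cols = max(c["ColumnIndex"] for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]

    # Textract indices are 1-based.
    for c in cells:
        grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = cell_text(c, by_id)

    # Copy the text of each merged cell into every cell it covers.
    for m in merged:
        parts = [by_id[i] for rel in m.get("Relationships", [])
                 if rel["Type"] == "CHILD" for i in rel["Ids"]]
        texts = [cell_text(p, by_id) for p in parts]
        text = " ".join(t for t in texts if t)
        for p in parts:
            grid[p["RowIndex"] - 1][p["ColumnIndex"] - 1] = text
    return grid


# Hypothetical helpers that build hand-made sample blocks.
def word(block_id, text):
    return {"Id": block_id, "BlockType": "WORD", "Text": text}


def cell(block_id, row, col, word_ids=()):
    block = {"Id": block_id, "BlockType": "CELL",
             "RowIndex": row, "ColumnIndex": col}
    if word_ids:
        block["Relationships"] = [{"Type": "CHILD", "Ids": list(word_ids)}]
    return block


blocks = [
    {"Id": "t1", "BlockType": "TABLE", "Relationships": [
        {"Type": "CHILD", "Ids": ["c11", "c12", "c13", "c21", "c22", "c23"]},
        {"Type": "MERGED_CELL", "Ids": ["m1"]},
    ]},
    cell("c11", 1, 1), cell("c12", 1, 2, ["w1", "w2"]), cell("c13", 1, 3),
    cell("c21", 2, 1, ["w3"]), cell("c22", 2, 2, ["w4"]),
    cell("c23", 2, 3, ["w5"]),
    # The header in row 1 spans columns 2 and 3.
    {"Id": "m1", "BlockType": "MERGED_CELL", "RowIndex": 1, "ColumnIndex": 2,
     "RowSpan": 1, "ColumnSpan": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["c12", "c13"]}]},
    word("w1", "Three"), word("w2", "Months"), word("w3", "Cash"),
    word("w4", "10"), word("w5", "12"),
]

by_id = {b["Id"]: b for b in blocks}
table = next(b for b in blocks if b["BlockType"] == "TABLE")
grid = table_to_grid(table, by_id)
for row in grid:
    print(row)
```

Repeating the header text over every spanned column is just one possible choice; it keeps the grid rectangular, so it can be written to CSV or loaded into a DataFrame without losing which columns belong under the merged header.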