Open lavish2210 opened 8 months ago
@lavish2210 - Thanks for reporting this. We're currently doing data annotation to improve our partitioning models and will include this in the data set.
It would be great if you could share the timeline by which all the above-listed issues will be solved.
We'll post timelines for model-related updates in our Slack channel.
I am using the hi_res model locally and tried it both with and without chunking. I also tried the chipper model via the API and ran into similar issues.
Major issues we faced while trying it on ADV Brochures -
In the above snippet, the text
Item 2. Material Changes Since the last annual update to the Form ADV Part 2A (the “Brochure”) on March 31, 2022, material changes to this Brochure include amendments to the following items:
is classified as narrative text, which ideally should not be the case.
Table Extraction Issue - The following snippet is taken from page 24 of the BlackRock PDF (linked in Issue - 1). We didn't receive the correct table structure for this table.
Multicolumn documents - We are not able to get the correct structure for multicolumn PDFs. The right column is recognized first, and then the left column (and even that is read row by row). Ideally, the whole left column should be recognized first, and then the whole right column. https://files.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=821958
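To make the expected ordering concrete, here is a minimal sketch of the reading order we have in mind for a two-column page. It is not how the library orders elements. The `Box` type and the `reading_order` function are hypothetical names made up for illustration, and the sketch assumes each element has a bounding box with a top-left origin (y grows downward) and that the page splits into exactly two columns at its horizontal midpoint.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical element: text plus the left/top/right edges of its bounding box."""
    text: str
    x0: float  # left edge
    y0: float  # top edge (assumes y grows downward)
    x1: float  # right edge


def reading_order(boxes, page_width):
    """Return boxes as: whole left column top to bottom, then whole right column."""
    mid = page_width / 2
    # Assign each box to a column by the horizontal center of its bounding box.
    left = [b for b in boxes if (b.x0 + b.x1) / 2 < mid]
    right = [b for b in boxes if (b.x0 + b.x1) / 2 >= mid]
    top_to_bottom = lambda b: (b.y0, b.x0)
    return sorted(left, key=top_to_bottom) + sorted(right, key=top_to_bottom)


# Illustrative coordinates only, not taken from the linked PDF.
page = [
    Box("right para 1", 320, 50, 580),
    Box("left para 2", 30, 200, 290),
    Box("left para 1", 30, 50, 290),
    Box("right para 2", 320, 200, 580),
]
for box in reading_order(page, page_width=612):
    print(box.text)
```

A real page would also need handling for full-width elements such as headings and tables, which this sketch ignores.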
Chunking issue - Continuing from Issue - 1, if the text is not correctly classified as a title, chunking does not work correctly either.
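The dependency between the two issues can be illustrated with a simplified sketch of title-based chunking. This is not the library's chunking code. `chunk_by_heading` is a hypothetical helper, elements are reduced to `(category, text)` pairs, and the sample body texts are placeholders. It assumes that a new chunk starts whenever an element is labeled "Title".

```python
def chunk_by_heading(elements):
    """Group (category, text) pairs into chunks, starting a new chunk at each "Title"."""
    chunks = []
    current = []
    for category, text in elements:
        # A Title closes the chunk collected so far and opens a new one.
        if category == "Title" and current:
            chunks.append(current)
            current = []
        current.append(text)
    if current:
        chunks.append(current)
    return chunks


# Placeholder elements: the heading is labeled correctly in the first list.
correct = [
    ("Title", "Item 1"),
    ("NarrativeText", "body of item 1"),
    ("Title", "Item 2. Material Changes"),
    ("NarrativeText", "body of item 2"),
]
# Same elements, but the second heading is mislabeled as narrative text.
mislabeled = [
    ("Title", "Item 1"),
    ("NarrativeText", "body of item 1"),
    ("NarrativeText", "Item 2. Material Changes"),
    ("NarrativeText", "body of item 2"),
]

print(len(chunk_by_heading(correct)))     # two chunks, one per item
print(len(chunk_by_heading(mislabeled)))  # one chunk, both items merged
```

With the heading mislabeled, the section boundary is lost and the two items end up in the same chunk, which matches what we see.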
Please provide support on these issues.