Best pdf extractor I have seen, but still not accurate enough

Thanks for your great work! It still has some problems, though. I have a PDF that is not scanned (you can select the text in the file). When I run your method on it, 'benefit' is recognized as 'benets' (the 'fi' is dropped). Strangely, Foxit PDF Editor makes the same mistake, but pymupdf extracts the word correctly. So the problem may come from one of the specific packages used for text extraction.
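A dropped 'fi' often points to a ligature glyph in the PDF font, although that is only a guess for this file. Here is a minimal sketch for checking it. The helper names (`find_ligatures`, `expand_ligatures`, `pymupdf_text`) are made up for illustration and are not part of marker. If the extractor returns the ligature as a single Unicode character, it can be expanded afterwards; if the glyph is dropped entirely, as in 'benets', post-processing cannot recover it and the fix has to happen in the extraction step.

```python
# Unicode "Alphabetic Presentation Forms" ligatures and their plain-letter expansions.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t
    "\ufb06": "st",
}


def find_ligatures(text):
    """Return the set of ligature characters that occur in the text."""
    return {ch for ch in text if ch in LIGATURES}


def expand_ligatures(text):
    """Replace each ligature character with its plain-letter expansion."""
    return "".join(LIGATURES.get(ch, ch) for ch in text)


def pymupdf_text(pdf_path):
    """Extract plain text with PyMuPDF for comparison (requires `pip install pymupdf`)."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)
```

Comparing `find_ligatures(pymupdf_text("file.pdf"))` with the same check on the marker output would show whether the two tools see the ligature differently (`file.pdf` is a placeholder path).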

In addition, there are still some issues with tables. After running the pipeline, you still need to adjust the tables manually in the markdown to make sure they are correct. I don't have a good idea of how this could be improved. Just deciding where to put the bounding box for table extraction is daunting for me.
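One possible way to avoid placing bounding boxes by hand is to let a library propose them. The sketch below assumes PyMuPDF 1.23 or later, whose `Page.find_tables()` returns detected tables with a `bbox` and an `extract()` method. It is not how marker handles tables, and detection quality will vary by document. `rows_to_markdown` and `tables_on_page` are hypothetical helper names.

```python
def rows_to_markdown(rows):
    """Render a list of rows (lists of str or None) as a markdown table.

    The first row is treated as the header. Short rows are padded with empty cells.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(row):
        cells = [
            ("" if cell is None else str(cell)).replace("\n", " ").replace("|", "\\|")
            for cell in row
        ]
        cells += [""] * (width - len(cells))  # pad ragged rows
        return "| " + " | ".join(cells) + " |"

    header = fmt(rows[0])
    separator = "| " + " | ".join(["---"] * width) + " |"
    body = [fmt(row) for row in rows[1:]]
    return "\n".join([header, separator] + body)


def tables_on_page(pdf_path, page_number=0):
    """Return (bbox, markdown) pairs for tables PyMuPDF detects on one page."""
    import fitz  # PyMuPDF; imported lazily so rows_to_markdown works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        finder = page.find_tables()  # assumption: PyMuPDF >= 1.23
        return [(tuple(t.bbox), rows_to_markdown(t.extract())) for t in finder.tables]
```

The returned bounding boxes could also serve as a starting point for manual adjustment when the detected table is only partly right.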

VikParuchuri / marker, issue #170