VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
GNU General Public License v3.0

V2 is a huge improvement #121

Open Lastofthefirst opened 1 month ago

Lastofthefirst commented 1 month ago

v2 is a huge improvement, well done!

When I updated, the first run failed with the error "no module named surya". I tried pip install surya, with no luck, but the surya repo says pip install surya-ocr, and that worked. The same thing happened with pdftext. Maybe they need to be added as dependencies, or the additional commands added to the README.
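A quick way to check for this before the first run is sketched below. It only tests whether the two modules named above can be imported and prints the matching pip command if not. The import-name to package-name mapping (surya is installed as surya-ocr, pdftext as pdftext) is taken from this comment; the helper name missing_packages is hypothetical and not part of marker.

```python
import importlib.util

# Import name -> pip package name, as reported in this comment.
# This table is an illustrative assumption, not something marker ships.
REQUIRED = {"surya": "surya-ocr", "pdftext": "pdftext"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing dependencies, try: pip install " + " ".join(missing))
    else:
        print("All checked dependencies are importable.")
```

Keeping the mapping explicit matters here because the import name and the PyPI name differ for one of the packages, which is exactly what made the first pip install attempt fail.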

Here is an example paper I was trying to pdf->md

33.3+Smith.pdf

Here is what the previous version generated:

33.3+Smith.md

And here is what I got with v2:

33.3+Smith.md

I used the command: marker_single /Downloads/33.3+smith.pdf /Downloads --batch_multiplier 2 --langs English

Really a huge improvement. It seems like the section heading font causes an issue in both cases. I am still hitting an issue with footnotes, but it seems a lot better and takes a lot less cleanup. There is also something strange where certain words have a space in them, and in v1 they had a strange symbol. Take for example the word scientific (to find it with Ctrl-F you have to search for scienti). Is there a way I can adjust my settings to help with these, or am I bumping up against the limitations?
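If the strange symbol in the v1 output is a Unicode ligature character (for example the single "fi" glyph U+FB01, which would explain why only scienti matches in a search), it can be cleaned up after conversion. The sketch below is a post-processing step on the generated markdown under that assumption. It is not a marker setting, and the function name expand_ligatures is hypothetical. It does not address the stray space seen in v2, since the exact characters in that output are not shown here.

```python
# Latin ligature code points and their plain-letter expansions.
# Assumption: the "strange symbol" is one of these characters.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
})


def expand_ligatures(text: str) -> str:
    """Replace Latin ligature characters with their component letters."""
    return text.translate(LIGATURES)


if __name__ == "__main__":
    sample = "a scienti\ufb01c paper"
    print(expand_ligatures(sample))
```

An explicit table is used instead of unicodedata.normalize("NFKC", ...) because NFKC also rewrites other characters, such as superscript digits, which could change footnote markers.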

Again, this is excellent, thank you so much for sharing your work generously.

relsas commented 1 month ago

Hi! I'm also impressed by Marker in version 2.5! After testing LlamaParse, GROBID, Nougat, and, a long time ago, Textract, it seems to me to be the best PDF parser currently available! It does a much better job of identifying tables than Nougat, I haven't found any left-out pages yet, formulas are usually identified well, it captures most figures, and it already runs stably and relatively fast.

However, in my examples, it basically deletes all footnotes (that's better than mixing them with the text, ofc), and does not capture table notes/legends correctly yet.

Attached an example paper and the MD result.

Please continue to develop it - great work!

Earnings_Prediction_Using_Recurrent_Neural_Networks.md Earnings_Prediction_Using_Recurrent_Neural_Networks.pdf

xuboot commented 1 month ago

@relsas How did you test it? In my tests it was very slow. I tested it on a Tesla T4 GPU.

xuboot commented 1 month ago

@ Have you tested https://github.com/infiniflow/ragflow/tree/main/deepdoc

relsas commented 1 month ago

Hi, I use a laptop with a mobile RTX 4090 and 64GB RAM. I guess "slow" is relative: parsing a paper like the uploaded one takes about one minute. As the GPU is barely used, I figure that batch processing will speed it up further. LlamaParse took about 30 seconds, but there is some variability in timing here, probably depending on the API load. Nougat-base took about 1:30 minutes with much more GPU load (and no extracted figures).