allenai / papermage

library supporting NLP and CV research on scientific papers
https://papermage.org
Apache License 2.0
665 stars 52 forks

Examples mentioned in Papermage paper don't work #66

Closed uneetkumarsingh closed 5 months ago

uneetkumarsingh commented 8 months ago

I was reading through the paper "PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents" and wanted to try the examples in it. The one below does not work:

image

It seems there is no Parser:

Code I am running:

import papermage as pm
parser = pm.PDF2TextParser()

Error I am getting: AttributeError: module 'papermage' has no attribute 'PDF2TextParser'

kyleclo commented 7 months ago

Hi @uneetkumarsingh. Sorry about the confusion. The syntax in the paper is not the exact syntax you need; we have a footnote explaining why this was necessary for the paper:

image

Instead, we recommend referring to the GitHub repo for the latest syntax, especially as the code will drift further from the paper over time. This is the snippet you should try instead:

image
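
The screenshot itself is not preserved in this thread, so its exact contents are unknown. As a rough sketch only, the check below shows one way to confirm which names an installed version actually exposes before calling them. The helper `load_attr` is hypothetical, and the entry point `papermage.recipes.CoreRecipe` is an assumption based on the repo README at the time; it may differ in your installed version.

```python
import importlib


def load_attr(module_name, attr_name):
    """Return module.attr_name if both exist, else None (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its dependencies) is not installed.
        return None
    return getattr(module, attr_name, None)


# The name from the paper, which raised AttributeError in this issue.
paper_parser = load_attr("papermage", "PDF2TextParser")

# Assumption: README-style entry point; verify against the current README.
core_recipe = load_attr("papermage.recipes", "CoreRecipe")

for label, obj in [
    ("papermage.PDF2TextParser", paper_parser),
    ("papermage.recipes.CoreRecipe", core_recipe),
]:
    status = "available" if obj is not None else "not found"
    print(f"{label}: {status}")
```

If `CoreRecipe` is reported as available, the README describes how to instantiate it and run it on a PDF; nothing here constructs it, since doing so may load models.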
kyleclo commented 5 months ago

Closing now but please re-open if not addressed :)