allenai / mmda

multimodal document analysis
Apache License 2.0

Add Metadata as a top-level Document object #175

Closed rauthur closed 1 year ago

rauthur commented 1 year ago

Example use case: store metadata about the PDF outline (i.e., the table-of-contents sidebar), which doesn't fit cleanly into any of the existing annotation types (BoxGroup or SpanGroup). A PDF outline object is essentially a tuple of (text, location-pointer, nesting-level).
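To make the proposal concrete, here is a minimal sketch of what the outline tuple and a document-level metadata container could look like. `OutlineItem`, `Metadata`, and their fields are hypothetical names for illustration only, not existing mmda classes, and the location pointer is assumed to be a zero-based page index.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class OutlineItem:
    """One PDF outline entry (hypothetical type, not part of mmda)."""

    text: str   # title shown in the table-of-contents sidebar
    page: int   # location pointer; assumed here to be a zero-based page index
    level: int  # nesting level, 0 for top-level entries


@dataclass
class Metadata:
    """Free-form, document-level key/value store (hypothetical)."""

    values: Dict[str, Any] = field(default_factory=dict)

    def set(self, key: str, value: Any) -> None:
        self.values[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)


# Illustrative outline entries; the titles and pages are made up.
outline = [
    OutlineItem("1 Introduction", page=0, level=0),
    OutlineItem("1.1 Motivation", page=1, level=1),
    OutlineItem("2 Method", page=2, level=0),
]

# Store the outline as plain dicts so the metadata stays JSON-serializable.
metadata = Metadata()
metadata.set("outline", [asdict(item) for item in outline])

# Read back only the top-level entries.
top_level = [e["text"] for e in metadata.get("outline") if e["level"] == 0]
```

Keeping the outline under a single metadata key, rather than forcing it into a box- or span-based annotation, is the design choice this issue suggests; how the container would attach to a Document is left open here.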

This idea was discussed briefly with @kyleclo, so it may need refinement.