allenai / mmda

multimodal document analysis
Apache License 2.0
158 stars 18 forks

Add PDF outline extraction and storage on Document metadata #178

Closed rauthur closed 1 year ago

rauthur commented 1 year ago

Not excited about this naming, but since we don't want to add metadata via a predictor and we need an instantiated document, here we are!

Re-arranged to add outline extraction as a util per discussion with @kyleclo
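
For readers skimming the thread, a rough sketch of the general shape (a util that stores an already-extracted outline on an instantiated document's metadata, rather than going through a predictor). Everything here is hypothetical and self-contained: `OutlineItem`, `Document`, `store_outline`, and `load_outline` are illustrative stand-ins, not the mmda classes or the functions in this PR, and the actual PDF parsing step is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class OutlineItem:
    # Hypothetical record for one PDF outline (bookmark) entry.
    title: str
    level: int  # nesting depth, 0 for top-level entries
    page: int   # zero-based page index the entry points to


@dataclass
class Document:
    # Stand-in for an instantiated document; NOT the mmda Document class.
    metadata: Dict[str, Any] = field(default_factory=dict)


def store_outline(doc: Document, items: List[OutlineItem]) -> None:
    # Keep the stored form as plain dicts so the metadata stays JSON-friendly.
    doc.metadata["outline"] = [
        {"title": item.title, "level": item.level, "page": item.page}
        for item in items
    ]


def load_outline(doc: Document) -> List[OutlineItem]:
    # A document without a stored outline yields an empty list.
    return [OutlineItem(**entry) for entry in doc.metadata.get("outline", [])]
```

Usage would look like `store_outline(doc, items)` after the document is built, then `load_outline(doc)` wherever the outline is needed. Keeping this as a plain function is what avoids routing metadata through a predictor.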