earlng / academic-pdf-scrap

Code that scrapes the contents of the PDF papers submitted to NeurIPS 2020
MIT License
4 stars, 2 forks

Impact statement is not in a header #12

Open earlng opened 3 years ago

earlng commented 3 years ago

Describe the bug: “Impact” appears in an <h1> title that is not the Broader Impact Statement (BIS), so the scraper extracts the wrong content.

To Reproduce Paper:

Possible Fix: Switch the order of the if and elif branches, so that the scraper first looks for the title names we know are popular, like “Broader Impact”, and only if none is found looks for any <h1> title that contains “impact”?
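
A minimal sketch of that reordering, to make the idea concrete. It is not the repository's actual code: the function names, the `KNOWN_TITLES` list, and the assumption that the <h1> titles are already available as a list of strings are all hypothetical.

```python
import re

# Hypothetical list of known section titles; the real scraper's list may differ.
KNOWN_TITLES = ("broader impact", "broader impacts", "broader impact statement")


def normalize(title):
    """Lowercase a title and strip leading section numbers and trailing punctuation."""
    title = re.sub(r"^[\d.\s]+", "", title.strip().lower())
    return title.rstrip(" .:")


def pick_impact_heading(h1_titles):
    """Return the index of the heading most likely to open the impact statement, or None."""
    normalized = [normalize(t) for t in h1_titles]
    # First pass: exact match against the known, popular titles.
    for i, t in enumerate(normalized):
        if t in KNOWN_TITLES:
            return i
    # Fallback: any heading that merely contains "impact".
    for i, t in enumerate(normalized):
        if "impact" in t:
            return i
    return None


titles = ["1 Introduction", "3 Impact of Batch Size", "Broader Impact", "References"]
print(pick_impact_heading(titles))
```

With the known titles checked first, the earlier “Impact of Batch Size” heading no longer wins over “Broader Impact”. This only addresses the ordering problem; it does not help when the statement has no heading of its own.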

earlng commented 3 years ago

Unfortunately, I don't currently see a way to work around this one.