earlng / academic-pdf-scrap

Code that scrapes the contents of the PDF papers submitted for NeurIPS 2020
MIT License

Include impact statement title in dataframe #6

Closed. paulsedille closed this issue 3 years ago

paulsedille commented 3 years ago

Is your feature request related to a problem? Please describe.
The current dataframe does not include the titles of the sections pulled. Having the title directly in the CSV would make it easy to eyeball whether a section is actually an impact statement or one we do not need.

Describe the solution you'd like
Include in the dataframe not just the text of the impact statement, but also its title.

Describe alternatives you've considered
This should be straightforward, I suppose: define a variable for the title text (something like BIS_title = child.text if child.tag == "h1" and "impact" in child.text) and append it to impact_dict.
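
A rough sketch of the idea above, not the repository's actual code. It assumes the converted paper is XML in which each section title is an h1 element followed by p elements holding the section text. The function name extract_impact_sections, the "title" and "text" keys, and the sample document are all hypothetical and only illustrate keeping the title next to the text.

```python
import xml.etree.ElementTree as ET


def extract_impact_sections(xml_text):
    """Collect sections whose h1 title mentions "impact", keeping title and text.

    Assumption: section titles are <h1> elements and their body text sits in
    the <p> elements that follow, up to the next <h1>.
    """
    root = ET.fromstring(xml_text)
    sections = []
    current = None  # the impact section currently being filled, if any

    for child in root.iter():
        if child.tag == "h1":
            title = (child.text or "").strip()
            if "impact" in title.lower():
                # Start a new record that stores the title alongside the text.
                current = {"title": title, "text": ""}
                sections.append(current)
            else:
                # Any other heading ends the current impact section.
                current = None
        elif child.tag == "p" and current is not None:
            paragraph = (child.text or "").strip()
            current["text"] = (current["text"] + " " + paragraph).strip()

    return sections


# Hypothetical sample document, used only to exercise the function.
SAMPLE = """<doc>
<h1>Conclusion</h1><p>We conclude.</p>
<h1>Broader Impact</h1><p>This work may affect X.</p><p>It also has risks.</p>
<h1>References</h1><p>[1] Someone.</p>
</doc>"""

rows = extract_impact_sections(SAMPLE)
print(rows)
```

Each record is a plain dict, so a list of them could be passed to pandas.DataFrame to get a title column next to the text column in the CSV. Matching on the lowercased title is a deliberately loose filter: it would also catch headings such as "Impact of batch size", which is exactly the kind of false positive the title column is meant to make visible.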