earlng/academic-pdf-scrap
Code that scrapes the contents of the PDF papers submitted to NeurIPS 2020
MIT License
4 stars
2 forks
issues
Improve sentence count
#21
paulsedille
opened
3 years ago
3
Multiple pages
#20
earlng
closed
3 years ago
0
new code missed some
#19
earlng
closed
3 years ago
4
Included acknowledgement
#18
earlng
closed
3 years ago
2
Too many Impact Statements
#17
earlng
closed
3 years ago
1
Capturing page number
#16
earlng
closed
3 years ago
6
Double Space
#15
earlng
closed
3 years ago
0
XML tagging of PDFs is too faulty
#14
earlng
opened
3 years ago
0
The Impact Statement is split across pages
#13
earlng
opened
3 years ago
2
Impact statement is not in a header
#12
earlng
opened
3 years ago
1
What if Impact Statement is not an h1?
#11
earlng
opened
3 years ago
5
Impact statement is split between multiple tags
#10
earlng
closed
3 years ago
5
included papers without impact statements.
#9
earlng
closed
3 years ago
0
More information into dataframe
#8
earlng
closed
3 years ago
0
Include paper link in dataframe
#7
paulsedille
closed
3 years ago
0
Include impact statement title in dataframe
#6
paulsedille
closed
3 years ago
0
Pull no more than one impact statement per paper
#5
paulsedille
closed
3 years ago
3
add additional features
#4
earlng
closed
3 years ago
0
Merge dataframe with second data (authors, institutions, countries)
#3
paulsedille
opened
3 years ago
0
Scrape impact statements by looking for statements with "impact" in the title
#2
paulsedille
closed
3 years ago
1
Added additional functions
#1
earlng
closed
3 years ago
0