allenai/s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
800 stars, 64 forks
[Release] TODOs #29 (Open)
kyleclo opened 3 years ago
kyleclo commented 3 years ago
pipeline refactor
[ ] Switch from CorpusDB to SDS integration (@kyleclo)
[ ] Refactor pipeline code to import from Lucy's latest PDF2Parser library (includes upgrade to Grobid 0.6.1) (@lucylw)
parser
[ ] Double-check Backmatter
[ ] Include section numbers
[ ] HTML Table parses
release
[ ]
notes
kylel/2020-07-01/new_release is dead, but check it later for any notes on bugfixes (e.g. cite_spans that don't exist)