allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

missing .bbl parser for latex code #32

Open sairin1202 opened 3 years ago

sairin1202 commented 3 years ago

Most LaTeX sources on arXiv ship their bibliography as a .bbl file (the output of BibTeX), but the code in doc2json does not seem to handle this format. I tried several arXiv sources, and none of them produced any bibliography information. Any suggestions?

lucylw commented 3 years ago

Hi there, have you installed latexpand per the instructions here? https://github.com/allenai/s2orc-doc2json#latex-processing

The doc2json library does handle .bbl files (though not .bib files), but it requires latexpand to do so.
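
A quick way to rule out the two likely causes is to check that latexpand is actually on your PATH and that the arXiv source directory contains a .bbl file. The snippet below is only a minimal sketch of that check: the function name `check_bbl_prereqs` and the layout of its return value are made up for illustration and are not part of doc2json.

```python
import shutil
from pathlib import Path


def check_bbl_prereqs(source_dir):
    """Hypothetical helper (not part of doc2json).

    Reports where the latexpand executable was found on PATH (None if it
    is missing) and which .bbl files exist anywhere under source_dir.
    """
    # shutil.which returns the full path of the executable, or None.
    latexpand_path = shutil.which("latexpand")
    # Search recursively, since arXiv sources sometimes use subdirectories.
    bbl_files = sorted(str(p) for p in Path(source_dir).rglob("*.bbl"))
    return {"latexpand": latexpand_path, "bbl_files": bbl_files}
```

Calling `check_bbl_prereqs("path/to/unpacked/arxiv/source")` returns a dict; if `latexpand` is `None` or `bbl_files` is empty, that would explain the missing bibliography.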