As noted in your paper, there seems to be a lack of public benchmarks for academic documents. Would you kindly consider releasing your test dataset as a benchmark, allowing for comparative analysis?
Since you have graciously shared code for generating datasets from PDFs, I think it would be sufficient to release only the metadata, such as the URLs of your arXiv test set.
Thanks for your wonderful work!