Open erickpeirson opened 5 years ago
As a developer, I want to be able to retrieve plain text content for individual e-prints, so that I can build cool tools and apps that use text mining, classification, etc.
Right now the plain text service is focused on extracting text from PDFs held by the compilation service. We already have a service module for getting the announced PDF (https://github.com/arXiv/arxiv-fulltext/blob/703e8644cf82c09fe99960b6775b0c677f7d1bc5/fulltext/services/pdf.py), and some of the routes already support arXiv e-print IDs. We should test those routes against announced e-prints to make sure they work as expected.
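For testing the e-print routes, it may help to have a small check that tells arXiv e-print IDs apart from other identifiers before a request is made. The sketch below is only an illustration and is not taken from the arxiv-fulltext code base: the function names `is_eprint_id` and `announced_pdf_url` are hypothetical, the regular expressions are a simplified reading of the public arXiv identifier scheme, and the `https://arxiv.org/pdf` base URL is an assumption about where the announced PDF would be fetched from.

```python
import re

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix (e.g. v2).
NEW_STYLE = re.compile(r"^\d{2}(0[1-9]|1[0-2])\.\d{4,5}(v[1-9]\d*)?$")

# Old-style IDs: archive[.SC]/YYMMNNN, with an optional version suffix.
OLD_STYLE = re.compile(
    r"^[a-z]+(-[a-z]+)*(\.[A-Z]{2})?/\d{2}(0[1-9]|1[0-2])\d{3}(v[1-9]\d*)?$"
)


def is_eprint_id(identifier: str) -> bool:
    """Return True if the string looks like an arXiv e-print ID (hypothetical helper)."""
    return bool(NEW_STYLE.match(identifier) or OLD_STYLE.match(identifier))


def announced_pdf_url(identifier: str, base: str = "https://arxiv.org/pdf") -> str:
    """Build a URL for the announced PDF; the base URL is an assumption."""
    if not is_eprint_id(identifier):
        raise ValueError(f"Not an arXiv e-print ID: {identifier!r}")
    return f"{base}/{identifier}"


if __name__ == "__main__":
    for candidate in ["1901.00123", "hep-th/9901001v2", "math.GT/0309136", "1234567"]:
        print(candidate, is_eprint_id(candidate))
```

A check like this could be used in route tests to build a list of valid and invalid identifiers, so that both the success path and the rejection path for e-print IDs are exercised.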
Some things to look at: