
arXiv plain text extraction
https://arxiv.github.io/arxiv-fulltext/
MIT License

Use-case: ability to retrieve plain text content for individual announced e-prints #20

Open erickpeirson opened 5 years ago

erickpeirson commented 5 years ago

As a developer, I want to be able to retrieve plain text content for individual e-prints, so that I can build cool tools and apps that use text mining, classification, etc.

Right now the plain text service is focused on extracting text from PDFs held by the compilation service. We already have a service module for getting the announced PDF (https://github.com/arXiv/arxiv-fulltext/blob/703e8644cf82c09fe99960b6775b0c677f7d1bc5/fulltext/services/pdf.py), and some of the routes already support arXiv e-print IDs. We should test this further to make sure it's working as expected.
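Since some routes already accept arXiv e-print IDs, one useful check is which identifier forms they should accept. Below is a minimal sketch of an ID validator that tests could be built around. It assumes the two public arXiv identifier schemes: new-style `YYMM.NNNN` (April 2007 through 2014) and `YYMM.NNNNN` (from 1501), and old-style `archive[.SUBJECT]/YYMMNNN`, each with an optional `vN` version suffix. `parse_eprint_id` and `EPrintID` are hypothetical names for illustration only. They are not part of arxiv-fulltext, and the real routes may validate IDs differently.

```python
import re
from typing import NamedTuple, Optional


class EPrintID(NamedTuple):
    """A parsed arXiv identifier (hypothetical structure, not from arxiv-fulltext)."""
    identifier: str          # ID without the version suffix, e.g. "2101.00001"
    version: Optional[int]   # None when no "vN" suffix was given


# New-style IDs: YYMM.NNNN (0704-1412) or YYMM.NNNNN (1501 onward), optional vN.
_NEW_STYLE = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(?:v(\d+))?$")
# Old-style IDs (before April 2007): archive[.SUBJECT]/YYMMNNN, optional vN.
_OLD_STYLE = re.compile(r"^([a-z]+(?:-[a-z]+)?(?:\.[A-Z]{2})?/\d{7})(?:v(\d+))?$")


def parse_eprint_id(raw: str) -> Optional[EPrintID]:
    """Return the parsed ID, or None if `raw` is not a well-formed arXiv ID."""
    raw = raw.strip()

    match = _NEW_STYLE.match(raw)
    if match:
        yy, mm, num, version = match.groups()
        yymm = int(yy + mm)
        # New-style IDs started in April 2007 and months run 01-12.
        if yymm < 704 or not 1 <= int(mm) <= 12:
            return None
        # Sequence numbers grew from 4 to 5 digits starting with 1501.
        expected_digits = 5 if yymm >= 1501 else 4
        if len(num) != expected_digits:
            return None
        return EPrintID(f"{yy}{mm}.{num}", int(version) if version else None)

    match = _OLD_STYLE.match(raw)
    if match:
        identifier, version = match.groups()
        return EPrintID(identifier, int(version) if version else None)

    return None
```

A route test could feed valid and malformed IDs from both schemes through the same code path and check that only the former resolve to an announced PDF.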

erickpeirson commented 5 years ago

Some things to look at: