IllDepence / unarXive

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
MIT License
259 stars 19 forks

Dataset sample #1

Closed (malteos closed 4 years ago)

malteos commented 5 years ago

Hi all,

thank you very much for making this dataset public!

Could you please provide a small sample of the data (max. 100MB)? That would make it much easier to get started with the dataset.

Best, malte

IllDepence commented 5 years ago

Hi Malte,

I added a small sample to the repo (doc/unarXive_sample.tar.bz2) and included README files to explain the contents. Hope this helps. If anything about the contents is unclear, let me know.

Best, Tarek
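
For anyone getting started with the sample mentioned above, here is a minimal sketch of how one might inspect doc/unarXive_sample.tar.bz2 without unpacking it to disk. It uses only Python's standard tarfile module. The helper names list_archive and read_readmes are hypothetical, and the assumption that the README files have names starting with "README" is a guess, since the thread does not describe the archive layout.

```python
import tarfile
from pathlib import Path


def list_archive(archive_path):
    """Return (name, size_in_bytes) for every regular file in a .tar.bz2 archive."""
    with tarfile.open(archive_path, "r:bz2") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


def read_readmes(archive_path):
    """Return {member name: text} for files whose base name starts with 'README'.

    Assumption: the README files in the sample follow that naming pattern.
    """
    texts = {}
    with tarfile.open(archive_path, "r:bz2") as tar:
        for member in tar.getmembers():
            base_name = Path(member.name).name
            if member.isfile() and base_name.upper().startswith("README"):
                handle = tar.extractfile(member)
                texts[member.name] = handle.read().decode("utf-8", errors="replace")
    return texts


if __name__ == "__main__":
    # Path as given in the comment above, relative to the repository root.
    sample = Path("doc/unarXive_sample.tar.bz2")
    if sample.exists():
        for name, size in list_archive(sample):
            print(f"{size:>10}  {name}")
        for name, text in read_readmes(sample).items():
            print(f"--- {name} ---")
            print(text[:500])  # show only the first 500 characters
    else:
        print(f"{sample} not found; run this from the repository root.")
```

Reading members directly from the archive keeps the working tree clean; to unpack everything instead, tarfile's extractall can be used on a trusted archive.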