bigscience-workshop / data_tooling

Tools for managing datasets for governance and training.
Apache License 2.0
77 stars 48 forks

Create dataset africarxiv_research_article_collection_on__african_languages_ #161

Open albertvillanova opened 2 years ago

albertvillanova commented 2 years ago
cakiki commented 2 years ago

@albertvillanova Just to double-check: is it really only the language subset that is of interest? That's just 20 publications (out of 1,449 total AfricArxiv publications).

Useful note: there is a notebook demo for accessing the data: https://github.com/AfricArxiv/hub-and-search-portal/blob/master/osf_africarxiv_dataset.ipynb