bigscience-workshop / lam

Libraries, Archives and Museums (LAM)
Apache License 2.0

Add dataset: unsilencing_dutch_colonial_archives #93

davanstrien closed this issue 1 year ago

davanstrien commented 1 year ago

A URL for this dataset

https://github.com/budh333/UnSilence_VOC/tree/v1.3

Dataset description

We contribute a fit-for-purpose annotation typology and apply it on the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide strong baselines using state-of-the-art neural network models.

Dataset modality

Text

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website

Size of dataset

<500MB

Confirm the dataset has an open licence

Contact details for data custodian

No response

davanstrien commented 1 year ago

self-assign

davanstrien commented 1 year ago

WIP here: https://huggingface.co/datasets/biglam/unsilence_voc. The dataset card still needs to be completed.
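
For anyone who wants to try the work-in-progress upload, here is a minimal sketch of loading it with the Hugging Face `datasets` library. Only the repository id `biglam/unsilence_voc` comes from the link above. The helper names `dataset_url` and `load_unsilence_voc` are hypothetical, and the available splits and configurations are not stated in this thread, so the call may need extra arguments depending on how the dataset ends up being published.

```python
# Sketch only: the repository id is taken from the link in this thread;
# the helper names are illustrative and not part of any library.
DATASET_ID = "biglam/unsilence_voc"


def dataset_url(dataset_id: str = DATASET_ID) -> str:
    """Return the Hugging Face Hub page for the dataset."""
    return f"https://huggingface.co/datasets/{dataset_id}"


def load_unsilence_voc(split=None):
    """Load the dataset from the Hub (requires network access).

    `split` is passed through unchanged; which split names exist is an
    assumption to check against the dataset card once it is complete.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_unsilence_voc()` with no arguments should return whatever splits the Hub repository exposes. Printing the result is a quick way to see the column names before relying on them.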