davanstrien commented 1 year ago

A URL for this dataset

https://github.com/budh333/UnSilence_VOC/tree/v1.3

Dataset description

We contribute a fit-for-purpose annotation typology and apply it to the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide strong baselines using state-of-the-art neural network models.

Dataset modality

Text

Dataset licence

Creative Commons Attribution 4.0 International

Other licence

No response

How can you access this data

As a download from a repository/website
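As a rough illustration of downloading from the repository, here is a minimal sketch that fetches the tagged v1.3 release linked above. It assumes GitHub's standard tag-archive URL pattern (`/archive/refs/tags/<tag>.zip`). The helper names `archive_url` and `download_and_extract` and the local file and folder names are hypothetical. The sketch does not inspect the archive's internal layout.

```python
import shutil
import urllib.request
import zipfile

# Repository and tag taken from the URL given in this issue.
OWNER = "budh333"
REPO = "UnSilence_VOC"
TAG = "v1.3"


def archive_url(owner: str, repo: str, tag: str) -> str:
    """Build GitHub's zip-archive URL for a tagged release (assumed URL pattern)."""
    return f"https://github.com/{owner}/{repo}/archive/refs/tags/{tag}.zip"


def download_and_extract(url: str, dest: str, zip_path: str = "unsilence_voc.zip") -> list:
    """Stream the zip to disk (avoids holding it all in memory), extract it, return member names.

    `dest` and `zip_path` are hypothetical local paths chosen for illustration.
    """
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

For example, `download_and_extract(archive_url(OWNER, REPO, TAG), "unsilence_voc")` would fetch and unpack the release. This requires network access, and the archive may approach the size listed below.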

size of dataset

<500MB

Confirm the dataset has an open licence

[X] To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

davanstrien commented 1 year ago

self-assign

davanstrien commented 1 year ago

WIP here: https://huggingface.co/datasets/biglam/unsilence_voc. Need to complete dataset card.

bigscience-workshop / lam

Add dataset: unsilencing_dutch_colonial_archives #93
