we contribute a fit-for-purpose annotation typology and apply it to the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide strong baselines using state-of-the-art neural network models.
Dataset modality
Text
Dataset licence
Creative Commons Attribution 4.0 International
Other licence
No response
How can you access this data
As a download from a repository/website
Size of dataset
<500MB
Confirm the dataset has an open licence
[X] To the best of my knowledge, this dataset is accessible via an open licence
A URL for this dataset
https://github.com/budh333/UnSilence_VOC/tree/v1.3
Contact details for data custodian
No response