CoSoD consists of metadata and analytical data of a 331-song corpus comprising all multi-artist collaborations on the Billboard “Hot 100” year-end charts published between 2010 and 2019. Each song in the dataset is associated with two CSV files: one for metadata and one for analytical data.
For more details on the annotation process and data, refer to our ISMIR 2023 paper: https://arxiv.org/abs/2307.05588
Please cite the paper if you plan on publishing results using the dataset.
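Since each song comes as a pair of CSV files, the two can be read together. The following is a minimal sketch, not part of the dataset's tooling: the function names `read_csv_rows` and `load_song` are hypothetical, and it assumes that the first row of each file is a header row and that the files are UTF-8 encoded.

```python
import csv
from pathlib import Path


def read_csv_rows(path):
    """Read one CSV file into a list of dicts keyed by the header row.

    Assumption: the first row of the file holds the column names.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_song(metadata_path, analysis_path):
    """Return (metadata_rows, analysis_rows) for a single song.

    Both paths are supplied by the caller; no file-naming scheme is assumed.
    """
    return read_csv_rows(metadata_path), read_csv_rows(analysis_path)
```

A call would look like `load_song("song_metadata.csv", "song_analysis.csv")`, where both file names are placeholders for the actual pair of files belonging to one song.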
Metadata CSV Files
The columns correspond to the following data:
Lead/featured: Collab. with lead artist(s) and featured artist(s)
No lead/featured: Collab. with no determined lead
DJ/vocals: Collab. between a DJ and vocalist(s)
Men: Collab. between two or more men
Women: Collab. between two or more women
Mixed: Collab. between two or more artists of different genders
Collab M: Collab. between men, no determined lead
Collab M and W: Collab. between men and women, no determined lead
Collab NB and W: Collab. between women and non-binary artists, no determined lead
Collab W: Collab. between women, no determined lead
DJ with M: Collab. between male DJ and male vocalist
DJ with Mix: Collab. between male DJ and mixed-gender vocalists
DJ with NB: Collab. between male DJ and non-binary vocalist
DJ with W: Collab. between male DJ and female vocalist
M ft. M: Men featuring men
M ft. NB: Men featuring non-binary artist(s)
M ft. W: Men featuring women
W ft. M: Women featuring men
W ft. W: Women featuring women
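The labels above follow three patterns that line up with the three collaboration types listed earlier: labels starting with "Collab" have no determined lead, labels starting with "DJ with" pair a DJ with vocalist(s), and labels containing "ft." name a lead and a featured artist. The sketch below recovers the type from a label string. The function name `collaboration_type` is hypothetical, and the mapping is inferred from the descriptions in this list rather than taken from the dataset's own code.

```python
def collaboration_type(label):
    """Map a gender-collaboration label to its collaboration type.

    The prefix rules are an assumption inferred from the label descriptions.
    """
    text = label.strip()
    if text.startswith("DJ with "):
        return "DJ/vocals"
    if text.startswith("Collab "):
        return "No lead/featured"
    if " ft. " in text:
        return "Lead/featured"
    # Fail loudly on anything that matches none of the three patterns.
    raise ValueError(f"Unrecognised label: {label!r}")
```

For example, `collaboration_type("DJ with NB")` gives "DJ/vocals" and `collaboration_type("W ft. M")` gives "Lead/featured". Raising an error on unknown labels, rather than returning a default, keeps unexpected values in the CSV files from being silently misclassified.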
Analysis CSV Files
The columns correspond to the following data:
For each formal section performed by one artist only, the following analytical data on the voice is provided: