DDMAL / salami-data-public

Question: vocabulary for instruments? #10

Open bmcfee opened 8 years ago

bmcfee commented 8 years ago

I noticed that there's a vocabulary for function annotations, and the structure annotations otherwise have a well-formed scheme that can be modeled by a regexp.

What about the instrument annotations? Do these come from a known, finite set, or should they be interpreted as belonging to an open vocabulary?

jblsmith commented 8 years ago

The instrument annotations do indeed come from an open vocabulary: you'll notice the occurrence of "vocal", "voice", "vocals", and other synonymous groups. They were also the most complex to annotate (i.e., the easiest to screw up with a misplaced bracket), which is why they still haven't been officially "cleaned."

bmcfee commented 8 years ago

I see, thanks.

I suppose a tangential question is: are the SALAMI annotations complete, or do you foresee more annotations following these guidelines being produced in the future? If the former, it wouldn't be too difficult to crunch through the data and normalize the annotations, maybe with some judicious application of stemming.
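For illustration, here is a rough sketch of what that normalization pass could look like. Everything in it is an assumption rather than part of the SALAMI tooling: `crude_stem`, `normalize`, `count_labels`, and the `SYNONYMS` table are hypothetical names, the plural-stripping rule is a simple stand-in for a real stemmer, and the only synonym group taken from this thread is "vocal" / "voice" / "vocals".

```python
from collections import Counter

# Hypothetical synonym table mapping a stemmed label to a canonical name.
# Only the vocal group comes from this thread; extend it after inspecting the data.
SYNONYMS = {"voice": "vocal"}


def crude_stem(label: str) -> str:
    """Lowercase, trim, and strip a trailing plural 's' (stand-in for a real stemmer)."""
    token = label.strip().lower()
    # Leave short tokens and words ending in "ss" (e.g. "bass") untouched.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


def normalize(label: str) -> str:
    """Map a raw instrument label to its canonical form."""
    stem = crude_stem(label)
    return SYNONYMS.get(stem, stem)


def count_labels(labels):
    """Count normalized labels, to see which groups collapse together."""
    return Counter(normalize(label) for label in labels)


if __name__ == "__main__":
    raw = ["vocal", "voice", "vocals", "guitar", "bass"]
    print(count_labels(raw))
```

Counting the normalized labels before and after adding entries to the synonym table is a cheap way to spot remaining near-duplicates; it does nothing about labels broken by misplaced brackets, which would need a separate cleaning step.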

jblsmith commented 8 years ago

I could see them revised in a few ways, depending on what purpose you foresee the instrument tags being used for. (Something we could talk about at Dagstuhl soon?)

In fact, the instrument tags were collected even though no immediate evaluation purpose was planned; having that extra level of detail seemed to make the annotations feel like a more "complete" record to the listener. Without that level of detail, the annotator would probably want to add these dimensions of the music to the other labels (as we see in the "verse-guitar" and "piano solo" labels in the Beatles annotations, for example).
