clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Unintuitive name field for records of CABank English SCoSE Corpus #69

Closed (teckart closed this issue 7 years ago)

teckart commented 7 years ago

https://vlo.clarin.eu/search?q=CABank+English+SCoSE+Corpus

Reason: The Session component contains a resource-specific value in the element "Name", but also an element "Title" with the fixed value "CABank English SCoSE Corpus". The latter is currently chosen for the name field.

(found by JK)

twagoo commented 7 years ago

Neither the name nor the title is sufficiently distinct in this corpus. This is not really a VLO issue; it is a matter of metadata quality. I will forward the report to the CMU people.

twagoo commented 7 years ago

FYI, the content of the message I sent to CMU:

Here's a feedback report we received through the VLO from one of our users. I think this is an issue that you are already aware of and you may have other priorities, but I thought I would forward it. This specific report reflects the general situation of non-distinctive resource titles in the metadata of (most of?) your session records. Within a corpus, all sessions have the same value in the "title" field, which IMO is a somewhat idiosyncratic use of that field (as all values should relate to the resource(s) the specific metadata record represents). Operationally speaking, the VLO uses the title value as the name of a record if present, and importantly it prefers "title" over "name". My suggestion would be to at least append the session name or another identifier to the title of each individual session record to make them distinct. Alternatively, do not use "title" at all and instead use or introduce a field to denote the corpus a session belongs to.

Here's another example of what records of a single corpus look like in the VLO: https://vlo.clarin.eu/search?q=%22SamtaleBank+Steensig+Corpus%22+-mpi&fq=collection:TalkBank
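
For illustration, the precedence described in the message (use "title" if present, otherwise fall back to "name") and the suggested fix (append the session name to the title) can be sketched roughly as follows. This is not the VLO's actual code: the function names, the dictionary-based record layout, and the session names `session-a` and `session-b` are all assumptions made up for this sketch.

```python
from typing import Optional

# Assumed precedence, following the description above: "title" wins over "name".
NAME_FIELD_PRECEDENCE = ("title", "name")


def display_name(record: dict) -> Optional[str]:
    """Return the first non-empty field value in precedence order (hypothetical helper)."""
    for field in NAME_FIELD_PRECEDENCE:
        value = (record.get(field) or "").strip()
        if value:
            return value
    return None


def distinct_title(record: dict) -> str:
    """Suggested metadata fix: append the session name to the shared corpus title."""
    title = (record.get("title") or "").strip()
    name = (record.get("name") or "").strip()
    if title and name:
        return f"{title}: {name}"
    return title or name


# Hypothetical session records; the real session names in the corpus will differ.
sessions = [
    {"title": "CABank English SCoSE Corpus", "name": "session-a"},
    {"title": "CABank English SCoSE Corpus", "name": "session-b"},
]

# With the current metadata, every session ends up with the same display name.
current_names = [display_name(s) for s in sessions]

# With the session name appended, the titles become distinct.
fixed_names = [distinct_title(s) for s in sessions]
```

Under these assumptions, `current_names` contains the same corpus title twice, while `fixed_names` contains one distinct value per session, which is the effect the suggestion to CMU is aiming for.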