clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Make PUB/ACA/RES a separate field from other availability info (laundry tags) #39

Closed twagoo closed 7 years ago

twagoo commented 7 years ago

A proposal triggered by the discussion in the comments of #38.

Using 'cross-facet mapping' we can let any distribution type/availability level (PUB/ACA/RES) provided by the metadata itself take precedence over PUB/ACA/RES values derived from licensing and other information currently mapped to the availability facet. The procedure would be as follows:

I have not looked for an existing candidate concept for the distribution type; possibly it has to be created. Regardless, it has to be documented as such (i.e. the limited vocabulary) and taken up in the description of best practices.
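A minimal sketch of the precedence rule proposed above, assuming a record carries an optional level stated explicitly in the metadata and an optional level derived from licence information. The function name `resolve_availability_level` and its parameters are hypothetical and not part of the VLO code base; only the PUB/ACA/RES codes come from this discussion.

```python
from typing import Optional

# The three availability levels named in this issue.
LEVELS = {"PUB", "ACA", "RES"}


def resolve_availability_level(explicit: Optional[str],
                               derived: Optional[str]) -> Optional[str]:
    """Return the explicit level if it is valid, else the derived one, else None."""
    # Candidates are checked in order of precedence: explicit first, derived second.
    for candidate in (explicit, derived):
        if candidate is not None and candidate.strip().upper() in LEVELS:
            return candidate.strip().upper()
    return None
```

With this ordering, a record that states `ACA` explicitly keeps that value even if its licence information would suggest `PUB`.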

davoros commented 7 years ago

In the current situation, it seems to me that it is not very clear to people how to fill in the license/availability fields. There are also components with redundant fields like LicenseName (CC-BY, GPL) and license URL (https://creativecommons.org/licenses/by-nc/4.0/).

We should perform an analysis and, based on it, design a new component or modify an existing one, with clear and meaningful fields annotated with proper concepts, and explain to users how this information is processed for presentation.

BTW, the "academic" concept could be of interest.

twagoo commented 7 years ago

In reply to @davoros:

> In the current situation, it seems to me that it is not very clear to people how to fill in the license/availability fields. There are also components with redundant fields(..)

Redundant information in metadata is not necessarily problematic because the metadata records can serve multiple purposes, certainly not only providing data for the VLO. In many cases human readability is also desirable.

> We should perform an analysis and, based on it, design a new component or modify an existing one, with clear and meaningful fields annotated with proper concepts, and explain to users how this information is processed for presentation.

Indeed, a good (set of) default component(s) will definitely help! This is why I propose a very limited set of concepts, maybe just one, for availability level. A number of recommended components and profiles should then use this (and only this) concept for a field with a highly limited vocabulary.

> BTW, the "academic" concept could be of interest.

This concept represents a value, not a 'data category', and therefore does not make for easy mapping. I think the more suitable candidate (for what I have in mind at least) is license type, whose description references the report that introduces the end-user access model and content classification procedure from which PUB, ACA and RES originate. We are already mapping it to availability at the moment. We could simply map it to the newly proposed availabilityLevel facet (we could also call it licenseType, BTW), which would come down to the first step in the procedure I proposed in the issue description. Additional 'cross-facet mapping' could then help fill in a value, as a fallback, for the cases that provide enough information to infer a level but do not have one explicitly mentioned in the metadata.
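The two steps described here, a direct concept-to-facet mapping followed by a cross-facet fallback, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the concept identifiers are placeholders rather than real concept links, the `license` facet name and the `map_facets` function are hypothetical, and the licence-to-level table is a single illustrative entry, since the actual inference rules are not given in this thread.

```python
from typing import Dict, List

# Hypothetical concept identifiers mapped to facet names (placeholders only).
CONCEPT_TO_FACET: Dict[str, str] = {
    "concept:licenseType": "availabilityLevel",
    "concept:licenseName": "license",
}

# Illustrative fallback table; the real inference rules are not specified here.
LICENSE_TO_LEVEL: Dict[str, str] = {
    "CC-BY": "PUB",
}


def map_facets(fields: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map concept-annotated field values to facets, then apply the fallback."""
    facets: Dict[str, List[str]] = {}
    # Step 1: direct mapping from concept to facet.
    for concept, values in fields.items():
        facet = CONCEPT_TO_FACET.get(concept)
        if facet is not None:
            facets.setdefault(facet, []).extend(values)
    # Step 2: cross-facet fallback, used only when no level was stated explicitly.
    if not facets.get("availabilityLevel"):
        inferred = [LICENSE_TO_LEVEL[v] for v in facets.get("license", [])
                    if v in LICENSE_TO_LEVEL]
        if inferred:
            facets["availabilityLevel"] = inferred
    return facets
```

A record that only names a known licence gets an inferred level, while a record that states a level explicitly is left untouched by the fallback.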

I would like to hear the opinion of @stranak and other CLIC members on this idea!

twagoo commented 7 years ago

Note: the recommended License component actually has a field DistributionType with a closed vocabulary (Public/Academic/Restricted/Unspecified) and the concept license type linked.

davoros commented 7 years ago

Do we need Other (*) besides Public/Academic/Restricted/Unspecified?

twagoo commented 7 years ago

> Do we need Other (*) besides Public/Academic/Restricted/Unspecified?

I would say no. Other is just a 'view' artefact of the VLO to deal with out-of-bounds values, and we should not encourage people to write fuzzy metadata. I think 'unspecified' is a necessary evil and we should not go beyond that :)
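To illustrate the distinction drawn here, the sketch below keeps the metadata vocabulary closed while letting the display layer group anything else under Other. The function names are hypothetical and this is not how the VLO is known to implement it; only the four vocabulary values are taken from this thread.

```python
# Closed vocabulary of the DistributionType field, as listed in this thread.
DISTRIBUTION_TYPES = ("Public", "Academic", "Restricted", "Unspecified")


def is_valid_metadata_value(value: str) -> bool:
    """Metadata validation: only the closed vocabulary is accepted."""
    return value in DISTRIBUTION_TYPES


def display_value(value: str) -> str:
    """View layer: out-of-bounds values are grouped under 'Other'."""
    return value if value in DISTRIBUTION_TYPES else "Other"
```

Under this split, Other never needs to appear in a metadata record; it exists only as a display bucket for values that fall outside the vocabulary.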