clarin-eric / curation-dashboard

Java library for CLARIN's CMDI curation
GNU General Public License v3.0

Validation of hierarchical metadata #5

Closed: davoros closed this issue 1 year ago

davoros commented 8 years ago

Email from Florian Schiel:

But when testing CMDI instances nested in a hierarchy, I encountered the following (conceptual?) problem: each CMDI instance is tested by the module in isolation. Why is this a problem?

Consider for example a 2-level hierarchy of metadata: on the first level (corpus level), the metadata of a complete collection of resources is stored as in [1]; on the second level (which is linked as a resource of type 'Metadata' in the first level), the metadata of a single resource is stored as in [2]. To avoid massive replication, MD that concern all members of the collection, for example availability, are only stored in the first level. When analysing a single CMD instance of the second level, we can't find this information in the CMDI. What we do find is a pointer to the upper level, namely the IsPartOf entry in the CMDI header.
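To make the IsPartOf pointer concrete: a curation tool would first have to read it out of each record before it could follow it. The following is a minimal Java sketch using the JDK's built-in DOM parser. The class and method names (`IsPartOfExtractor`, `parentLinks`) are hypothetical, and the sketch simply assumes the pointer is carried by elements named `IsPartOf`, matched by local name anywhere in the document so that the CMD namespace version does not matter. The XML string in `main` is a trimmed illustration, not a valid CMDI record.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the IsPartOf targets declared in one CMDI record. */
class IsPartOfExtractor {

    static List<String> parentLinks(String cmdiXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-aware parsing lets us match on the local name "IsPartOf"
            // regardless of which CMD namespace the record uses.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(cmdiXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS("*", "IsPartOf");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getTextContent().trim());
            }
            return links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse CMDI record", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed illustration only; a real CMDI record also has a Header, Components, etc.
        String xml = "<CMD xmlns=\"http://www.clarin.eu/cmd/\"><Resources><IsPartOfList>"
                + "<IsPartOf>http://example.org/corpus.cmdi</IsPartOf>"
                + "</IsPartOfList></Resources></CMD>";
        System.out.println(parentLinks(xml)); // prints [http://example.org/corpus.cmdi]
    }
}
```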

So, I guess my questions are:

  1. Since we encourage users to build redundancy-free hierarchical MD structures in the CMD framework, would it be possible for the curator module to follow hierarchies (if they are there) all the way to the top and add the encountered MD to the MD content of the CMDI? ...
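The proposal in question 1 amounts to walking up the IsPartOf chain and merging what is found on each level. Below is a minimal sketch under stated assumptions: records are already parsed into a hypothetical in-memory model (`MdRecord`, holding one parent id and a flat facet map), each record has at most one parent, and a value stored lower in the hierarchy takes precedence over an inherited one. None of this reflects how the curation module or the VLO actually handles inheritance.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch: effective metadata of a record = its own facets plus those inherited via IsPartOf. */
class HierarchyMerger {

    /** Hypothetical parsed view of one CMDI record. */
    static final class MdRecord {
        final String isPartOf;            // id of the parent record, or null at the top level
        final Map<String, String> facets; // facet name -> value stored in this record itself

        MdRecord(String isPartOf, Map<String, String> facets) {
            this.isPartOf = isPartOf;
            this.facets = facets;
        }
    }

    /** Walks from {@code id} to the top; values found lower in the hierarchy win. */
    static Map<String, String> effectiveFacets(String id, Map<String, MdRecord> byId) {
        Map<String, String> merged = new HashMap<>();
        Set<String> visited = new HashSet<>();
        String current = id;
        // Stop at the top, at an unresolvable link, or when a cycle revisits a record.
        while (current != null && byId.containsKey(current) && visited.add(current)) {
            MdRecord rec = byId.get(current);
            rec.facets.forEach(merged::putIfAbsent); // keep the more specific value
            current = rec.isPartOf;
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, MdRecord> byId = new HashMap<>();
        byId.put("corpus", new MdRecord(null, Map.of("availability", "public", "language", "deu")));
        byId.put("doc1", new MdRecord("corpus", Map.of("creationDate", "1850", "language", "nds")));
        // TreeMap only to make the printed order deterministic
        System.out.println(new TreeMap<>(effectiveFacets("doc1", byId)));
        // prints {availability=public, creationDate=1850, language=nds}
    }
}
```

The `visited` set guards against cyclic or self-referencing IsPartOf links, which would otherwise loop forever. Records with several parents would need an explicit merge policy that this sketch does not attempt.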
davoros commented 8 years ago

One possible solution would be to recognize that a CMDI instance has ResourceLinks of type 'Metadata' and, if so, treat it differently from CMDI instances that don't.
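A sketch of the check itself, again with the JDK's DOM parser: it reports whether any ResourceProxy carries the ResourceType `Metadata`. The class and method names (`ResourceLinkInspector`, `hasMetadataLinks`) are hypothetical, and the element names assume the CMDI Resources layout in which each `ResourceProxy` contains a `ResourceType`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: decides whether a CMDI record links to further metadata records. */
class ResourceLinkInspector {

    /** True if at least one ResourceProxy has ResourceType 'Metadata'. */
    static boolean hasMetadataLinks(String cmdiXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // match by local name, whatever the CMD namespace
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(cmdiXml.getBytes(StandardCharsets.UTF_8)));
            NodeList proxies = doc.getElementsByTagNameNS("*", "ResourceProxy");
            for (int i = 0; i < proxies.getLength(); i++) {
                // Only look at ResourceType elements inside this proxy.
                NodeList types = ((Element) proxies.item(i)).getElementsByTagNameNS("*", "ResourceType");
                for (int j = 0; j < types.getLength(); j++) {
                    if ("Metadata".equals(types.item(j).getTextContent().trim())) {
                        return true;
                    }
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse CMDI record", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed illustration only, not a complete CMDI record.
        String structural = "<CMD xmlns=\"http://www.clarin.eu/cmd/\"><Resources><ResourceProxyList>"
                + "<ResourceProxy id=\"p1\"><ResourceType>Metadata</ResourceType>"
                + "<ResourceRef>doc1.cmdi</ResourceRef></ResourceProxy>"
                + "</ResourceProxyList></Resources></CMD>";
        System.out.println(hasMetadataLinks(structural) ? "structural" : "leaf"); // prints structural
    }
}
```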

Could you explain this a bit better or do you have any concrete proposal?

I was thinking that it does not make sense to include such CMDIs in the global statistics, because they will necessarily lack some or even many facets, since they describe not a resource but a collection. To give you an example: CMDI A describes a collection of historic texts, CMDI B describes one document. B will have information about a creation date; A will not, because a creation date makes no sense if the collected texts span a long (or even still open) time period.

Hence, before computing any statistics about facets, one could sort CMDIs into 'leaf' CMDIs, which do not have any ResourceLinks of type 'Metadata', and 'structural' CMDIs, which contain at least one ResourceLink of type 'Metadata'.
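Once each record has been classified, the facet statistics could be restricted to leaf CMDIs, so that collection-level records do not drag down the coverage figures. A minimal sketch, assuming a hypothetical `RecordSummary` produced by an earlier parsing step (for example a check for ResourceLinks of type 'Metadata'); the coverage measure used here (share of leaf records in which a facet is filled) is an illustrative choice, not the dashboard's actual metric.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: facet coverage computed over 'leaf' CMDIs only. */
class FacetCoverage {

    /** Hypothetical summary of one parsed CMDI record. */
    static final class RecordSummary {
        final boolean hasMetadataLinks; // true: 'structural', false: 'leaf'
        final Set<String> filledFacets; // facets with at least one value in this record

        RecordSummary(boolean hasMetadataLinks, Set<String> filledFacets) {
            this.hasMetadataLinks = hasMetadataLinks;
            this.filledFacets = filledFacets;
        }
    }

    /** Share of leaf records in which each facet is filled; structural records are skipped. */
    static Map<String, Double> leafCoverage(List<RecordSummary> records, List<String> facets) {
        List<RecordSummary> leaves = new ArrayList<>();
        for (RecordSummary r : records) {
            if (!r.hasMetadataLinks) {
                leaves.add(r);
            }
        }
        Map<String, Double> coverage = new LinkedHashMap<>(); // keeps the facet order given
        for (String facet : facets) {
            long filled = leaves.stream().filter(r -> r.filledFacets.contains(facet)).count();
            coverage.put(facet, leaves.isEmpty() ? 0.0 : (double) filled / leaves.size());
        }
        return coverage;
    }

    public static void main(String[] args) {
        List<RecordSummary> records = List.of(
                new RecordSummary(true, Set.of("availability")),                  // collection
                new RecordSummary(false, Set.of("availability", "creationDate")), // document
                new RecordSummary(false, Set.of("availability")));                // document
        System.out.println(leafCoverage(records, List.of("availability", "creationDate")));
        // prints {availability=1.0, creationDate=0.5}
    }
}
```

In the example, `creationDate` coverage comes out at 0.5 over the two documents, rather than 1/3 as it would if the collection record were counted as well.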

Best regards,

Florian

wowasa commented 1 year ago

With the integration of vlo-importer as a dependency for parsing the CMDIs in 2018, this is more a VLO issue than a curation issue.

twagoo commented 1 year ago

Agreed, this is not something to address in curation. Feel free to close it.

wowasa commented 1 year ago

Closing the issue with regard to the previous comment.