CLARIAH / clariah-plus

This is the project planning repository for the CLARIAH-PLUS project. It groups all technical documents and discussions pertaining to CLARIAH-PLUS in a central place and should facilitate findability, transparency and project planning, for the project as a whole.

Implement CMDI export for tool discovery #37

Open proycon opened 2 years ago

proycon commented 2 years ago

The component is defined in https://github.com/CLARIAH/clariah-plus/blob/main/technical-committee/shared-development-roadmap/epics/shared/fair-tool-discovery.md as follows:

Client using the tool store API (or a direct extension thereof) converting output to an established CMDI software metadata profile for interoperability with CLARIN. Possibly also offer an OAI-PMH endpoint serving the converted data.

proycon commented 2 years ago

Though not unimportant, I think this has the lowest priority for the time being.

proycon commented 2 years ago

I want to delegate this implementation to someone with CMDI expertise.

proycon commented 2 years ago

I discussed this with @JanOdijk: the idea is that someone from UU will take on this task (and use the WP3 budget that was already allocated to UU for Metadata-for-tools). As the priority is slightly lower and CLARIAH internally doesn't rely on this, it doesn't matter much if work on this issue starts a bit later.

proycon commented 2 years ago

@JanOdijk A bit of a spin-off of the discussion in #32: I wonder to what extent the ClarinSoftwareDescription CMDI profile is used outside the initiative you led and set up? Are other CLARIN participants actively using it to describe their software?

You mentioned:

The values used there have been used for very many tools and services from NL, but also for the whole range of Weblicht Webservices (from two/three years ago, some 270 if I remember well).

Were those weblicht metadata descriptions collected by the Utrecht team? Or are other teams (anywhere) actively putting out descriptions using the cmdi profile and vocabularies?

menzowindhouwer commented 2 years ago

WebLicht harvests the CMDI descriptions of the web services from the OAI sets in the CLARIN Centre Registry (https://centres.clarin.eu/oai_pmh) marked by the suffix (WebLicht). I'll check later which profile(s) these records use ... But the web service specification will, at its core, overlap with the core model for Web Services; see http://www.lrec-conf.org/proceedings/lrec2012/workshops/11.LREC2012%20Metadata%20Proceedings.pdf#page=48&pagemode=none and https://cmdi.clarin.eu/cmd-core/index.html
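As a rough illustration of the set selection described above, here is a minimal Python sketch. It builds a standard OAI-PMH `ListSets` request URL for the registry endpoint and filters a response for sets carrying the (WebLicht) suffix. It assumes the suffix appears in `setName`, which is not confirmed in this thread. The helper names and the sample response are made up for the example; no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 response namespace (from the OAI-PMH specification).
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
ENDPOINT = "https://centres.clarin.eu/oai_pmh"


def list_sets_url(endpoint=ENDPOINT):
    """Build the URL of an OAI-PMH ListSets request."""
    return f"{endpoint}?{urlencode({'verb': 'ListSets'})}"


def weblicht_sets(list_sets_xml, suffix="(WebLicht)"):
    """Return (setSpec, setName) pairs whose name ends with the suffix.

    Assumption: the suffix is part of setName, not setSpec.
    """
    root = ET.fromstring(list_sets_xml)
    matches = []
    for oai_set in root.iterfind(".//oai:set", OAI_NS):
        spec = oai_set.findtext("oai:setSpec", default="", namespaces=OAI_NS).strip()
        name = oai_set.findtext("oai:setName", default="", namespaces=OAI_NS).strip()
        if name.endswith(suffix):
            matches.append((spec, name))
    return matches


# Hypothetical response body, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>example_a</setSpec><setName>Example Centre A (WebLicht)</setName></set>
    <set><setSpec>example_b</setSpec><setName>Example Centre B</setName></set>
  </ListSets>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_sets_url())
    print(weblicht_sets(SAMPLE))
```

A real harvester would then issue `ListRecords` requests per selected set and follow resumption tokens; that part is left out here.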

JanOdijk commented 2 years ago

re other CLARIN participants actively using it to describe their software?

No, not to my knowledge.

Were those weblicht metadata descriptions collected by the Utrecht team?

Well, they were derived from the WebLicht descriptions in the VLO, but the WebLicht metadata are hardly formalised. All the information we could gather from the title and the (usually very minimal) descriptions has been turned into a small number of metadata elements with fixed vocabularies, e.g. for tool tasks. So this was a semi-automatic process. But it has been done for a wide range of web services from all over Europe, so the tool task vocabulary already covers many different tool tasks (though surely not all yet).

Strictly speaking, even the information about the language or languages that a WebLicht web service applies to is not represented formally. The only thing that is represented formally is which input parameter values you can use. Some of these select the language, but the semantics of these parameters and their values is not defined anywhere.

proycon commented 1 year ago

We (@dietervu @twagoo @menzowindhouwer @antalvdb @proycon) just had a meeting on the subject of connecting our Tool Discovery to the CLARIN VLO (and perhaps Switchboard at a later stage, but we focus on the former first).

antalvdb commented 1 year ago

@JanOdijk is not going to take up the MD4T task: the next proposed step (consult Gijsbert Rutte) is to re-allocate this WP3 task from Utrecht to KNAW HuC / team SD @menzowindhouwer in direct collaboration with @dietervu and @twagoo.

@JanOdijk reiterates the point that the CLAPOP Faceted Search for the VLO (https://portal.clarin.nl/) might also be taken into account here, especially the facets - see https://dev.clarin.nl/clarin-resource-list-fs . If these could be merged somehow with the metadata in https://tools.clariah.nl , that would be a good re-use of this metadata.

proycon commented 1 year ago

@JanOdijk is not going to take up the MD4T task: the next proposed step (consult Gijsbert Rutte) is to re-allocate this WP3 task from Utrecht to KNAW HuC / team SD @menzowindhouwer in direct collaboration with @dietervu and @twagoo.

Ok!

@JanOdijk reiterates the point that the CLAPOP Faceted Search for the VLO (https://portal.clarin.nl/) might also be taken into account here, especially the facets - see https://dev.clarin.nl/clarin-resource-list-fs . If these could be merged somehow with the metadata in https://tools.clariah.nl , that would be a good re-use of this metadata.

We discussed vocabulary in #32 and indeed took the categorization in the existing CMDI profile into account there, but we decided not to go with that and to go with TaDiRAH and the NWO research fields because:

  1. They were already picked by INEO and seem a fair choice. Choosing another vocabulary would create mapping issues down the road.
  2. TaDiRAH is already established, available as linked open data, and used in DARIAH. (I still had to convert the NWO research fields to SKOS myself, though.)
  3. We do not want to be the maintainers/curators of a whole huge vocabulary, but prefer to use existing ones.

twagoo commented 1 year ago

FTR, @menzowindhouwer gave me an update about ongoing efforts towards support for harvesting codemeta based metadata in CMDI format in the CLARIAH/codemeta-lod-to-cmdi project and the OAI harvester codebase.

The current state, as I understand it, is that a CMDI profile has been generated (not yet published, but see https://menzowindhouwer.github.io/lab/cr2html/index.html#clarin.eu:cr1:p_1659015263833 for a preview) and a stylesheet can be called to produce records from JSON-LD (with some preprocessing, see the dedicated harvester config).

Open TODOs are generating a Landing Page reference, and adding concept links based on the JSON-LD context.
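To make the JSON-LD to CMDI step more concrete, here is a minimal Python sketch of the general idea. It is not the project's stylesheet, which is XSLT-based and not reproduced here. The component and element names (`SoftwareSourceCode`, `Name`, `Description`, `CodeRepository`) and the field mapping are hypothetical; only the profile ID is taken from the preview link above. The output is an illustrative skeleton, not a valid CMDI record: it has no Resources section and the payload is left un-namespaced.

```python
import json
import xml.etree.ElementTree as ET

# CMDI 1.2 envelope namespace.
CMD_NS = "http://www.clarin.eu/cmd/1"
# Profile ID as shown in the preview link in this thread.
PROFILE_ID = "clarin.eu:cr1:p_1659015263833"

# Hypothetical mapping from codemeta/schema.org keys to element names.
FIELD_MAP = {
    "name": "Name",
    "description": "Description",
    "codeRepository": "CodeRepository",
}


def codemeta_to_cmdi(doc):
    """Serialise a few plain-string codemeta fields into a CMDI-like skeleton."""
    ET.register_namespace("cmd", CMD_NS)
    root = ET.Element(f"{{{CMD_NS}}}CMD", {"CMDVersion": "1.2"})
    header = ET.SubElement(root, f"{{{CMD_NS}}}Header")
    ET.SubElement(header, f"{{{CMD_NS}}}MdProfile").text = PROFILE_ID
    components = ET.SubElement(root, f"{{{CMD_NS}}}Components")
    # Hypothetical root component; the real profile defines its own names.
    software = ET.SubElement(components, "SoftwareSourceCode")
    for key, element in FIELD_MAP.items():
        value = doc.get(key)
        # Only simple string values are handled; nested JSON-LD is skipped.
        if isinstance(value, str) and value:
            ET.SubElement(software, element).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up codemeta document, for illustration only.
    example = json.loads(
        '{"name": "ExampleTool", "description": "A demo tool",'
        ' "codeRepository": "https://example.org/repo"}'
    )
    print(codemeta_to_cmdi(example))
```

The open TODOs mentioned above (Landing Page reference, concept links from the JSON-LD context) are deliberately not covered by this sketch.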

proycon commented 1 year ago

@twagoo Thanks for the update! That helps keep track of things.

@menzowindhouwer What's the current status of this issue (and the broader issue of connecting to the VLO)? I see a lot of work has been done by you and Meindert. How close would you say we are to actually seeing the tools land in the CLARIN VLO?

menzowindhouwer commented 1 year ago

We're a bit stuck at the moment, but a solution is in sight. Then we should have the codemeta harvest going with the CLARIAH fork of the harvester. The next step will then be to create a release candidate from a merge with the main branch and test it in the CLARIN setup, including visibility of the records in the (test) VLO. If this is successful, we can make the actual release and roll out the codemeta harvest to CLARIN. So still some steps to go ...

proycon commented 1 year ago

Thanks for the update!

proycon commented 10 months ago

@menzowindhouwer In preparation for the CLARIAH technical advisory meeting we have in a bit, I was wondering what the current status of the Codemeta->CMDI->VLO pipeline is?

menzowindhouwer commented 10 months ago

It got stalled in creating the pull request for CLARIN: Java's new module system, Jigsaw, is playing havoc with xalan, so we're looking into replacing it completely with SaxonUtils ...

proycon commented 7 months ago

@menzowindhouwer Could you give a small update again on the current status of this?

menzowindhouwer commented 7 months ago

We removed the dependency on xalan, so it should now be possible to create a pull request to merge our changes back to the harvester.