Closed: proycon closed this issue 2 years ago
It may be worth identifying if there are already CLARIAH services and websites that make their tool metadata available in other ways that may be harvestable (i.e. published by the web endpoint itself, not some other higher-order registry). An important example currently is CLAM, widely used for WP3 webservices and outputting metadata in its own XML format; I will make that output an OpenAPI Info block too (proycon/clam#32).
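For context, an OpenAPI `info` block is quite compact; something along these lines is roughly what CLAM could emit alongside its own XML format (all values below are invented for illustration — only `title` and `version` are mandatory per the OpenAPI specification):

```json
{
  "openapi": "3.0.3",
  "info": {
    "title": "Example CLAM webservice",
    "description": "A hypothetical WP3 webservice wrapped by CLAM",
    "version": "1.0.0",
    "license": {
      "name": "GPL-3.0"
    }
  },
  "paths": {}
}
```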
Please comment if you know what metadata descriptions the various CLARIAH partners are currently using.
Should the type of service instance be documented with the software, and/or be derived from the service definition as it is retrieved over HTTP by the harvester? Example: the fact that software `x` has an OpenAPI endpoint available at URL `y` and a SPARQL endpoint at URL `z`.
I am indeed hoping that the type of the service can be automatically extracted, and once extracted I want to represent these webservices using the pending WebAPI proposal (schemaorg/schemaorg#2635, schemaorg/schemaorg#1423). The type of instance would fit their `conformsTo` property. This will be fairly minimal though. I think that's an important limit to our 'tool discovery' scope: we will merely link to these existing API specifications, not try to redo them, reinvent them, or convert all aspects of them. Anybody wanting to actually interface with the service (input parameters, output types, return codes, etc.) will need to dig deeper and parse the linked specification themselves.
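As a rough sketch of what such a minimal description could look like — the `WebAPI` type and its properties come from the still-pending schema.org proposal and may change, and all names and URLs here are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "WebAPI",
  "name": "Example tool OpenAPI endpoint",
  "url": "https://example.org/api",
  "conformsTo": "https://spec.openapis.org/oas/v3.0"
}
```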
I must also add that describing web services is still relatively low on the priority list. Describing the `schema:WebApplication` (i.e. a web interface for human end-users) has more priority.
From the perspective of the harvester and the metadata it produces, I see the source code metadata as the primary representation. This `schema:SoftwareSourceCode` will be linked to service instances (e.g. a `schema:WebApplication`, a `schema:WebAPI`, or even a `schema:WebPage`) via the `schema:targetProduct` property (https://github.com/codemeta/codemeta/issues/271). As I envision it now, the tool store API (#34) will serve a whole bunch of JSON files (and also offer a SPARQL endpoint), one per tool, each representing a software source code record that links to all its service instances (bottom-up). I hope this makes some sense :)
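To make that concrete, one of those JSON files might look roughly like this, assuming the `targetProduct` linking proposed in codemeta/codemeta#271 (the tool name and all URLs are invented placeholders):

```json
{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "https://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "name": "exampletool",
  "codeRepository": "https://github.com/example/exampletool",
  "targetProduct": [
    {
      "@type": "WebApplication",
      "name": "exampletool web interface",
      "url": "https://example.org/exampletool"
    },
    {
      "@type": "WebAPI",
      "name": "exampletool API",
      "url": "https://example.org/exampletool/api"
    }
  ]
}
```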
The harvesting pipeline currently being implemented (#33) is set up in such a way that the source code is always the most authoritative place for holding software metadata descriptions.
However, there is a distinction between the software source code and service instances of that software, and the latter may add some metadata that is not applicable to the source as such. Instances are hosted on a particular URL and may have particular access limitations. We want to make that distinction explicit.
In the tool source registry for the harvester, we therefore provide the link to the source code alongside the web endpoints. The harvester first queries the source code repositories and converts the metadata found there to schema.org/codemeta's `SoftwareSourceCode`, then it queries the web endpoints and enriches the metadata in the way proposed in codemeta/codemeta#271.

How can websites and webservices provide metadata? I want to support the following for the harvester pipeline:
- a `<script type="application/ld+json">` block, with `@type` any subclass of `schema:SoftwareApplication` or any of the other types proposed in codemeta/codemeta#271, including `schema:WebAPI` and `schema:WebPage`.
- `meta` tags in the HTML `head`