Closed anjastrunk closed 5 months ago
We need a set of classes, such as CPU, GPU, Memory, Disk, Cryptography, ..., to create valid Gaia-X Credentials in JSON-LD. We could create these Python classes manually, but that is error-prone and causes a huge maintenance overhead: we would have to adapt the classes after every change of the Gaia-X Credential schema. A better way would be to read the OWL ontology of the Gaia-X Credential schema and generate the Python classes automatically. There are Python libraries which do this work for us...
owlready2: owlready2 reads ontologies in the following formats: RDF/XML, OWL/XML, NTriples. Gaia-X offers its ontology for the Credential Schema in Turtle format. I converted Turtle to RDF/XML via an online converter and played around with owlready2 to create instances of classes defined in the ontology and write these instances to a JSON-LD file. IMO: owlready2 is not very convenient for writing instances in JSON-LD.
rdflib: rdflib writes instances as triples (tuples of subject, predicate and object), which is also not very convenient. It does not provide Python classes.
linkml: The Gaia-X Credential Schema is defined via LinkML. LinkML describes a data model in a special YAML format and provides several generators. Gaia-X uses the OWL, JSON-LD and SHACL generators to create the Gaia-X Credential Schema artifacts. There is also a Python generator, which transforms the model into Python classes. This is exactly what we need in the SCS GX Credential Generator.
You can install linkml via pip
pip install linkml
To call linkml python generator from CLI, use
gen-python gaia-x.yaml > gx_schema.py
gx_schema.py contains all classes defined in the LinkML model as Python classes.
There is a challenge with Gaia-X mandatory attributes for VM images. A VM Image is a sub-class of a Virtual Resource, so it inherits the mandatory attributes copyrightOwnedBy, license and resourcePolicy. These properties are described neither by the SCS Image Metadata Standard nor by any other OpenStack image metadata. Furthermore, each image is a collection of software components and SHOULD (at least in the Gaia-X sense of transparency and trustworthiness) be modeled as a resource composition, where each software package is described as a separate Software Resource with its own license, copyright owner and resource policy. IMO, it is not reasonable to expect providers to do so. The same applies to the operating system, which is a mandatory property of each Gaia-X VM Image. Operating systems are normally a collection of software packages.
Copyright owner and license for operating systems should be publicly available. We can put these values as default values in a configuration file. This file can be adapted by providers with more precise information. For resource policy, we can use the Gaia-X default policy "allow: default".
It was a hard job to figure out default values for the copyright owner and license of all operating systems. As I'm not a legal expert, I do not know if I did everything correctly. The default values, available in config/config.yaml, should definitely be reviewed by an expert.
For VM images, I decided to use the following strategy for default values of mandatory attributes:
copyright owner: Use the copyright owner of the image's operating system
license: Use the license of the image's operating system
resource policy: Use the Gaia-X default policy "allow: default"
To adapt these values to more precise ones, providers can change the generator's configuration file. To do so, they add an additional entry to the vm image section of the configuration file:
<Image Name in Openstack>:
  copyright owner: <More specific copyright owner>
  resource policy: <More specific resource policy>
  license:
    - <More specific license>
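The default-plus-override strategy described above could be sketched roughly as follows. This is not the generator's actual code: the configuration is represented as an already parsed dictionary, and the key names (`operating systems`, `vm images`, `default resource policy`), the function `resolve_mandatory_attributes` and all values are assumptions made for illustration.

```python
from typing import Any, Dict

# Hypothetical stand-in for the parsed configuration file.
# All key names and values below are illustrative assumptions.
CONFIG: Dict[str, Any] = {
    "default resource policy": "allow: default",
    "operating systems": {
        "ExampleOS": {
            "copyright owner": "Example OS Vendor",
            "license": ["https://example.com/os-license"],
        },
    },
    "vm images": {
        "custom-image": {
            "copyright owner": "Example Provider",
        },
    },
}


def resolve_mandatory_attributes(
    image_name: str, os_name: str, config: Dict[str, Any]
) -> Dict[str, Any]:
    """Start from the operating system defaults, then apply image overrides."""
    os_defaults = config["operating systems"][os_name]
    resolved = {
        "copyright owner": os_defaults["copyright owner"],
        "license": list(os_defaults["license"]),
        "resource policy": config["default resource policy"],
    }
    # Entries in the (optional) image section win over the defaults.
    overrides = config.get("vm images", {}).get(image_name, {})
    resolved.update(overrides)
    return resolved


print(resolve_mandatory_attributes("custom-image", "ExampleOS", CONFIG))
```

An image without its own entry simply keeps the operating system defaults, so providers only need to list the images for which they have more precise information.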
Gaia-X supports the following values for random-number generators: Electrical noise, Chaos-based, Free-running oscillators, Quantum, and None. In contrast, OpenStack image metadata allows libvirt and others as values to specify random-number generator devices. OpenStack values and Gaia-X values do not match. Furthermore, even if an image prefers a specific random-number generator, there is no guarantee that instances will have one, as the availability of a random-number generator depends on the nova configuration. Hence, discovering random-number generator devices from OpenStack does not add any value and I skip this property.
OpenStack does not yet support image encryption. There is a spec regarding encryption, but the feature has not been delivered yet. We will skip this attribute in the generator.
OpenStack does not yet support defining GPU requirements the way it does for CPUs (e.g. architecture, number of cores, number of threads, ...). Hence, the generator will skip the generation of GPU requirements.
GX Credentials are serialized in JSON-LD. However, serializing Python objects to JSON-LD is a challenge. The built-in method json.dumps() from the Python library json serializes Python objects to plain JSON only. JSON-LD serialization is not supported. One major difference between JSON and JSON-LD is the use of data types and of URIs for objects and attributes. Neither is included in JSON serialization by default, but both are essential for linked data (GX Credentials are linked data instances) in order to know which rules of the Gaia-X Credential schema apply to a given instance.
See, e.g., the following instance of a VM Image:
classDiagram
    class VMImage{
        copyrightOwnedBy = ["TBA"]
        license = ["https://tba.de"]
        resourcePolicy = ["default: allow intent"]
    }
JSON Serialization
{
  "copyrightOwnedBy": [
    "TBA"
  ],
  "license": [
    "https://tba.de"
  ],
  "resourcePolicy": [
    "default: allow intent"
  ]
}
JSON-LD serialization
{
  "@type": [
    "http://w3id.org/gaia-x/gx-trust-framework/VMImage"
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/copyrightOwnedBy": [
    {
      "@value": "TBA"
    }
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/license": [
    {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://tba.de"
    }
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/resourcePolicy": [
    {
      "@value": "default: allow intent"
    }
  ]
}
json.dumps() has an optional argument called default, which takes a callback function that defines how to serialize Python objects that json cannot handle natively. To support JSON-LD serialization, we have to write our own callback. I did this in to_json_ld in generator.common.json_ld.py, which implements JSON-LD serialization.
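To illustrate the callback mechanism, here is a minimal sketch that produces the JSON-LD output shown above for the example VM image. It is not the implementation in generator.common.json_ld.py: the `VMImage` dataclass is a simplified stand-in for a generated class, and the `DATATYPES` table is hard-coded purely for illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, List

GX = "http://w3id.org/gaia-x/gx-trust-framework/"
XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"

# Hard-coded for illustration only: attribute name -> datatype IRI.
DATATYPES: Dict[str, str] = {"license": XSD_ANY_URI}


@dataclass
class VMImage:
    """Hypothetical stand-in for a class generated from the LinkML model."""
    copyrightOwnedBy: List[str]
    license: List[str]
    resourcePolicy: List[str]


def to_json_ld(obj: Any) -> Dict[str, Any]:
    """Callback for json.dumps(default=...): expanded JSON-LD for a VMImage."""
    if not isinstance(obj, VMImage):
        # json.dumps expects a TypeError for objects the callback cannot handle.
        raise TypeError(f"Object of type {type(obj).__name__} is not serializable")
    doc: Dict[str, Any] = {"@type": [GX + type(obj).__name__]}
    for field in fields(obj):
        values = []
        for value in getattr(obj, field.name):
            node: Dict[str, Any] = {}
            if field.name in DATATYPES:
                node["@type"] = DATATYPES[field.name]
            node["@value"] = value
            values.append(node)
        doc[GX + field.name] = values
    return doc


image = VMImage(
    copyrightOwnedBy=["TBA"],
    license=["https://tba.de"],
    resourcePolicy=["default: allow intent"],
)
print(json.dumps(image, indent=2, default=to_json_ld))
```

json.dumps() only invokes the callback for objects it cannot serialize itself, so the dictionary returned by the callback is then serialized with the regular JSON rules.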
The data types of many attributes in the GX Credential Schema are unions of types. The GX Credential Schema is described with LinkML. LinkML defines an attribute's data type with the keyword range. To define a union, range is omitted and the keyword any_of is used. For example, attribute license in class VirtualResource uses any_of to define its data type as a union of URI and a fixed set of SPDX identifiers (as enumeration):
class VirtualResource:
  ...
  attributes:
    license:
      required: true
      multivalued: true
      description: A list of SPDX identifiers or URL to document.
      any_of:
        - range: SPDX
        - range: uri
    ...
We use LinkML's Python generator to convert YAML files into Python classes. However, LinkML's Python generator does not evaluate the keyword any_of. All properties defined as unions via any_of are mapped to strings. E.g. attribute license in class VirtualResource is mapped to Union[str, List[str]] (Union because the maximum cardinality of license is unlimited). Hence, we lose data type information here, which causes problems in JSON-LD serialization. As SHACL (= compliance rules for GX Credential instances) expects license to be a URI or an SPDX identifier, all credentials generated by the SCS GX Credential Generator will fail validation. The correct data type of license would be Union[Union[str, URI], List[Union[str, URI]]].
The missing support for keyword any_of is a bug in LinkML. We can build a workaround in the SCS generator, e.g. by hard-coding the data types in the JSON-LD serialization (see method to_json_ld() in generator.common.json_ld.py) for attributes whose data type is defined as a union. IMO, this solution will not scale: we would need to manually change the implementation of to_json_ld every time an attribute's data type changes. That's why I decided to fix the bug in LinkML directly. It's just a few lines of code.
I have to correct myself. Looking deeper into LinkML's source code and playing around, I figured out that supporting keyword any_of is not just a few lines of code; it is a more extensive task. Besides changing the data type of the attribute to a union, the class's constructor has to be adapted as well. The constructor initializes the object's attributes and casts values to strings if the attribute's range is defined with any_of.
I decided to go with a simple workaround and wait for upstream to fix the bug. A bug report has been created, see https://github.com/linkml/linkml/issues/1813.
I figured out that the data types of an object's properties are checked at initialization time only. You can change an object's properties afterwards to arbitrary types. I used this bug(?)/feature to set the property's type explicitly and evaluate this type in to_json_ld() later on.
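The workaround can be sketched as follows. This is a simplified illustration, not the generator's code: `VirtualResource` here only mimics a generated class that casts values to strings in its constructor, and the marker type `URI` and the helper `literal_node` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


class URI(str):
    """Hypothetical marker type for string values that are URIs."""


@dataclass
class VirtualResource:
    """Mimics a generated class that casts `license` values to str on init."""
    license: List[str]

    def __post_init__(self) -> None:
        # Type information is lost here: every value becomes a plain str.
        self.license = [str(v) for v in self.license]


def literal_node(value: Any) -> Dict[str, str]:
    """Build a JSON-LD value object; add a datatype only for URI values."""
    node = {"@value": str(value)}
    if isinstance(value, URI):
        node["@type"] = XSD_ANY_URI
    return node


resource = VirtualResource(license=[URI("https://tba.de")])
assert type(resource.license[0]) is str  # the constructor dropped the URI type

# Types are only enforced at initialization, so the attribute can be re-typed.
resource.license = [URI(v) for v in resource.license]
nodes = [literal_node(v) for v in resource.license]
print(nodes)
```

The serializer then only has to inspect the runtime type of each value instead of keeping a hard-coded list of attribute names.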
Gaia-X supports describing VM images in more detail via the property aggregationOfResources. This attribute refers to a list of Gaia-X resources describing the resources the VM image is based on. Gaia-X identifies entities via DIDs. Hence, aggregationOfResources contains a list of DIDs. As OpenStack does not provide this information, I moved the aggregation of resources to the GX Credential Generator's configuration file.
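Reading these DIDs from the configuration might look roughly like the sketch below. The configuration layout, the key name `aggregation of`, the function `aggregation_of_resources` and the example DIDs are assumptions for illustration; the only check performed is that each entry starts with the `did:` scheme prefix.

```python
from typing import Any, Dict, List

# Hypothetical parsed configuration; key names and DIDs are illustrative.
CONFIG: Dict[str, Any] = {
    "vm images": {
        "example-image": {
            "aggregation of": [
                "did:web:example.com:software:kernel",
                "did:web:example.com:software:libc",
            ],
        },
    },
}


def aggregation_of_resources(image_name: str, config: Dict[str, Any]) -> List[str]:
    """Return the DIDs configured for an image, or an empty list if none."""
    image_cfg = config.get("vm images", {}).get(image_name, {})
    dids = image_cfg.get("aggregation of", [])
    for did in dids:
        # Minimal sanity check: every DID starts with the "did:" scheme.
        if not did.startswith("did:"):
            raise ValueError(f"Not a DID: {did!r}")
    return list(dids)


print(aggregation_of_resources("example-image", CONFIG))
```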
To support all properties defined in the SCS Image Metadata Standard, two minor changes in the current Gaia-X Credential Schema were necessary. See the following MR in Gaia-X Service Characteristics GitLab
Motivation
As a potential cloud customer, I want to know which VM (virtual machine) images are provided by an SCS cloud service provider. Gaia-X provides the special class VM_Image to describe offered VM images as tamper-evident Gaia-X Credentials.
Task
Write/Update script to generate Gaia-X Credentials for VM Images.
The following Gaia-X Image properties MUST be generated:
The following Gaia-X Image properties SHOULD be generated:
The following Gaia-X Image properties MAY be generated:
Prerequisites