SovereignCloudStack / gx-credential-generator

Tools for creating Gaia-X Credentials (OpenStack, k8s, ...)
https://scs.community/
Eclipse Public License 2.0

Generate GX conformant credentials for VM Images #67

Closed anjastrunk closed 5 months ago

anjastrunk commented 10 months ago

Motivation

As a potential cloud customer, I want to know which VM (virtual machine) images are provided by an SCS cloud service provider. Gaia-X provides a dedicated class, VM_Image, to describe offered VM images as tamper-evident Gaia-X Credentials.

Task

Write/update a script to generate Gaia-X Credentials for VM Images.

The following Gaia-X Image properties MUST be generated:

The following Gaia-X Image properties SHOULD be generated:

The following Gaia-X Image properties MAY be generated:

Prerequisites

anjastrunk commented 10 months ago

Convert GX Schema to python classes

We need a bunch of classes, such as CPU, GPU, Memory, Disk, Cryptography, ... to create valid Gaia-X Credentials in JSON-LD. We could create these Python classes manually, but that is error-prone and causes a huge maintenance overhead: we would have to adapt the classes after every change of the Gaia-X Credential Schema. A better way would be to read the OWL ontology of the Gaia-X Credential Schema and generate the Python classes automatically. There are Python libraries that do this work for us:

owlready2: owlready2 reads ontologies in the following formats: RDF/XML, OWL/XML, and NTriples. Gaia-X offers its ontology for the Credential Schema in Turtle format. I converted Turtle to RDF/XML via an online converter and played around with owlready2 to create instances of classes defined in the ontology and write these instances to a JSON-LD file. IMO, owlready2 is not very convenient for writing instances in JSON-LD.

rdflib: rdflib writes instances in terms of triples, which is also not very convenient. It does not provide Python classes.

linkml: The Gaia-X Credential Schema is defined via LinkML. LinkML describes a data model in a special YAML format and provides several generators. Gaia-X uses the OWL, JSON-LD and SHACL generators to create the Gaia-X Credential Schema artifacts. There is also a Python generator, which transforms the model into Python classes. This is exactly what we need in the SCS GX Credential Generator.

You can install linkml via pip:

pip install linkml

To call the LinkML Python generator from the CLI, use

gen-python gaia-x.yaml > gx_schema.py

gx_schema.py then contains all classes defined in the LinkML model as Python classes.

anjastrunk commented 9 months ago

Mandatory attributes in Gaia-X Credential Schema

There is a challenge with the Gaia-X mandatory attributes for VM images. As a VM Image is a sub-class of Virtual Resource, it inherits the mandatory attributes copyrightOwnedBy, license and resourcePolicy. These properties are described neither by the SCS Image Metadata Standard nor by any other OpenStack image metadata. Furthermore, each image is a collection of software components and SHOULD (at least in the Gaia-X sense of transparency and trustworthiness) be modeled as a resource composition, where each software package is described as a separate Software Resource with its own license, copyright owner and resource policy. IMO, it is not reasonable to expect providers to do so. The same applies to the operating system, which is a mandatory property of each Gaia-X VM Image: operating systems are normally a collection of software packages, too.

anjastrunk commented 9 months ago

Copyright owner and license of operating systems should be publicly available. We can put these values as defaults in a configuration file, which providers can adapt with more precise information. For the resource policy, we can use the Gaia-X default policy "allow: default".

anjastrunk commented 9 months ago

It was a hard job to figure out default values for the copyright owner and license of all operating systems. As I'm not a legal expert, I do not know whether I did everything correctly. The default values, available in config/config.yaml, should definitely be reviewed by an expert.

anjastrunk commented 9 months ago

For VM images, I decided to use the following strategy for default values for mandatory attributes:

<Image Name in OpenStack>:
    copyright owner:  <More specific copyright owner>
    resource policy:  <More specific resource policy>
    license:
      -  <More specific license>
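
A minimal sketch of how such per-image overrides could be resolved against default values. The keys "default" and "images", the function name mandatory_attributes and all values are assumptions made for illustration; they are not taken from config/config.yaml, and the real generator may resolve defaults differently.

```python
from copy import deepcopy

# Hypothetical configuration layout: "default" and "images" are assumed keys,
# the values are placeholders and not real legal information.
CONFIG = {
    "default": {
        "copyright owner": "TBA",
        "resource policy": "default: allow intent",
        "license": ["https://tba.de"],
    },
    "images": {
        "Example Image": {
            "license": ["https://example.org/license"],
        },
    },
}


def mandatory_attributes(image_name, config=CONFIG):
    """Return the default values, overridden by image-specific entries if present."""
    attrs = deepcopy(config["default"])
    overrides = config.get("images", {}).get(image_name, {})
    attrs.update(deepcopy(overrides))
    return attrs


example_attrs = mandatory_attributes("Example Image")
unknown_attrs = mandatory_attributes("Image without specific entry")
```

With this layout a provider only needs to list the attributes that differ from the defaults; everything else falls back to the default block.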

anjastrunk commented 9 months ago

Random-number generator device "hwRngTypeOfImage"

Gaia-X supports the following values for the random-number generator: Electrical noise, Chaos-based, Free-running oscillators, Quantum, and None. In contrast, OpenStack image metadata allows libvirt and others as values to specify random-number generator devices. OpenStack values and Gaia-X values do not match. Furthermore, even if an image prefers a specific random-number generator, there is no guarantee that instances will have one, as the availability of a random-number generator depends on the Nova configuration. Hence, discovering random-number generator devices from OpenStack does not add any value, and I skip this property.

anjastrunk commented 9 months ago

Image Encryption

OpenStack does not yet support image encryption. There is a spec regarding encryption, but the feature itself is not available yet. We will skip this attribute in the generator.

anjastrunk commented 9 months ago

GPU Requirements

OpenStack does not yet support defining GPU requirements the way it is possible for CPUs (e.g. architecture, number of cores, number of threads, ...). Hence, the generator will skip GPU requirements.

anjastrunk commented 8 months ago

JSON-LD Serialization

GX Credentials are serialized in JSON-LD. However, serializing Python objects to JSON-LD is a challenge. The built-in function json.dumps() from the Python library json serializes Python objects to plain JSON only; JSON-LD serialization is not supported. One major difference between JSON and JSON-LD is the usage of data types and URIs for objects and attributes. Neither is included in JSON serialization by default, but both are essential for linked data (GX Credentials are linked data instances) in order to know which rules of the Gaia-X Credential Schema apply to a given instance.

See, e.g. the following instance of a VM Image:


classDiagram

class VMImage{
    copyrightOwnedBy = ["TBA"]
    license = ["https://tba.de"]
    resourcePolicy = ["default: allow intent"]
}

JSON Serialization

{
  "copyrightOwnedBy": [
    "TBA"
  ],
  "license": [
     "https://tba.de"
  ],
  "resourcePolicy": [
    "default: allow intent"
  ]
}

JSON-LD serialization


{
  "@type": [
    "http://w3id.org/gaia-x/gx-trust-framework/VMImage"
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/copyrightOwnedBy": [
    {
      "@value": "TBA"
    }
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/license": [
    {
      "@type": "http://www.w3.org/2001/XMLSchema#anyURI",
      "@value": "https://tba.de"
    }
  ],
  "http://w3id.org/gaia-x/gx-trust-framework/resourcePolicy": [
    {
      "@value": "default: allow intent"
    }
  ]
}

json.dumps() has an optional argument called default, which takes a callback function that defines how to serialize Python objects that are not JSON serializable out of the box. To support JSON-LD serialization, we have to write our own callback. I did this in to_json_ld in generator.common.json_ld.py, which implements the JSON-LD serialization.
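
The following is a minimal sketch of the callback idea and not the actual to_json_ld implementation. The VMImage dataclass, the URI marker type and the function name to_json_ld_sketch are assumptions made for illustration; only the namespace and the expected output shape are taken from the example above.

```python
import json
from dataclasses import dataclass, field
from typing import List

GX = "http://w3id.org/gaia-x/gx-trust-framework/"
XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


class URI(str):
    """Hypothetical marker type for values to be typed as xsd:anyURI."""


@dataclass
class VMImage:
    """Hypothetical stand-in for the generated Gaia-X class."""
    copyrightOwnedBy: List[str] = field(default_factory=list)
    license: List[str] = field(default_factory=list)
    resourcePolicy: List[str] = field(default_factory=list)


def _literal(value):
    # json.dumps never calls `default` for str subclasses, so typed
    # values have to be expanded here, before they reach the encoder.
    if isinstance(value, URI):
        return {"@type": XSD_ANY_URI, "@value": str(value)}
    return {"@value": value}


def to_json_ld_sketch(obj):
    """Callback for json.dumps(default=...): expand an object to JSON-LD."""
    if isinstance(obj, VMImage):
        doc = {"@type": [GX + type(obj).__name__]}
        for name, values in vars(obj).items():
            doc[GX + name] = [_literal(v) for v in values]
        return doc
    raise TypeError(f"Cannot serialize {type(obj).__name__} to JSON-LD")


image = VMImage(
    copyrightOwnedBy=["TBA"],
    license=[URI("https://tba.de")],
    resourcePolicy=["default: allow intent"],
)
serialized = json.dumps(image, default=to_json_ld_sketch, indent=2)
```

Printing serialized should yield a document of the same shape as the JSON-LD example above: attribute names become full URIs and the license value carries its xsd:anyURI type.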

anjastrunk commented 8 months ago

The data types of many attributes in the GX Credential Schema are unions of types. The GX Credential Schema is described with LinkML. LinkML defines an attribute's data type with the keyword range. To define a union, range is omitted and the keyword any_of is used instead. For example, the attribute license in class VirtualResource uses any_of to define its data type as a union of URI and a fixed set of SPDX identifiers (as enumeration):

class VirtualResource:
  ...
  attributes:
    license:
      required: true
      multivalued: true
      description: A list of SPDX identifiers or URL to document.
      any_of:
        - range: SPDX
        - range: uri
    ...

We use LinkML's Python generator to convert the YAML files into Python classes. However, LinkML's Python generator does not evaluate the keyword any_of. All properties defined as unions via any_of are mapped to strings. For example, the attribute license in class VirtualResource is mapped to Union[str, List[str]] (Union because the maximum cardinality of license is unlimited). Hence, we lose data type information here, which causes problems in JSON-LD serialization. As SHACL (= compliance rules for GX Credential instances) expects license to be a URI or an SPDX identifier, all credentials generated by the SCS GX Credential Generator will fail validation. The correct data type of license would be Union[Union[str, URI], List[Union[str, URI]]].

The missing consideration of the keyword any_of is a bug in LinkML. We could build a workaround in the SCS generator, e.g. by hard-coding the data types in the JSON-LD serialization of those attributes whose data type is defined as a union (see method to_json_ld() in generator.common.json_ld.py). IMO, this solution will not scale: we would need to manually change the implementation of to_json_ld every time an attribute's data type changes. That's why I decided to fix the bug in LinkML directly. It's just a few lines of code.

anjastrunk commented 8 months ago

I have to correct myself. Looking deeper into LinkML's source code and playing around, I figured out that supporting the keyword any_of is not just a few lines of code, but a more extensive task. Besides changing the data type of the attribute to a union, the class's constructor has to be adapted as well. The constructor initializes the object's attributes and casts values to strings if the attribute's range is defined with any_of.

I decided to go with a simple workaround and wait for upstream to fix the bug. A bug report was created, see https://github.com/linkml/linkml/issues/1813.

I figured out that the data types of an object's properties are checked at initialization time only. You can change an object's properties to arbitrary types afterwards. I used this bug(?)/feature to set the property's type explicitly and evaluate this type in to_json_ld() later on.
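
A minimal sketch of this workaround under stated assumptions: the VirtualResource dataclass below only imitates the described behaviour (casting to plain strings at initialization) and is not the class generated by LinkML; URI and literal are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List

XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


class URI(str):
    """Hypothetical marker type for values to be typed as xsd:anyURI."""


@dataclass
class VirtualResource:
    """Hypothetical stand-in that imitates the generated class."""
    license: List[str]

    def __post_init__(self):
        # Imitates the described behaviour: values of any_of attributes
        # are cast to plain strings at initialization time.
        self.license = [str(v) for v in self.license]


def literal(value):
    """Evaluate the explicitly set type during serialization."""
    if isinstance(value, URI):
        return {"@type": XSD_ANY_URI, "@value": str(value)}
    return {"@value": str(value)}


# The type information is lost during initialization ...
resource = VirtualResource(license=[URI("https://tba.de")])
type_after_init = type(resource.license[0])

# ... but assigning after initialization bypasses the cast.
resource.license = [URI("https://tba.de")]
type_after_assignment = type(resource.license[0])
typed_literals = [literal(v) for v in resource.license]
```

The trade-off is that this relies on values not being re-validated after construction, so it may need to be revisited once the upstream issue is resolved.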

anjastrunk commented 8 months ago

Supporting property aggregationOfResources

Gaia-X supports describing VM images in more detail via the property aggregationOfResources. This attribute refers to a list of Gaia-X resources describing the resources a VM image is based on. Gaia-X identifies entities via DIDs. Hence, aggregationOfResources contains a list of DIDs. As OpenStack does not provide this information, I moved the aggregation of resources to the GX Credential Generator's configuration file.

anjastrunk commented 8 months ago

Extending of Gaia-X Credential Schema

To support all properties defined in the SCS Image Metadata Standard, two minor changes in the current Gaia-X Credential Schema were necessary. See the following MR in Gaia-X Service Characteristics GitLab