Devsh-Graphics-Programming / Nabla

Vulkan, OptiX and CUDA Interoperation Modular Rendering Library and Framework for PC/Linux/Android
http://devsh.eu
Apache License 2.0

CI Python Framework #420

Open AnastaZIuk opened 1 year ago

AnastaZIuk commented 1 year ago

New CI's job handling diagram

graph TD
    Yes1[Yes]
    No1[No]
    TODO
    A[Devsh Jenkins]
    B[Proxmox OS Host]
    C[Jenkins Agent]
    D[Windows 11 Virtual Machine with GPU and required inputs attached via PCI passthrough]
    E[Temporary directory created by Proxmox OS Host on demand]
    F[Artifactory]
    id1{Is it a job with CI checks for examples?}

    A-- 1. Run job ---C
    B --> C
    C --> B
    B-- "2. Create VM on demand from predefined system image (having everything installed on the system image to handle Nabla)" ---D
    D-- 3. Process a job in Virtual Machine ---id1
    id1 --> Yes1
    id1 --> No1
    No1 --> TODO
    Yes1-- 3a. Clone Nabla<br/>3b. Build Nabla solution<br/>3c. Let Proxmox OS Host validate an example's 'JSON input'<br/>3d. Run an example which takes 'JSON input', parse validated json data that will be used to produce example's results<br/>3e. Let Proxmox OS Host SCP examples' results from the VM to temporary directory created by it ---E
    E-- 4. Let Proxmox OS Host validate copied results from the temporary directory using Python Framework<br/>4a. Produce an HTML file using Python Framework containing output resources from JSON input file<br/>4b. Let Proxmox OS Host SCP the output resources and the HTML file to artifactory ---F
    F-- 5. Bring everything to start point<br/>5a. Let Proxmox OS Host remove its temporary directory<br/>5b. Let Proxmox OS Host remove the entire Virtual Machine image with all data ---B

CI security job procedure description

The CI job handling diagram above shows the new way the CI processes a job. The previous pipeline had many disadvantages and was susceptible to various attacks, such as arbitrary code execution that could damage our nodes or even company infrastructure. The key is to limit room for maneuver: a job should be executed in an encapsulated environment that cannot communicate with the outside world. To meet these requirements we use the Proxmox virtualization system, which creates virtual machines on demand whenever a job is triggered by our Jenkins. The job executes in the VM, and once it completes the VM is removed. The Proxmox OS Host also validates some data (like Input JSON files) that the job needs to handle, talks to the VM, scans the job's results, and acts as the gate to the artifactory. Any job handling a Nabla example that provides Input JSON data files is expected to produce artifacts as the result of the executed example, plus an HTML file generated via the Python Framework that lists the example's results and any related data.

Detailed steps

A job request is triggered by a GitHub hook, a Pull Request comment, or an external URL with proper credentials. Our Jenkins handles the request and runs the job on the appropriate Jenkins Agent, which is the Proxmox OS Host. The Jenkins Agent creates a Virtual Machine for the requested job from a predefined system image: Windows 11 with NVIDIA drivers and the software required to compile Nabla for all available operating systems installed. The Proxmox OS Host is set up to handle PCI passthrough, and the VMs it creates have a GPU and some PCI inputs passed through. Once the VM is created, the Proxmox OS Host begins communicating with it to perform the job's steps. The communication is one-way only: the Proxmox OS Host connects to the VM over SSH, and the VM cannot talk to the Proxmox OS Host.
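A minimal sketch of how the host side of this one-way channel might assemble a command to run on the VM. The function name, the user name and the address are hypothetical, and the sketch assumes the VM exposes a POSIX-like shell over SSH; `BatchMode=yes` is a standard OpenSSH option that disables interactive prompts, which suits unattended jobs.

```python
import shlex


def build_ssh_command(user, host, remote_argv, port=22):
    """Return the argv the host would execute to run remote_argv on the VM.

    Hypothetical helper: nothing here is part of the actual CI scripts.
    Assumption: the remote side parses the command with a POSIX-like shell.
    """
    # Quote every argument so spaces and shell metacharacters survive the trip.
    remote = shlex.join(remote_argv)
    return [
        "ssh",
        "-p", str(port),
        "-o", "BatchMode=yes",  # never prompt for passwords in an unattended job
        f"{user}@{host}",
        remote,
    ]


# The host initiates the connection; the VM never connects back.
print(build_ssh_command("ci", "10.0.0.5", ["git", "--version"]))
```

Returning an argument list rather than a single string lets the caller pass it directly to `subprocess.run` without involving a local shell.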

Further steps for a job that does anything Nabla related

The Proxmox OS Host executes the job's instructions while connected to the VM by SSH (note: when we say "it does something", such as "it runs a command", we mean the Proxmox OS Host doing that on the VM over SSH) in the following order:

It shallow-clones Nabla and builds it. If the job needs to handle any examples, it validates their Input JSON files before they get executed.

Once the validation completes (successfully or not), it runs the Python example, which takes the Input JSON as its input. The example uses the Python Framework's API to parse the JSON inputs, execute Nabla's example executable (which outputs results according to the Input JSON file) and perform any CI checks on the provided and output result data.

When the Python job completes, it creates a temporary directory with a hashed name on its own operating system and copies the Python example's results from the VM to the newly created directory. Once copied, it validates the results and scans them to detect potential malware. When that is done, it generates an HTML file with the artifact resources in the temporary directory and begins to validate it.

Once the HTML is validated, it uses SCP to transfer all of the Python example's results and the generated HTML from the temporary directory to the company's artifactory. When all of the data is transferred, it deletes the temporary directory and removes the VM.
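The lifecycle of the host-side temporary directory described in these steps (create it under a hashed name, use it, always delete it) could look roughly like the sketch below. The helper names, the choice of SHA-256 and the 16-character truncation are assumptions made for illustration only; the issue does not specify how the name is hashed.

```python
import hashlib
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def hashed_dir_name(job_id):
    """Derive a directory name from a job identifier (hypothetical scheme)."""
    return hashlib.sha256(job_id.encode("utf-8")).hexdigest()[:16]


@contextmanager
def job_scratch_dir(job_id, root=None):
    """Create the per-job directory and remove it when the job is done.

    Assumption: one directory per job, placed under the system temp dir
    unless another root is given.
    """
    base = Path(root) if root is not None else Path(tempfile.gettempdir())
    path = base / hashed_dir_name(job_id)
    path.mkdir(parents=True, exist_ok=False)  # fail loudly on a name clash
    try:
        yield path
    finally:
        # Runs on success and on failure, so no results are left behind.
        shutil.rmtree(path, ignore_errors=True)


with job_scratch_dir("example-job") as scratch:
    # Copying results from the VM, validating them and generating the HTML
    # file would happen here, before the upload to the artifactory.
    (scratch / "placeholder.txt").write_text("results would be copied here")
```

Wrapping the directory in a context manager keeps the cleanup step tied to the job even when validation or the transfer raises an error.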

Further steps for a job that does anything not Nabla related

Procedure files

JSON Input file description

The following is an example JSON input file for a given example; the format should be common to all of them:

{
  "enableParallelBuild": true,
  "threadsPerBuildProcess" : 2,
  "isExecuted": false,
  "testScript": "relative_path_to_python_script_for_bulk_testing arg1 arg2 arg3",
  "cmake": {
    "configurations": [ "Release", "Debug", "RelWithDebInfo" ],
    "buildModes": [ 
      "build_option_1",
      "build_option_2"
    ],
    "requiredOptions": [ "NBL_BUILD_OPTION_1", "NBL_BUILD_OPTION_2" ]
  }, 
  "profiles": [
    {
      "backend": "vulkan",
      "platform": "windows",
      "buildModes": [ "build_option_1" ],
      "runConfiguration": "Release",
      "gpuArchitectures": [ "Turing", "Pascal" ]
    }
  ],
  "dependencies": [
    "relative_path_to_a_file_1",
    "relative_path_to_a_file_2"
  ],
  "data": [
    {
      "dependencies": [
        "relative_path_to_a_file_3",
        "relative_path_to_a_file_4"
      ],
      "command": "whole bash command",
      "outputs": [
        "absolute_path_to_an_output_created_by_command_1",
        "absolute_path_to_an_output_created_by_command_2"
      ]
    },
    {
      "dependencies": [
        "relative_path_to_a_file_5",
        "relative_path_to_a_file_6"
      ],
      "command": "whole bash command",
      "outputs": [
        "absolute_path_to_an_output_created_by_command_1",
        "absolute_path_to_an_output_created_by_command_2"
      ]
    }
  ]
}

enableParallelBuild is a required boolean variable that determines whether the build process runs for different configurations in parallel.

threadsPerBuildProcess is an optional integer variable, used when enableParallelBuild is enabled.

isExecuted is a required boolean variable. If true, the example will be built and tests will run to validate that it works properly. If false, the example will only be checked for whether it compiles successfully.

requiredFiles is an array of common relative file paths to any kind of resources, such as executables, shaders and similar, that are required to be present on the Virtual Machine before CI launches an example that will read from the JSON file.

data is an array of objects consisting of the data batches needed for a single run invocation, given an input of the fields described below:
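The required and optional rules stated so far for the top-level fields can be expressed as a small check. This is only a sketch: `validate_input` is a hypothetical name, not the Python Framework's API, and treating `data` as optional is an assumption, since the text does not say whether it is required.

```python
import json


def validate_input(doc):
    """Return a list of error strings; an empty list means these checks pass.

    Hypothetical helper covering only the rules spelled out in the text.
    """
    if not isinstance(doc, dict):
        return ["top-level JSON value must be an object"]
    errors = []
    # Required booleans.
    for key in ("enableParallelBuild", "isExecuted"):
        if key not in doc:
            errors.append(f"missing required field: {key}")
        elif not isinstance(doc[key], bool):
            errors.append(f"{key} must be a boolean")
    # Optional integer.
    if "threadsPerBuildProcess" in doc:
        value = doc["threadsPerBuildProcess"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("threadsPerBuildProcess must be an integer")
    # Assumption: data may be absent; when present it is an array of objects.
    data = doc.get("data", [])
    if not isinstance(data, list) or not all(isinstance(b, dict) for b in data):
        errors.append("data must be an array of objects")
    return errors


sample = json.loads(
    '{"enableParallelBuild": true, "threadsPerBuildProcess": 2,'
    ' "isExecuted": false, "data": []}'
)
print(validate_input(sample))
```

Collecting all errors instead of stopping at the first one fits the later statement that a job continues after a failed validation, because the full list can be reported alongside the job's results.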

HTML output file description

TODO!

HTML result features

TODO!

Python Framework

TODO!

Description

TODO!

API

TODO!

Validation

The validation is performed via the API provided by the Python Framework. If validation fails at any step below, the job will continue anyway, but with additional restrictions:

Validation of JSON input file

Includes:

Validation of produced results from a given example

Includes:

Validation of generated HTML file

Includes:

AnastaZIuk commented 1 year ago

once we have time to finish the issue and begin the work, we may consider golang and proxmox-api-go