SwissDataScienceCenter / renku-graph

https://renku.readthedocs.io/en/latest/reference/services/graph-services.html?highlight=graph#graph-services
Apache License 2.0

renku-graph

Repository structure

Running the tests

sbt clean test && sbt "project acceptance-tests" test

Depending on the global configuration of your sbt installation, you might need to set SBT_OPTS to avoid an OutOfMemory exception. If such an error is raised, try setting the variable as follows:

export SBT_OPTS="-Xmx2G -Xss5M"

Development

Coding standards

Renku Graph was built with code readability and maintainability as core values. We believe that high coding standards can:

Hence, we try to find and then follow good patterns in naming and in code organization at the method, class, package and module level. The following list is a work in progress and is meant to be continuously improved.

Releasing

The standard release process is done manually. Multiple repositories take part in the process. The renku project contains the Helm charts for deploying to Kubernetes and the acceptance tests. The terraform-renku project contains deployment descriptions for all environments.

renku-graph project:

renku project:

terraform-renku:

Cleanup (renku-graph):

Hotfixes

For hotfixes, the changes need to be made on top of the relevant commit/tag and pushed to a special branch whose name follows the hotfix-<major>.<minor> pattern. Once the fix is pushed, CI will test the change with other Renku services. Tagging has to be done manually.
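
As a small illustration of the branch naming pattern, the sketch below derives a hotfix branch name from a release tag. The helper `hotfix_branch_for` and the assumption that tags look like `1.2.3` (optionally prefixed with `v`) are hypothetical and not part of the project's tooling.

```python
import re

# Assumption: release tags look like "1.2.3" or "v1.2.3".
_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def hotfix_branch_for(tag: str) -> str:
    """Return the hotfix-<major>.<minor> branch name for a release tag."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unrecognised release tag: {tag!r}")
    major, minor, _patch = match.groups()
    return f"hotfix-{major}.{minor}"


print(hotfix_branch_for("2.14.1"))
```

The patch component is deliberately ignored, since the pattern only mentions the major and minor versions.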

Event Flow

This section describes the flow of events starting from a commit on GitLab until the data is stored in the triples store. The solid lines represent an event being sent and the dotted lines represent non-event-like data (request or response).

Opting in a Project into KG

The assumption is that the Project already exists in GitLab.

sequenceDiagram
    participant UI
    participant WebhookService
    participant GitLab
    participant TokenRepository
    participant EventLog

    UI ->> WebhookService: POST /projects/:id/webhooks
    activate WebhookService
    WebhookService ->> GitLab: Create a KG webhook     
    WebhookService ->> TokenRepository: PUT /projects/:id/tokens
    WebhookService ->> EventLog: sends COMMIT_SYNC_REQUEST
    WebhookService ->> UI: 200/201
    deactivate WebhookService

A new commit flow

The assumption is that a Renku webhook has been created for the Project and GitLab sends a Push Event for the project.

sequenceDiagram
    participant GitLab
    participant WebhookService
    participant EventLog

    GitLab ->> WebhookService: POST /webhooks/events
    WebhookService ->> EventLog: sends COMMIT_SYNC_REQUEST 

Commit Sync flow

This flow traverses the commit history for a Project in GitLab until it finds a commit EventLog knows about.

sequenceDiagram
    participant EventLog
    participant CommitEventService
    participant TokenRepository
    participant GitLab

    EventLog ->> CommitEventService: sends COMMIT_SYNC 
    activate CommitEventService
    CommitEventService ->> TokenRepository: fetches access token
    CommitEventService ->> GitLab: finds commits which are not in EventLog
    CommitEventService ->> EventLog: sends CREATION for all commits that are not in EventLog
    CommitEventService ->> EventLog: sends EVENTS_STATUS_CHANGE (to: AWAITING_DELETION) for all commits that are in EventLog but not in GitLab
    CommitEventService ->> EventLog: sends GLOBAL_COMMIT_SYNC_REQUEST if at least one AWAITING_DELETION or CREATION was found 
    deactivate CommitEventService
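
The traversal described above can be sketched roughly as follows. This is a simplified illustration, not the service's actual code: commit IDs are plain strings, the history is assumed to be ordered newest first, and the function names are hypothetical. Only the CREATION side is modelled; the AWAITING_DELETION status changes are left out.

```python
from typing import Iterable, List, Optional, Set, Tuple

Event = Tuple[str, Optional[str]]  # (event category, commit id or None)


def commits_missing_from_event_log(history: Iterable[str], known: Set[str]) -> List[str]:
    """Walk the GitLab history and stop at the first commit EventLog knows about."""
    missing: List[str] = []
    for commit_id in history:  # assumption: newest commit first
        if commit_id in known:
            break
        missing.append(commit_id)
    return missing


def plan_commit_sync(history: Iterable[str], known: Set[str]) -> List[Event]:
    """Build the events to send: one CREATION per unknown commit, then a global sync request."""
    events: List[Event] = [("CREATION", c) for c in commits_missing_from_event_log(history, known)]
    if events:
        # At least one change was found, so a global sync is requested as well.
        events.append(("GLOBAL_COMMIT_SYNC_REQUEST", None))
    return events


print(plan_commit_sync(["c3", "c2", "c1"], {"c1"}))
```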

Global Commit Sync flow

This flow traverses the whole commit history of a Project and finds out:

  1. if there are commits on GitLab that need to be created in the EventLog
  2. if there are commits that are not on GitLab that should be removed from the EventLog

This process is scheduled to be triggered at a minimum rate of once per week per project and at a maximum rate of once per hour per project. The commit history traversal only begins when the number of commits on GitLab and on the EventLog does not match and the most recent commit on GitLab is different from the most recent commit on the EventLog.

sequenceDiagram
    participant EventLog
    participant CommitEventService
    participant TokenRepository
    participant GitLab

    EventLog ->> CommitEventService: sends GLOBAL_COMMIT_SYNC
    activate CommitEventService
    CommitEventService ->> GitLab: finds out the last commit ID and the total number of commits
    loop if the last commit ID or the total number of commits do not match with EventLog state find all the differences
    CommitEventService ->> TokenRepository: fetches access token
    CommitEventService ->> GitLab: get all commits
    CommitEventService ->> EventLog: get all commits
    CommitEventService ->> EventLog: sends CREATION for all commits that are not in EventLog
    CommitEventService ->> EventLog: sends EVENTS_STATUS_CHANGE (to: AWAITING_DELETION) for all commits that are in EventLog but not in GitLab
    end
    deactivate CommitEventService
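
Once both commit lists are available, finding the differences amounts to two set differences. The sketch below is a minimal illustration under the assumption that commits are identified by plain string IDs; `diff_commits` is a hypothetical name.

```python
from typing import Dict, List, Set


def diff_commits(gitlab: Set[str], event_log: Set[str]) -> Dict[str, List[str]]:
    """Compare the full commit sets of GitLab and EventLog."""
    return {
        # On GitLab but unknown to EventLog: a CREATION event is sent for each.
        "CREATION": sorted(gitlab - event_log),
        # In EventLog but gone from GitLab: status is changed to AWAITING_DELETION.
        "AWAITING_DELETION": sorted(event_log - gitlab),
    }


print(diff_commits({"a", "b", "c"}, {"b", "c", "d"}))
```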

Project provisioning flow

The assumption is that the latest Commit Event for a Project in EventLog is in status 'NEW'.

sequenceDiagram
    participant EventLog
    participant TriplesGenerator
    participant TokenRepository
    participant GitLab
    participant CLI
    participant TriplesStore

    EventLog ->> TriplesGenerator: sends AWAITING_GENERATION
    activate TriplesGenerator
    TriplesGenerator ->> TokenRepository: fetches access token
    TriplesGenerator ->> GitLab: clones the project
    TriplesGenerator ->> CLI: renku migrate
    TriplesGenerator ->> CLI: renku graph export
    TriplesGenerator ->> EventLog: sends EVENTS_STATUS_CHANGE (to: TRIPLES_GENERATED) with the graph as payload
    deactivate TriplesGenerator

    EventLog ->> TriplesGenerator: sends TRIPLES_GENERATED
    activate TriplesGenerator
    TriplesGenerator ->> TokenRepository: fetches access token
    TriplesGenerator ->> GitLab: calls several APIs in the Transformation process
    TriplesGenerator ->> TriplesStore: executes update queries and uploads project metadata
    TriplesGenerator ->> EventLog: sends EVENTS_STATUS_CHANGE (to: TRIPLES_STORE)
    deactivate TriplesGenerator    

Commit deletion flow

The assumption is that a git reset --hard or git rebase was done on the Project.

sequenceDiagram
    participant EventLog
    participant TriplesGenerator
    participant TokenRepository
    participant GitLab
    participant TriplesStore

    EventLog ->> TriplesGenerator: sends CLEAN_UP_REQUEST
    activate TriplesGenerator
    TriplesGenerator ->> TokenRepository: fetches access token
    TriplesGenerator ->> TriplesStore: removes the data of a Project
    TriplesGenerator ->> EventLog: sends EVENTS_STATUS_CHANGE (to: NEW) for all the events of a single Project
    deactivate TriplesGenerator

    activate EventLog
    EventLog ->> EventLog: removes all events in status AWAITING_DELETION and DELETING
    loop if there are no events left for the Project
    EventLog ->> EventLog: removes the Project
    EventLog ->> TokenRepository: removes the Project token
    EventLog ->> GitLab: removes the Project WebHook
    end
    EventLog ->> EventLog: changes status of all Project events to NEW
    EventLog ->> TriplesGenerator: sends AWAITING_GENERATION
    deactivate EventLog    
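
The EventLog part of this flow can be read as a small decision procedure. The following sketch is an illustration only: the `Event` type and `clean_up` function are hypothetical, and events are reduced to an ID and a status.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    id: str
    status: str


def clean_up(events: List[Event]) -> Tuple[List[Event], bool]:
    """Return the events kept for the Project and whether the Project itself should be removed."""
    remaining = [e for e in events if e.status not in {"AWAITING_DELETION", "DELETING"}]
    if not remaining:
        # No events left: the Project, its token and its webhook are removed.
        return [], True
    # Otherwise every remaining event goes back to NEW so the Project is re-provisioned.
    return [Event(e.id, "NEW") for e in remaining], False


print(clean_up([Event("e1", "TRIPLES_STORE"), Event("e2", "AWAITING_DELETION")]))
```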

The ADD_MIN_PROJECT_INFO event

The assumption is that there is no Commit Event in TRIPLES_STORE status for a Project.

sequenceDiagram
    participant EventLog
    participant TriplesGenerator
    participant TokenRepository
    participant GitLab
    participant TriplesStore

    EventLog ->> TriplesGenerator: sends ADD_MIN_PROJECT_INFO
    activate TriplesGenerator
    TriplesGenerator ->> TokenRepository: fetches access token
    TriplesGenerator ->> GitLab: calls several APIs in the Transformation process
    TriplesGenerator ->> TriplesStore: executes update queries and uploads project metadata
    TriplesGenerator ->> EventLog: sends EVENTS_STATUS_CHANGE (to: TRIPLES_STORE)
    deactivate TriplesGenerator    

The MEMBER_SYNC event

This event is sent periodically to sync authorization data between GitLab and the Triples Store.

sequenceDiagram
    participant EventLog
    participant TriplesGenerator
    participant TokenRepository
    participant GitLab
    participant TriplesStore

    EventLog ->> TriplesGenerator: sends MEMBER_SYNC
    activate TriplesGenerator
    TriplesGenerator ->> TokenRepository: fetches access token
    TriplesGenerator ->> GitLab: calls the Project users and Project members APIs
    TriplesGenerator ->> TriplesStore: updates project members
    deactivate TriplesGenerator    

The PROJECT_SYNC event

This event is sent periodically to sync Project data between GitLab, EventLog and the Triples Store.

sequenceDiagram
    participant EventLog
    participant CommitEventService
    participant TriplesGenerator
    participant TokenRepository
    participant GitLab
    participant TriplesStore

    EventLog ->> EventLog: sends PROJECT_SYNC
    activate EventLog
      EventLog ->> TokenRepository: fetches access token
      EventLog ->> GitLab: calls the Project Details API
      loop if the project slug is NOT the same in EventLog and GitLab
        EventLog ->> CommitEventService: sends COMMIT_SYNC for the new slug
        EventLog ->> TriplesGenerator: sends CLEAN_UP_REQUEST for the old slug
      end
      EventLog ->> TriplesGenerator: sends SYNC_REPO_METADATA
      activate TriplesGenerator
        TriplesGenerator ->> GitLab: fetches project metadata
        TriplesGenerator ->> TriplesStore: fetches project metadata
        TriplesGenerator ->> EventLog: fetches the payload of the latest project event
        TriplesGenerator ->> TriplesStore: sends update queries if values need updating (not for visibility changes)
        TriplesGenerator ->> EventLog: sends RedoProjectTransformation (only when visibility changes)
      deactivate TriplesGenerator
    deactivate EventLog    
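
The slug check performed by EventLog can be sketched as below. This is a hedged illustration: `plan_project_sync` is a hypothetical name, and because the diagram does not say what SYNC_REPO_METADATA carries, its payload is left as `None` here.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # (event category, project slug or None)


def plan_project_sync(event_log_slug: str, gitlab_slug: str) -> List[Event]:
    """List the events EventLog sends after fetching the Project details from GitLab."""
    events: List[Event] = []
    if event_log_slug != gitlab_slug:
        # The project was renamed or moved: sync the new slug, clean up the old one.
        events.append(("COMMIT_SYNC", gitlab_slug))
        events.append(("CLEAN_UP_REQUEST", event_log_slug))
    # Sent in every case; the payload is not specified in the diagram.
    events.append(("SYNC_REPO_METADATA", None))
    return events


print(plan_project_sync("group/old-name", "group/new-name"))
```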

The ZOMBIE_CHASING event

This event category detects Commit Events that got stale.

sequenceDiagram
    participant EventLog
    participant TriplesGenerator

    loop finds out events that are marked as under processing but the process was interrupted
    activate EventLog    
    EventLog ->> TriplesGenerator: verifies if an instance with the given URL and identifier exists
    EventLog ->> EventLog: sends ZOMBIE_CHASING
    deactivate EventLog   
    EventLog ->> EventLog: sends EVENTS_STATUS_CHANGE (to: NEW | TRIPLES_GENERATED)
    end

The removal (re-provisioning) of a project

Once an event is marked as AwaitingDeletion, it is automatically picked up by our process and a CleanUp event is created. This event triggers the removal of the project from the Triples Store. The clean-up of a project can be either the removal of the project with all its events and entities (if the project was removed from GitLab) or the re-provisioning of the project (if there are events which are not AwaitingDeletion).

Removing Project Triples

The removal of project triples happens in two steps:

Links are updated so that no islands are created in our graph. An example is a hierarchy of forked projects:

project1 <-- project2 <-- project3

If we wanted to remove project2 we would have to re-link project3 to project1.

project1 <-- project3

The same link update is applied to Dataset entities, which can be imported from other Datasets (similar to a fork for a project).

After the re-linking, the project and all its dependent entities can be removed. These entities will be removed only if they are not used in another project.
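
The re-linking of forks described above can be sketched with a simple parent map. This is an illustration under stated assumptions, not the actual update queries: the fork hierarchy is modelled as a dictionary from a project to the project it was forked from, and `remove_project` is a hypothetical name.

```python
from typing import Dict, Optional


def remove_project(parents: Dict[str, Optional[str]], removed: str) -> Dict[str, Optional[str]]:
    """Drop a project and re-link its forks to the project it was itself forked from."""
    new_parent = parents.get(removed)
    relinked: Dict[str, Optional[str]] = {}
    for project, parent in parents.items():
        if project == removed:
            continue  # the project itself disappears from the graph
        # Forks of the removed project are attached to its parent instead.
        relinked[project] = new_parent if parent == removed else parent
    return relinked


forks = {"project1": None, "project2": "project1", "project3": "project2"}
print(remove_project(forks, "project2"))
```

If the removed project has no parent, its forks simply become roots, so no link points at a project that no longer exists.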