ITISFoundation / osparc-simcore

🐼 osparc-simcore simulation framework
https://osparc.io
MIT License

Migration between deployments/Export project functionality #5824

Open matusdrobuliak66 opened 5 months ago

matusdrobuliak66 commented 5 months ago

Based on the working group https://github.com/ITISFoundation/osparc-ops-environments/issues/672 we decided we will investigate these 3 options:


(1) Importing from target deployment

Using an ad-hoc GUI, the user can import their projects from another deployment.

Prerequisites:

Changes to oSPARC:

PROS:

CONS:

(2) Archiving

Generate an archive containing project data and data stored in all nodes.

Prerequisites:

Changes to oSPARC:

PROS:

CONS:

(3) Migration

The idea here is to migrate one deployment to another.

PROS:

CONS:

### Tasks

pcrespov commented 1 month ago

Brainstorming on Sep.27, 2024

There was no consensus on a clear preference for any of the proposed solutions above. Below are some notes from the discussion.

Data Migration from Source to Destination Database

When migrating data between databases, especially PostgreSQL tables with identifiers and relationships, it’s important to go beyond just viewing it as a transfer of data rows. The semantics of the data (i.e., the meaning of the entities and their relationships) must also be considered. Still, some of the key challenges can already be identified, particularly around merging data that exists in both the source and destination databases:

Key Challenges:

  1. Integer Identifiers:

    • Apply an offset to the source table IDs by adding the maximum ID value from the destination table to avoid conflicts.
    • Although not mandatory, switching to globally unique, descriptive identifiers (Stripe-style IDs such as name_1456123456asdfa45) would be preferable.
  2. Merging Existing Resources (e.g., Users, Products):

    • Users: Handle records where users have the same email address in both source and destination databases.
    • Products: Manage cases where products share the same product name across both databases.
    • Group 1: Identify and handle additional resource overlaps.
  3. Maintaining Dependencies (e.g., Groups):

    • To preserve data integrity, ensure that related records (e.g., groups) are inserted in the correct order during migration. This guarantees that dependencies are maintained.
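
To make the first two challenges concrete, here is a minimal Python sketch, not the actual oSPARC schema or tooling. The `User` and `Project` records, the `owner_id` foreign key, and the case-insensitive email match are all assumptions made for illustration. It shows how an ID offset and an email-based merge produce a mapping that must then be applied to every foreign key pointing at the remapped table.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:  # hypothetical, simplified record
    id: int
    email: str


@dataclass(frozen=True)
class Project:  # hypothetical, simplified record
    id: int
    owner_id: int  # foreign key to User.id


def plan_user_merge(src_users, dst_users):
    """Map every source user ID to the ID it will have in the destination.

    A source user whose email already exists in the destination is merged
    into the existing record. Any other user gets its ID shifted by the
    destination's current maximum ID, so it cannot collide.
    """
    offset = max((u.id for u in dst_users), default=0)
    # Assumption: emails are compared case-insensitively.
    dst_by_email = {u.email.lower(): u.id for u in dst_users}

    id_map, new_users = {}, []
    for user in src_users:
        existing_id = dst_by_email.get(user.email.lower())
        if existing_id is not None:
            id_map[user.id] = existing_id  # merge into the existing user
        else:
            id_map[user.id] = user.id + offset
            new_users.append(replace(user, id=user.id + offset))
    return id_map, new_users


def remap_projects(src_projects, dst_projects, user_id_map):
    """Offset project IDs and rewrite the owner foreign key via the user map."""
    offset = max((p.id for p in dst_projects), default=0)
    return [
        Project(id=p.id + offset, owner_id=user_id_map[p.owner_id])
        for p in src_projects
    ]


dst_users = [User(1, "alice@example.com"), User(2, "bob@example.com")]
src_users = [User(1, "Bob@example.com"), User(2, "carol@example.com")]
id_map, new_users = plan_user_merge(src_users, dst_users)

dst_projects = [Project(7, 1)]
src_projects = [Project(1, 1), Project(2, 2)]
migrated_projects = remap_projects(src_projects, dst_projects, id_map)

print(id_map)  # {1: 2, 2: 4}
```

The users must be planned before the projects because the projects need the user ID map, which is the ordering constraint described in point 3.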

A Semantic Approach to Migration

Considering the database's structure and meaning, a more strategic approach is to break the migration into stages based on different contexts. This allows for grouping related tables and migrating them together, either manually or automatically.

Identified Contexts:

  1. Platform Configurations:

    • Clusters
    • Products
    • Product Prices
    • (...)
  2. Users:

    • Users
    • Wallets
    • User Preferences (Frontend)
    • (Additional user-related tables)
  3. Services:

    • Service Metadata
    • Service Access Rights
    • (Additional service-related tables)
  4. Studies (Projects + Data):

    • Projects
    • Folders
    • File Metadata
    • (...)
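
One way to turn the contexts above into an executable plan is sketched below in Python. The table names and the `DEPENDS_ON` relations are hypothetical placeholders loosely based on the list, not the real oSPARC schema. Contexts are migrated in the order given, and within each context the tables are ordered with the standard-library `graphlib.TopologicalSorter` so that referenced tables are inserted first.

```python
from graphlib import TopologicalSorter

# Hypothetical table names per context, in the intended migration order.
CONTEXTS = {
    "platform": ["clusters", "products", "product_prices"],
    "users": ["users", "wallets", "user_preferences"],
    "services": ["service_metadata", "service_access_rights"],
    "studies": ["projects", "folders", "file_metadata"],
}

# Hypothetical foreign-key dependencies: table -> tables it references.
DEPENDS_ON = {
    "product_prices": {"products"},
    "wallets": {"users", "products"},
    "user_preferences": {"users", "products"},
    "service_access_rights": {"service_metadata", "products"},
    "projects": {"users"},
    "folders": {"users", "products"},
    "file_metadata": {"projects", "users"},
}


def migration_plan(contexts, depends_on):
    """Return [(context, tables_in_insertion_order), ...].

    Raises ValueError if a table references something that is neither in its
    own context nor in a context that was migrated earlier.
    """
    plan, migrated = [], set()
    for context, tables in contexts.items():
        in_context = set(tables)
        graph = {}
        for table in tables:
            deps = depends_on.get(table, set())
            missing = deps - in_context - migrated
            if missing:
                raise ValueError(
                    f"{table} in context {context!r} needs {sorted(missing)} first"
                )
            # Only in-context dependencies constrain the order inside a stage.
            graph[table] = deps & in_context
        order = list(TopologicalSorter(graph).static_order())
        plan.append((context, order))
        migrated.update(order)
    return plan


plan = migration_plan(CONTEXTS, DEPENDS_ON)
for context, tables in plan:
    print(context, "->", tables)
```

A plan of this shape also gives natural stage boundaries at which a stage could be run manually or automatically.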

Migration Process Requirements

  1. Data Integrity Checks:

    • Every step of the migration process must include validation checks to ensure data integrity, preventing corruption or data loss.
  2. Checkpoints for Rollback:

    • Implement checkpoints at various stages of the migration to allow for reversion in case a data integrity check fails, ensuring a safe fallback.
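
The two requirements can be combined as in the following sketch, which uses an in-memory SQLite database as a stand-in for PostgreSQL (both support `SAVEPOINT`). The tables, step names, and check functions are hypothetical. Each step runs behind its own savepoint and is rolled back if its integrity check fails, while the steps that already passed are kept.

```python
import sqlite3


def migrate_with_checkpoints(conn, steps):
    """Run (name, apply, check) steps inside one transaction.

    Each step is guarded by a savepoint. If its check fails, only that step
    is rolled back and the migration stops; earlier steps are committed.
    Step names must be trusted identifiers since they are used in SQL.
    """
    completed = []
    conn.execute("BEGIN")
    for name, apply, check in steps:
        conn.execute(f"SAVEPOINT {name}")
        apply(conn)
        if not check(conn):
            conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
            break
        conn.execute(f"RELEASE SAVEPOINT {name}")
        completed.append(name)
    conn.execute("COMMIT")
    return completed


def count(conn, sql):
    return conn.execute(sql).fetchone()[0]


# isolation_level=None lets us issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, owner_id INTEGER NOT NULL)")

ORPHANS = (
    "SELECT COUNT(*) FROM projects p "
    "LEFT JOIN users u ON u.id = p.owner_id WHERE u.id IS NULL"
)

steps = [
    (
        "users",
        lambda c: c.execute("INSERT INTO users VALUES (1, 'a@example.com')"),
        # Check: no duplicate emails.
        lambda c: count(c, "SELECT COUNT(*) FROM users")
        == count(c, "SELECT COUNT(DISTINCT email) FROM users"),
    ),
    (
        "projects",
        # Deliberately broken step: the owner 99 does not exist.
        lambda c: c.execute("INSERT INTO projects VALUES (10, 99)"),
        # Check: every project has an existing owner.
        lambda c: count(c, ORPHANS) == 0,
    ),
]

completed = migrate_with_checkpoints(conn, steps)
print(completed)  # ['users']
```

Whether a failed check should keep the earlier stages or abort the whole migration is a policy decision; this sketch keeps them.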

Features

Even though this process will mostly be carried out once and in the backend, making the ability to import/export studies available as a standalone feature for users could be of great value.
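
As a rough illustration of what such a standalone export/import could look like (in the spirit of the archiving option (2)), here is a minimal Python sketch. The archive layout (`project.json` plus `nodes/<node_id>/<filename>`) and the function names are assumptions made for this example, not an existing oSPARC format or API.

```python
import io
import json
import zipfile


def export_study(project: dict, node_files: dict) -> bytes:
    """Pack project metadata and per-node files into a single zip archive.

    node_files maps node_id -> {filename: content_bytes}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("project.json", json.dumps(project, sort_keys=True))
        for node_id, files in node_files.items():
            for filename, content in files.items():
                archive.writestr(f"nodes/{node_id}/{filename}", content)
    return buffer.getvalue()


def import_study(data: bytes):
    """Inverse of export_study: return (project, node_files)."""
    node_files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        project = json.loads(archive.read("project.json"))
        for name in archive.namelist():
            if name.startswith("nodes/"):
                _, node_id, filename = name.split("/", 2)
                node_files.setdefault(node_id, {})[filename] = archive.read(name)
    return project, node_files


project = {"name": "demo study", "workbench": {"node-a": {"label": "solver"}}}
node_files = {"node-a": {"output.csv": b"t,v\n0,1\n"}}

archive_bytes = export_study(project, node_files)
restored_project, restored_files = import_study(archive_bytes)
```

A real implementation would also have to deal with the identifier and ownership remapping discussed above when the archive is imported into a different deployment.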