atsign-foundation / at_server

The software implementation of Atsign's core technology
https://docs.atsign.com
BSD 3-Clause "New" or "Revised" License

Propose Solution for Efficient Sync with Many Delete Entries #2066

Open VJag opened 4 weeks ago

VJag commented 4 weeks ago

Is your feature request related to a problem? Please describe.

The current atServer synchronization process is inefficient, particularly when handling large numbers of deletions, which impacts the performance and scalability of the system. The objective of this ticket is to propose a scalable, Hive-independent solution that improves synchronization efficiency in scenarios involving numerous delete entries.

Current Design Overview:

CRUD Operations:

- Data Storage: atServer stores data as key-value pairs.
- Key Management: Keys can be created, deleted, or automatically expired via the ttl (time to live) parameter.
- Expired Key Cleanup: A cron job deletes expired keys.
- Key Storage: All keys are stored in a Hive box named KeyStore.
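The key-value/TTL behavior described above can be sketched as follows. This is an illustrative Python model, not the actual Dart implementation; the class and method names are stand-ins for the real Hive-backed KeyStore:

```python
import time

class KeyStore:
    """Toy key-value store with TTL-based expiry, mimicking the behavior
    described above (the real atServer stores keys in a Hive box)."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def put(self, key, value, ttl_ms=None):
        expires_at = time.time() + ttl_ms / 1000 if ttl_ms else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.time() >= expires_at:
            return None  # expired, but not yet removed by the cron job
        return value

    def delete_expired(self):
        """What the cleanup cron job does: remove keys whose TTL elapsed."""
        now = time.time()
        expired = [k for k, (_, exp) in self._data.items()
                   if exp is not None and now >= exp]
        for k in expired:
            del self._data[k]
        return expired
```

Note that every key removed by `delete_expired` would also produce a delete entry in the commit log, which is the root of the sync inefficiency this ticket addresses.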

Commit Log:

- Operation Logging: Key creations and updates are logged in a Hive box called CommitLog with an auto-generated sequence number.
- Recording Changes: Each operation is recorded under a new sequence number.
- Single Entry per Key: The CommitLog maintains one entry per unique key.

In-Memory Compact CommitLog:

- In-Memory Representation: atServer keeps an in-memory map of the CommitLog to optimize synchronization.
- Sync Efficiency: This map supports efficient synchronization operations.
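A minimal sketch of such a compact commit-log map, keeping one entry per unique key (illustrative Python; the real CommitLog is a Hive box in Dart, and these names are assumptions):

```python
import itertools

class CompactCommitLog:
    """Keeps only the latest entry per unique key, as described above."""

    def __init__(self):
        self._seq = itertools.count(1)   # auto-generated sequence numbers
        self._latest = {}                # key -> (commit_id, operation)

    def log(self, key, operation):
        commit_id = next(self._seq)
        self._latest[key] = (commit_id, operation)  # overwrites older entry
        return commit_id

    def entries_after(self, commit_id):
        """Entries a client at the given commit id still needs to sync."""
        return sorted(
            (cid, key, op)
            for key, (cid, op) in self._latest.items()
            if cid > commit_id
        )
```

Because only the latest entry per key survives, a client that is far behind still syncs at most one entry per key rather than the full operation history.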

Sync Process:

- Client Connections: Multiple clients can be connected to an atServer.
- Data Synchronization: Clients sync data with the atServer, which assigns a commit ID; clients record this ID locally.
- Sync Status: A data item with a server commit ID is considered synced.
- Managing Sync Differences: If a client's commit ID is lower than the server's latest ID, it must catch up before pushing new data.
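The pull-then-push round described above can be sketched as follows (illustrative Python, not the actual Dart implementation; the dict shapes are assumptions made for the sketch):

```python
def sync(local, server):
    """One round of the sync protocol described above.

    `local` and `server` are plain dicts standing in for real state:
      local  = {"commit_id": int, "data": {}, "outbox": [(key, value), ...]}
      server = {"commit_id": int, "log": [(commit_id, key, value), ...]}
    """
    # 1. Pull: apply server entries newer than the client's commit ID.
    #    The client must catch up before it is allowed to push.
    for commit_id, key, value in server["log"]:
        if commit_id > local["commit_id"]:
            local["data"][key] = value
            local["commit_id"] = commit_id
    # 2. Push: the server assigns a fresh commit ID to each pushed entry;
    #    the client records it, marking the item as synced.
    for key, value in local["outbox"]:
        server["commit_id"] += 1
        server["log"].append((server["commit_id"], key, value))
        local["data"][key] = value
        local["commit_id"] = server["commit_id"]
    local["outbox"].clear()
```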

Current Design Issues:

Inefficient Sync with Many Deletions: Each deletion (including deletions of expired keys by the cleanup cron job) produces a CommitLog entry that every client must sync. With large numbers of deletions, clients spend time and storage syncing entries for keys that no longer exist, which slows synchronization significantly.

Describe the solution you'd like

Propose a Solution for Efficient Sync with Many Delete Entries: Scalable and Hive-Agnostic

Describe alternatives you've considered

No response

Additional context

No response

VJag commented 3 weeks ago

The ultimate goal of any optimizations to the sync process is to enable a client to create data and make it available to the server in the shortest possible time, while also reducing both time and storage requirements by syncing fewer entries. Clients may have varying synchronization requirements, which can be addressed through the following strategies:

1. Clients Concerned Only with New State

Example: An SSHNP client, which is only concerned with new, future SSH sessions and the keys associated with them.

Behavior: Such a client doesn’t need to be aware of any previous state. It should have the flexibility to either operate independently of the server's current state or sync with the server without retrieving any existing keys.

Benefit: This approach allows the client to create data that is immediately available to the server without the overhead of syncing old or irrelevant data.

2. Initial Sync vs. Delta Sync

Initial Sync: During an initial sync, the server can skip sending deleted entries to the client, allowing the client to synchronize with the server quickly.

Delta Sync: After a client has performed an initial sync, it can use delta syncs to receive only changes (additions, updates, deletions) since the last sync.

Sync Type Flag: Introducing a syncType: initial|delta flag in the sync operation would enable the server to optimize syncs by not sending deleted entries during the initial sync. This would help the client get in sync faster, allowing it to start creating data that is immediately available to the server.

Edge Case: If the last entry in the sync process is a deletion, the client might fall out of sync by one entry. This scenario should be managed gracefully to ensure synchronization integrity.
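One way the server could honor a hypothetical syncType flag while avoiding the trailing-delete edge case is to report its latest commit ID alongside the filtered entries, so the client can advance past skipped deletes. A sketch under those assumptions:

```python
def entries_for_sync(log, since_commit_id, sync_type):
    """Return (entries, latest_commit_id) for a sync request.

    log: ascending list of (commit_id, key, op), op in {"update", "delete"}.
    During an initial sync, delete entries are skipped: a fresh client has
    nothing to delete. Returning latest_commit_id separately lets the
    client advance its commit ID even when the final entry was a delete
    (the edge case above), so it does not fall one entry behind.
    """
    newer = [e for e in log if e[0] > since_commit_id]
    if sync_type == "initial":
        newer = [e for e in newer if e[2] != "delete"]
    latest = log[-1][0] if log else since_commit_id
    return newer, latest
```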

3. Skipping Expired Keys During the Sync

Flag: A skipSyncEntriesOfExpiredKeys: true|false flag would instruct the server not to send sync entries for expired keys during the synchronization process. This is useful when the client can handle the deletion of expired keys locally.

Edge Case: As with deletions, if the last entry to sync is a deletion of an expired key, the client may become out of sync by one entry. Handling this scenario is crucial to maintaining consistent synchronization.
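The same latest-commit-ID technique could cover this flag as well. A sketch, assuming each entry carries a marker distinguishing TTL-driven deletes from client-initiated ones (the marker and the flag name are assumptions):

```python
def filter_expired_key_entries(entries, latest_commit_id, skip_expired):
    """Apply a hypothetical skipSyncEntriesOfExpiredKeys flag.

    entries: ascending list of (commit_id, key, op, expired), where
    `expired` marks deletes produced by TTL cleanup rather than by a
    client. Returning latest_commit_id alongside the filtered list lets
    the client advance its commit ID even when the final entry was an
    expired-key delete (the edge case above).
    """
    if not skip_expired:
        return entries, latest_commit_id
    kept = [e for e in entries if not (e[2] == "delete" and e[3])]
    return kept, latest_commit_id
```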

Technical Implementation

To implement these optimizations, the following capabilities are needed:

VJag commented 3 weeks ago

A client can have four types of synchronization requirements:

Always-Online Clients: These clients need to be continuously connected to the remote secondary for their operations. Their primary focus is on reading from and writing to the server directly, without relying on cached data. These clients do not require synchronization to access cached data as they do not depend on it.

Example: The SSHNoPorts code completely disables sync; put and get talk to the remote secondary directly.

1. Snippet from SSHNoPorts code where we are creating an atClient with NoOp sync service

```dart
atClientGenerator: (SshnpdParams p) => createAtClientCli(
  atsign: p.deviceAtsign,
  atKeysFilePath: p.atKeysFilePath,
  rootDomain: p.rootDomain,
  storagePath: p.storagePath,
  namespace: DefaultArgs.namespace,
  atServiceFactory: ServiceFactoryWithNoOpSyncService(),
),
```

2. Get and Put request options to bypass local secondary and write to remote secondary directly

```dart
/// Parameters that application code can optionally provide when calling
/// AtClient.get
class GetRequestOptions extends RequestOptions {
  /// Whether the get request should bypass this atSign's cache of data owned
  /// by another atSign
  bool bypassCache = false;
}
```

```dart
/// Parameters that application code can optionally provide when calling
/// AtClient.put
class PutRequestOptions extends RequestOptions {
  /// Whether to set the sharedKeyEnc and pubKeyCS properties on the
  /// Metadata for this put request
  bool storeSharedKeyEncryptedMetadata = true;

  /// Whether to send this update request directly to the remote atServer
  bool useRemoteAtServer = false;
}
```

3. put with request options

```dart
await atClient.put(key, params.toJson(), putRequestOptions: options);
```

Push-Only Clients: These clients are only concerned with sending new data to the server and do not care about any previous state. If they go offline, they can queue requests and push them to the server when they reconnect.
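The queue-while-offline pattern for push-only clients can be sketched as follows (illustrative Python; the class name is hypothetical and a plain dict stands in for the remote atServer):

```python
class PushOnlyClient:
    """Sketch of the push-only pattern above: queue writes while offline
    and flush them in order on reconnect."""

    def __init__(self, server):
        self.server = server  # stand-in for the remote atServer
        self.queue = []       # writes made while offline
        self.online = True

    def put(self, key, value):
        if self.online:
            self.server[key] = value   # write straight to the server
        else:
            self.queue.append((key, value))  # hold until reconnect

    def reconnect(self):
        """Push queued writes in order, then resume direct writes."""
        self.online = True
        for key, value in self.queue:
            self.server[key] = value
        self.queue.clear()
```

Because such a client never reads previous state, it needs no pull phase at all; ordering within its own queue is the only consistency it must preserve.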

Full Sync Clients: These clients need to fetch data from the server before performing any operations. Applications like Buzz and Wavi typically fall into this category. The default sync behavior in the atClient SDK is designed to cater to these clients.

Sync Requirement: These clients must complete synchronization before they can push any new data to the server.

Selective Sync Clients: These clients require only a specific subset of data from the server before creating new data and syncing it back. For example, if a client starts with no data and only needs keys key1, key2, and key5 from the server, it will sync those keys and then proceed to interact with the server using that limited dataset.
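Selective sync amounts to filtering the commit log by the subset of keys the client cares about. A minimal sketch under that assumption:

```python
def selective_sync_entries(log, wanted_keys):
    """Sketch of selective sync: from the full commit log, return only the
    latest entry for each key in the client's requested subset.

    log: ascending list of (commit_id, key, op).
    """
    latest = {}
    for commit_id, key, op in log:
        if key in wanted_keys:
            latest[key] = (commit_id, op)  # later entries win
    return sorted((cid, key, op) for key, (cid, op) in latest.items())
```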

When developing an application, it’s crucial to understand your client's sync requirements, as each type can significantly impact performance.

VJag commented 3 weeks ago

Actionable Next Steps Based on the Analysis:

1. Enable the feature to exclude commit log entries for expired key deletions on the server.
2. Introduce the sync type flag.
3. Analyze sync issues with atcolin.
4. Conduct an architectural discussion to evaluate the need for direct support in the SDK for:

VJag commented 3 weeks ago

Analyze Sync Issues with atcolin. @purnimavenkatasubbu can you start on this, please?