byu-dnasc / proto-smrtlink-share


Handling faults and de-synchronization #34

Open adknaupp opened 5 months ago

adknaupp commented 5 months ago

Startup

The app's job is to stage project files and grant Globus users access to them. Whether this is an initial deployment or a restart after a failure, any project files and Globus access rules that have persisted will most likely need updates to be in sync with SMRT Link. Luckily, all the information the app needs is renewable: no matter what was in the app's database before it shut down (or whether the database existed at all), everything that really matters is stored by SMRT Link and Globus. Even so, the existing state of the project files and of the Globus access rules is still taken into account when the app starts.

1. Renew the project tables

When the app starts, the database file (if it exists) should be completely wiped, including every table. The first tables to be repopulated are the project tables: Project, ProjectDataset, and ProjectMember.
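A minimal sketch of this reset step, assuming the database is SQLite and accessed through Python's standard `sqlite3` module. The table names come from this issue, but the column definitions, the `renew_project_tables` function, and the shape of the `projects` input are hypothetical and only serve the illustration. Deleting the database file outright would be an equally valid way to wipe it.

```python
import sqlite3

# Table names are from the issue; the columns are assumptions for this sketch.
SCHEMA = {
    "Project": "id INTEGER PRIMARY KEY, name TEXT NOT NULL",
    "ProjectDataset": "project_id INTEGER NOT NULL, dataset_uuid TEXT NOT NULL",
    "ProjectMember": "project_id INTEGER NOT NULL, member_id TEXT NOT NULL",
}


def renew_project_tables(conn, projects):
    """Drop and recreate the project tables, then repopulate them.

    `projects` is an iterable of dicts with the (hypothetical) keys
    "id", "name", "datasets" and "members", as fetched from SMRT Link.
    """
    # Wipe whatever persisted from the previous run.
    for table, columns in SCHEMA.items():
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} ({columns})")

    # Repopulate from the fresh project list.
    for project in projects:
        pid = project["id"]
        conn.execute("INSERT INTO Project VALUES (?, ?)", (pid, project["name"]))
        conn.executemany(
            "INSERT INTO ProjectDataset VALUES (?, ?)",
            [(pid, uuid) for uuid in project["datasets"]],
        )
        conn.executemany(
            "INSERT INTO ProjectMember VALUES (?, ?)",
            [(pid, member) for member in project["members"]],
        )
    conn.commit()
```

Because the tables are rebuilt from scratch on every call, running the function twice leaves only the rows from the second call.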

Project files

Next, we need to check that all project files are staged. To avoid a large number of unnecessary file operations, the app should first check which project files are already in place, then create, update, or delete only the ones that need it.

This process must somehow integrate the repopulating of the DatasetDirectory table.
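The "check first, then act" approach to project files could be sketched as a pure planning function that compares what is on disk with what should be there. Everything here is an assumption made for illustration: the `plan_staging` name, the `expected` mapping of relative path to expected size in bytes, and the choice of a size mismatch as the signal that a staged file needs updating.

```python
from pathlib import Path


def plan_staging(root, expected):
    """Work out which staged files must be created, updated, or deleted.

    `root` is the staging directory. `expected` maps a relative POSIX path
    to the file size (in bytes) that the staged file should have.
    Returns three sorted lists: (to_create, to_update, to_delete).
    """
    root = Path(root)
    # Snapshot of the files currently staged under root.
    staged = {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }
    to_create = sorted(set(expected) - set(staged))
    to_delete = sorted(set(staged) - set(expected))
    # Present in both places, but the size does not match what is expected.
    to_update = sorted(
        name for name in set(expected) & set(staged)
        if expected[name] != staged[name]
    )
    return to_create, to_update, to_delete
```

Separating the plan from the file operations keeps the comparison easy to test, and the same pass over the expected files may be a natural place to collect the rows for the DatasetDirectory table.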

Project members

Projects may also have had members added or removed, so the set of access rules associated with the Globus collection may need to be updated. Again, this means comparing the access rules the app created before startup with the latest set of project members. Only access rules created by the app may be removed. See issue #33 for instructions on how to identify the access rules created by the app.
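The comparison of existing access rules against current project members might look like the sketch below. It deliberately makes no Globus API calls. The `AccessRule` class, its fields, and `plan_rule_changes` are hypothetical, and the `created_by_app` flag is a stand-in for whatever criterion issue #33 settles on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    rule_id: str
    principal: str        # Globus identity the rule grants access to
    path: str             # collection path the rule applies to
    created_by_app: bool  # placeholder for the criterion from issue #33


def plan_rule_changes(existing, wanted):
    """Compare app-created rules with the access that members should have.

    `existing` is an iterable of AccessRule; `wanted` is a set of
    (principal, path) pairs derived from the latest project members.
    Returns (to_add, to_remove): pairs needing a new rule, and the IDs of
    app-created rules that no longer correspond to a project member.
    """
    # Rules not created by the app are ignored, so they are never removed.
    ours = {(r.principal, r.path): r for r in existing if r.created_by_app}
    to_add = sorted(set(wanted) - set(ours))
    to_remove = sorted(
        rule.rule_id for key, rule in ours.items() if key not in wanted
    )
    return to_add, to_remove
```

Filtering on `created_by_app` before computing the difference is what guarantees that rules added by hand on the collection are left alone.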

adknaupp commented 5 months ago

Synchronization

In this app, 'de-synchronization' is what results when a project request reaches SMRT Link without the app receiving that same request. Recall that, thanks to smrtlink-proxy, the app gets CC'ed on all project requests sent to SMRT Link via the proxy (NGINX calls this request 'mirroring'). However, there are scenarios (mostly ones where something is misconfigured or otherwise out of order) where a request could reach SMRT Link but not the app. A simple example is someone logging into and using SMRT Link directly, i.e. on port 8243 rather than through the proxy, which at the time of writing is set to run on port 8244.

Fault cases

De-synchronization can occur any time while SMRT Link projects are being modified and the app is somehow incapacitated (or of course, if the app is not running). One implication of this is that if SMRT Link is down, then there is no possibility of de-synchronization since projects can't be modified. Otherwise, as long as SMRT Link is up, possible sources of de-synchronization are as follows:

Fault and de-synchronization tolerance

The app must be resilient to any exception that may arise in the course of its execution. In some cases, we may be able to implement conditional behavior to recover from an exception. Otherwise, the app must divide tasks in such a way that when a fault occurs while completing one task, other tasks can still succeed (when appropriate).
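One way to divide work so that a fault in one task does not prevent the others from running is to execute each task inside its own `try` block and collect the failures. This is only a sketch: the `run_isolated` helper, the task names, and the logger name are hypothetical, and whether a later task should still run after an earlier one fails depends on how the tasks relate to each other.

```python
import logging

logger = logging.getLogger("smrtlink_share")  # hypothetical logger name


def run_isolated(tasks):
    """Run each (name, callable) pair, isolating failures from one another.

    Returns a dict mapping the name of each failed task to the exception
    it raised, so the caller can decide how to recover.
    """
    failures = {}
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            # Record the fault and its traceback, then move on to the next task.
            logger.exception("task %r failed", name)
            failures[name] = exc
    return failures
```

For example, staging files and updating Globus access rules for a project could be passed as two separate tasks, so that a Globus outage does not stop the files from being staged.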

Using the OutOfSyncError to handle de-synchronization

The app is designed to handle de-synchronization to a certain extent. That is, it will always be able to respond to actions performed on individual projects. Updating a project and adding a new project will always succeed. However, if there were changes made to projects while the app was not functioning normally,