Closed johnhbenetech closed 3 years ago
Contents:
Whenever possible we shouldn't enforce a centralized architecture:
What does it imply in our case?
In this context, data processing means detecting matches between local and external fingerprints, and repository access means pushing and pulling fingerprints to/from external repositories. Assuming data processing is offloaded to the clients, we should support a workflow that allows data processing and repository access to be split:
Our customers use different workflows (e.g. with and without webui+database). In the webui+database workflow, we should make sure that all the source data and all processing results are available in the local database if we want to display them via the frontend.
For security reasons, we should not require highly secured nodes (those with access to the sensitive data) to have internet access in order to push/pull fingerprints to/from external repositories:
We should support an offline-only workflow in which the application is not required to have internet access in order to push/pull fingerprints.
This could be done by packaging fingerprints:
This may work as follows:
We cannot use any other information to reliably link local files to remote repository entries:
We should keep in mind that if clients can read all the fingerprints from remote repositories, they may abuse this information:
As we have just a handful of trusted customers (interested in long-term collaboration) we can simply accept this risk.
Synchronization between local data and remote repository could be achieved as follows:
A remote repository entry may look like this:
| Contributor ID | Fingerprint | Hash | Serial ID (int) |
|---|---|---|---|
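The entry layout above can be sketched as a small record type. This is only an illustration: the field types, the `RepositoryEntry` name and the `entries_after` helper are assumptions, and so is the idea of using the serial ID as a cursor for incremental pulls (the table only says it is an integer).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RepositoryEntry:
    """One row of the remote repository (field types are assumed)."""
    contributor_id: str   # who pushed the fingerprint
    fingerprint: bytes    # opaque fingerprint payload
    file_hash: str        # hash of the source file, e.g. a sha256 hex digest
    serial_id: int        # integer assigned by the repository


def entries_after(entries: Iterable[RepositoryEntry],
                  last_seen_serial: int) -> List[RepositoryEntry]:
    """Return entries with a serial ID greater than the last one seen.

    Hypothetical helper: assumes serial IDs grow monotonically, so a
    client only needs to remember the highest serial ID it has pulled.
    """
    newer = [e for e in entries if e.serial_id > last_seen_serial]
    return sorted(newer, key=lambda e: e.serial_id)
```

A client that has already pulled everything up to serial ID 1 would call `entries_after(all_entries, 1)` to get only the newer rows.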
There are different ways in which the remote repository could be implemented.
This is a "pessimistic" approach to organizing a repository:
Pros:
Cons:
This is a "realistic" approach:
Pros:
Here is an "optimistic" approach:
Pros:
Cons:
We need to update the local database schema to keep track of the remote data source.
There are at least two approaches:
Re-using existing types:
Pros:
Cons:
Explicit distinction between local and remote files:
Pros:
Cons:
We will need to create command-line tool(s) and UI elements for the following operations:
There are two big parts of required implementation efforts:
As we discussed at the last meeting, we will initially go for the Bare Database option, and then probably (under favorable circumstances) implement the Simple Repository variant (see the previous comment).
Implementation effort required for the Bare Database variant:
`sha256`)
`insert` and `select` but not `update` privileges for all contributors.
`select` permission for others.

@johnhbenetech My strong concern here is that this approach may easily turn into GIGO (if we are not very cautious).
As it is not clear if we'll go for the Simple Repository option, we can just skip its details for now.
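The "insert and select, but not update" rule for contributors can be illustrated with a small append-only table. This is a rough sketch, not the planned implementation: it uses SQLite from the Python standard library, which has no privilege system, so a trigger stands in for the missing `update` privilege. On a server database the same rule would normally be expressed with per-role grants. The table and column names are assumptions.

```python
import sqlite3


def create_append_only_repository(conn: sqlite3.Connection) -> None:
    """Create a fingerprints table that accepts INSERT and SELECT only.

    The BEFORE UPDATE trigger emulates the absence of an update
    privilege; names here are hypothetical.
    """
    conn.executescript(
        """
        CREATE TABLE fingerprints (
            serial_id      INTEGER PRIMARY KEY AUTOINCREMENT,
            contributor_id TEXT NOT NULL,
            fingerprint    BLOB NOT NULL,
            sha256         TEXT NOT NULL
        );
        CREATE TRIGGER fingerprints_no_update
        BEFORE UPDATE ON fingerprints
        BEGIN
            SELECT RAISE(ABORT, 'fingerprints are append-only');
        END;
        """
    )
```

With this in place, inserting and selecting rows works as usual, while any `UPDATE fingerprints ...` statement fails with a database error.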
To fully support remote fingerprint matching we need at least the following features:
These features must be available both as command-line tool(s) and as part of the Web Frontend. Note that only items (2) and partially (5) depend on the remote repository implementation.
Also the above feature list provides a rough idea of how the required effort could be split into separate tasks.
The command-line tool could be organized as a hierarchical script with subcommands.
@johnhbenetech please review the above list. For the UI part I can draw some inspiration from existing mockups (like this one). But any detailed mockups/suggestions from you will be extremely helpful. For now I'll just outline the required tasks.
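A hierarchical script of this kind could be laid out with `argparse` subparsers, roughly as below. This is only a sketch: the program name, the subcommand names (`push`, `pull`, `match`) and the `--repository` option are placeholders chosen for illustration, not an agreed interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI with one subcommand per operation (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="fingerprints",
        description="Work with local and remote fingerprints.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Push local fingerprints to an external repository.
    push = subparsers.add_parser("push", help="push local fingerprints")
    push.add_argument("--repository", required=True, help="repository name")

    # Pull fingerprints from an external repository.
    pull = subparsers.add_parser("pull", help="pull remote fingerprints")
    pull.add_argument("--repository", required=True, help="repository name")

    # Detect matches between local and previously pulled fingerprints.
    subparsers.add_parser("match", help="match local against remote fingerprints")

    return parser
```

For example, `build_parser().parse_args(["push", "--repository", "central"])` yields a namespace with `command == "push"` and `repository == "central"`; each subcommand can later grow its own options without affecting the others.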
This task is to begin tracking design decisions for implementation of the 'shared fingerprint' database and related interactions.
User stories for JusticeAI user:
Benetech requirements:
User stories for JusticeAI user receiving results:
Considerations: