Milestone 1: Article (Glossary)

mlesmenio commented 1 year ago

Clear definition for complex terms that have not been well defined so far:

Federation
Sovereignty
Data Format
...

mcalligator commented 1 year ago

I strongly agree that clearer definitions for these terms are required. A definition of Sovereignty will help clarify the meaning of the catchphrase "Connected, but sovereign" for Federation.

mcalligator commented 1 year ago

Addressing first the definition of Federation, the Write-up for the Federated Timesheets project established some important groundwork (also captured in this Gitter thread). Key elements of (and omissions from) this were:

Collections of systems considered to be Federated are in a "club".
Qualification for a system to be a member of the club includes the abilities to: (a) Request and process data from other systems (whether federated with it, or merely connected) in the native format of those other systems; (b) Write data to other systems (again, whether federated or not), once more in the native format of the targets; (c) Act as an intermediary between other systems (federated or not) for data exchange using a gossip protocol; (d) Tolerate divergent views of the truth from other federated systems, for business reasons; and (e) Respect the integrity and access controls applied to data in their "home" systems.
Club members are hence federation-aware, and require adaptation to support the above functions.
Federation has been implicitly assumed to refer to systems, rather than data, and is hence distinct from federated database systems as described in this Wikipedia article.
Systems federation in this context has not considered the question of distributed query.
The implications of message routing between systems connected via intermediaries were considered only briefly.
Hohpe and Woolf's prior work on Enterprise Integration Patterns should be taken into account for its bearing upon Systems Federation.
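
As a rough illustration of the club-membership qualifications (a) to (e) above, they could be written down as an interface that a federation-aware system would have to satisfy. This is only a sketch: the class name `FederatedMember`, every method name, and the `ToyNode` stand-in are hypothetical and do not come from the Timesheets write-up.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class FederatedMember(Protocol):
    """Hypothetical interface; names are illustrative, not from the write-up."""

    def read_from(self, other: str, query: Any) -> Any:
        """(a) Request data from another system in that system's native format."""

    def write_to(self, other: str, record: Any) -> None:
        """(b) Write data to another system in the target's native format."""

    def relay(self, source: str, target: str, message: Any) -> None:
        """(c) Act as an intermediary between two other systems."""

    def record_divergent_view(self, entity_id: str, source: str, value: Any) -> None:
        """(d) Keep another system's differing view next to the local one."""

    def may_modify(self, home_system: str, entity_id: str) -> bool:
        """(e) Respect the home system's integrity and access controls."""


class ToyNode:
    """Do-nothing stand-in that only demonstrates the shape of a member."""

    def read_from(self, other, query):
        return None

    def write_to(self, other, record):
        return None

    def relay(self, source, target, message):
        return None

    def record_divergent_view(self, entity_id, source, value):
        return None

    def may_modify(self, home_system, entity_id):
        return False  # assumption: never touch data homed elsewhere


# The runtime check only verifies that the methods exist, not their behaviour.
is_member = isinstance(ToyNode(), FederatedMember)
```

A system lacking any of these methods would fail the `isinstance` check, which loosely mirrors the idea that club members need adaptation before they qualify.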

mcalligator commented 1 year ago

Moving on to Sovereignty, the Timesheets project Write-up captures the following aspects:

The integrity of the data originating in each system is sacrosanct, and not subject to change by other systems; and
Therefore, data pertaining to the same real-world entities that originate in different systems may diverge for business reasons.

This definition needs expansion to increase its usefulness.

michielbdejong commented 1 year ago

Re sovereignty, I agree with the two points; they go together and are important, e.g. when you receive and store an invoice from another company, you may or may not dispute the information in it. Most database systems are unable to store "statements" like this; they are usually designed to store one single truth. This almost immediately makes the technical requirements for federated bookkeeping systems special.
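
To make the "statements" idea concrete, here is a minimal sketch of a store that keeps each source's claim about a field instead of overwriting a single truth. The names `Statement` and `StatementStore`, and the invoice data, are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Statement:
    source: str     # system that made the claim
    entity_id: str  # e.g. an invoice number
    field: str
    value: Any


class StatementStore:
    """Keeps one claim per source rather than a single agreed value."""

    def __init__(self) -> None:
        # (entity_id, field) -> {source: value}
        self._claims: Dict[tuple, Dict[str, Any]] = defaultdict(dict)

    def record(self, st: Statement) -> None:
        self._claims[(st.entity_id, st.field)][st.source] = st.value

    def views(self, entity_id: str, field: str) -> Dict[str, Any]:
        return dict(self._claims.get((entity_id, field), {}))

    def is_disputed(self, entity_id: str, field: str) -> bool:
        values = list(self.views(entity_id, field).values())
        return any(v != values[0] for v in values[1:])


store = StatementStore()
store.record(Statement("supplier", "INV-1", "amount", 1200))
store.record(Statement("us", "INV-1", "amount", 1000))
disputed = store.is_disputed("INV-1", "amount")  # both claims are kept
```

Neither claim replaces the other here; whether and how a dispute gets resolved is left to the business process.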

I would also like to add technical sovereignty, at four levels:

each node is allowed to choose its internal representation format and software stack
each node is allowed to choose which data formats it exposes for export and accepts for import
each node is allowed to choose which interaction protocols it supports
each link in the network can have data formats and interaction protocols that only the (two) neighbouring nodes determine
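
A small sketch of what per-link choice could look like: each node declares what it exports, imports, and speaks, and a link is settled by the two neighbours alone. The `Node` fields, the `negotiate_link` function, and the format and protocol names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    exports: List[str]    # data formats the node is willing to send
    imports: List[str]    # data formats the node is willing to accept
    protocols: List[str]  # interaction protocols the node supports


def negotiate_link(sender: Node, receiver: Node) -> Optional[Tuple[str, str]]:
    """Return a (format, protocol) pair both neighbours accept, or None."""
    fmt = next((f for f in sender.exports if f in receiver.imports), None)
    proto = next((p for p in sender.protocols if p in receiver.protocols), None)
    if fmt is None or proto is None:
        return None
    return fmt, proto


a = Node("A", exports=["csv", "json"], imports=["json"], protocols=["https"])
b = Node("B", exports=["xml"], imports=["json", "xml"], protocols=["https", "sftp"])

a_to_b = negotiate_link(a, b)  # A can send json to B over https
b_to_a = negotiate_link(b, a)  # B only exports xml, which A does not import
```

The result is directional: A can send to B, but B has nothing A will accept, so that direction would need a converter or an intermediary.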

We have recently also been talking more about regionality: for instance, a CRDT can span 10 nodes, and that's a region in the global federated bookkeeping network.

michielbdejong commented 1 year ago

Re Federation, I think we can do better; I see two problems with the definition you summarised: 1) it is not convex, so to speak; we are now defining it as "any systems that are able to share state with each other and that are not merely integrated". If we define federated bookkeeping systems simply as "any systems that are able to share state with each other" and allow integrated bookkeeping systems as a (trivial) subset then I think our definition is easier to use. 2) there are some requirements that talk about how data sharing is achieved (for instance the requirements about which native format is used when), rather than about the functionality of it; this in itself sort of violates sovereignty of the systems in the federation.

3) re "distinct from federated database systems", maybe even that's a technology choice as well. A federated database system transfers the data just-in-time when it's needed to provide a query result. I think this is not a fundamentally wrong sync strategy. If the other database is trusted enough, it can be linked to (possibly adding a cache) rather than copying every piece of data to each node.
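
The "link to it, possibly adding a cache" strategy could be sketched as follows. This is a toy, assuming the remote node is reached through some `fetch` callable; `LinkedSource` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LinkedSource:
    """Fetches from a trusted remote node on demand, with a short-lived cache."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]           # still fresh: no remote call
        value = self._fetch(key)    # just-in-time transfer
        self._cache[key] = (now, value)
        return value
```

Nothing is copied until a query needs it, and stale entries are fetched again after `ttl_seconds`. Whether that is acceptable depends on how much the other database is trusted and how fresh the data has to be.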

To me, the main point of Federated Bookkeeping is providing the ability to "be in touch", to have data at one's fingertips while only interacting with one node of a network. Instead of making the user hop around between systems, copy-pasting pieces of data by hand, the systems talk to each other, the data moves to where it needs to be, and the user stays in one place.

So a hierarchy could be:

not federated at all: as a user I need to do all the leg work, including maybe even reading things from paper and typing those into a keyboard. I remember early computer magazines that had hex codes. Sadly, much of IT has still not moved very far beyond this, and that's what we're trying to fix.
export to a zip or csv file: for instance Google Takeout or bank statement CSV downloads. The interaction protocol necessarily requires user clicks, and the user will probably also need to write or find some script or spreadsheet macro to do the translation between data formats.
OAuth integrations: the user clicks to consent with the systems connecting, but after that, the rest is machine-readable. Still, this is usually one-way sync.
Two-way sync, for instance
- using a personal data store like what PDS Interop is working on
- using a CRDT system like m-ld
- using a connector system like CYB

And maybe on a separate dimension: multi-hop. Endless point-to-point integrations don't scale, so we can almost mathematically prove that multi-hop data transfers will have to be used somehow.
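
The scaling argument can be made a bit more concrete: connecting every pair of n systems directly needs n(n-1)/2 links, while a connected network can get by with as few as n-1 links if data is allowed to travel over several hops. A minimal sketch, with hypothetical function names and a made-up four-node chain:

```python
from collections import deque
from typing import Dict, List, Optional, Set


def full_mesh_links(n: int) -> int:
    """Number of point-to-point integrations needed to connect n systems pairwise."""
    return n * (n - 1) // 2


def find_route(links: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest multi-hop path between two nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(links.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Four systems in a chain: only 3 links instead of the 6 a full mesh needs.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
route = find_route(chain, "A", "D")
```

For 10 systems the full mesh already needs 45 integrations, against 9 links for a chain, at the cost of relaying data through intermediaries.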

michielbdejong commented 1 year ago

@mcalligator can you add a definition of "the federation protocol"?

michielbdejong commented 1 year ago

I'll volunteer a definition of "the world ledger":

Payments, timesheet entries, and tasks that get completed all represent a movement of value from one economic entity to another. For payments (money transfers) this is obvious. For timesheets, each entry represents a movement of "x hours worth of work" from a worker to a project. For tasks, it's slightly less pronounced, but still, work is organised through issue trackers, and opening an issue in a tracker often represents a purchase order from one economic entity to another; closing an issue often represents a completion notice in the opposite direction.

Together, the union of database nodes that contain money transfer, timesheet, and issue tracker data, can be seen as one big data set ("the world ledger"), each database node showing only a partial view ("local ledger") of the total value transfer graph.

This has implications for access control, because the sender of value (the payer / the worker / the issue closer) often has an interest in obtaining a verifiable proof that they sent the value. And the value receiver often has an interest that no transmission of value to them is recorded in any node of record without their approval.

Many issue trackers will for instance allow many people to comment on an issue, but only project maintainers are allowed to mark an issue as completed. Otherwise, issues might disappear from the tracker for the wrong reasons, leading to possible catastrophe and a corrupt state of the data in the issue tracker.

Note that local ledgers can have 3 kinds of transfers: incoming, local, and outgoing. If there is an incoming transfer recorded in node A, then there must be a corresponding outgoing transfer in node B, and vice versa. See https://michielbdejong.com/blog/20.html for some early Federated Bookkeeping research that focuses on this topic.

Node A might be "Alice's account on GitHub", where GitHub is trusted to implement access control on Alice's behalf. So node A and node B might be hosted on the same server, but still both be considered sovereign.
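
The rule that every incoming transfer in one local ledger needs a matching outgoing transfer in another can be checked mechanically. The sketch below assumes, purely for illustration, that both nodes record the same transfer under a shared identifier; `Transfer` and `unmatched` are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transfer:
    transfer_id: str  # assumption: both sides agree on this identifier
    sender: str       # name of the sending node
    receiver: str     # name of the receiving node
    amount: int


def unmatched(ledgers: Dict[str, List[Transfer]]) -> List[Tuple[str, Transfer]]:
    """Return (node, transfer) pairs whose counterpart is missing elsewhere."""
    problems = []
    for node, entries in ledgers.items():
        for t in entries:
            if t.sender == t.receiver:
                continue  # local transfer: no counterpart needed
            other = t.receiver if t.sender == node else t.sender
            if t not in ledgers.get(other, []):
                problems.append((node, t))
    return problems


t1 = Transfer("t1", sender="B", receiver="A", amount=8)
consistent = unmatched({"A": [t1], "B": [t1]})  # nothing to report
one_sided = unmatched({"A": [t1], "B": []})     # A's incoming has no outgoing in B
```

Because the check relies only on node names, it works the same whether node A and node B live on different servers or are two accounts on one host.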

mcalligator commented 1 year ago

@mcalligator can you add a definition of "the federation protocol"?

This is one of the required outputs for Milestone 1, the need for which I've now captured on the relevant page of this project's Wiki.

mcalligator commented 1 year ago

@michielbdejong wrote:

I see two problems with the definition you summarised:

it is not convex, so to speak

In response, I should clarify that the above summary wasn't intended as a definition of Federation in its own right, but more as a precis of the understanding we reached jointly at the conclusion of the Federated Timesheets project.

Also, I don't think I understand the meaning of the term 'convex' in this context; could you explain?

mcalligator commented 1 year ago

@michielbdejong wrote:

requirements that talk about how data sharing is achieved (for instance the requirements about which native format is used when), rather than about the functionality of it; this in itself sort of violates sovereignty of the systems in the federation.

I'm not sure I agree: rather, violation of a given system's sovereignty would occur when another mandates receiving data only in a format that it understands, or insists on sending data to other systems in its own format, rather than being willing to play ball and meet other systems where they are, by using their formats. Systems that aren't prepared to do that would therefore be classed as Integrated, rather than Federated. They may well be able to exchange data with other systems via an intermediary such as Connect Your Books, but the latter would be acting as an Integration Engine (taking responsibility for format conversion, distributed transaction management, and potentially identity and identifier mapping too). Such systems would be just Integrated with CYB. Federation entails a degree of both awareness of Federation, and of mutuality, in my view, which those systems wouldn't be exhibiting.

If the overall outcome was that someone could access the information they needed from a system of their choosing, then it's true that Federation would have been accomplished, albeit through the use of the conventional hub-and-spoke architecture of an Integration Engine in the middle, rather than those systems taking on some of the work of connecting. However, I would then argue that the project would have achieved nothing new, since this is a long-established pattern.

Another issue with the approach of leaving all inter-system connectivity to an Integration Engine is that one of the unique aspects of Federation that you mentioned - the ability to tolerate multiple versions of the truth - would not be achieved, since the Integrated systems would not be respecting this requirement.

michielbdejong commented 2 weeks ago

Right! The system may still be sovereign, but voluntarily decide to 'play ball' as you say, that's true.

Closing this issue since Manuel finished his article over a year ago and we have now claimed the completion of Manuel's, Navid's, and my part of milestone 1a.

@mcalligator Note that milestone 1b (the federation protocol) is still outstanding, and assigned to you, right? And you also still have some unclaimed tasks for milestone 1a? There are 4 weeks left (including the current one) to complete it before the project end deadline - see the Matrix chat.

federatedbookkeeping / task-tracking

Milestone 1: Article (Glossary) #9