federatedbookkeeping / research

Research notes about Federated Bookkeeping and related topics
https://federatedbookkeeping.org
MIT License

Contrast "federation-on-read" (distributed queries) with "distribution-on-write" (gossip) #36

Open michielbdejong opened 1 year ago

michielbdejong commented 1 year ago

Split out from https://github.com/federatedbookkeeping/research/issues/34#issuecomment-1369731027

michielbdejong commented 1 year ago

On a slightly more abstract level, we can conceptually separate the movement of data from the translation of data. This is related to CLOGS and Ink & Switch's Project Cambria (not to be confused with the Meta project of the same name).

It's like the difference between procedural programming (describing what should be done) and functional programming (describing how an output depends on inputs).

michielbdejong commented 1 year ago

Sometimes data needs to be translated before it can be moved. For instance, you can't move data from GitHub Issues to Jira without converting it from GitHub's format to Jira's format. But a connector like CYB could keep a log of the original source format, so that if the translation algorithm is updated, or a new view or report is added, the output can be regenerated from the original data.
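To make the regeneration idea concrete, here is a minimal sketch, assuming the connector stores each source record as the raw text it received. `SourceLog`, `translate_v1` and `translate_v2` are hypothetical names, and the field mappings are simplified stand-ins, not the real GitHub or Jira schemas.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SourceLog:
    """Append-only log of records exactly as the source system returned them."""
    entries: list[str] = field(default_factory=list)

    def append(self, raw: str) -> None:
        self.entries.append(raw)  # keep the raw text, not a parsed copy

    def regenerate(self, translate: Callable[[dict], dict]) -> list[dict]:
        """Re-run a (possibly updated) translation over every original record."""
        return [translate(json.loads(raw)) for raw in self.entries]


def translate_v1(issue: dict) -> dict:
    # Hypothetical first mapping: only the title is carried over.
    return {"summary": issue["title"]}


def translate_v2(issue: dict) -> dict:
    # Hypothetical improved mapping: the body is carried over as well.
    return {"summary": issue["title"], "description": issue.get("body", "")}


log = SourceLog()
log.append('{"title": "Fix login", "body": "Steps to reproduce"}')

old_view = log.regenerate(translate_v1)
new_view = log.regenerate(translate_v2)
```

Because the log holds the untranslated input, switching from `translate_v1` to `translate_v2` recovers the body text that the first mapping dropped, without asking the source system again.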

The use of digital signatures is probably also only possible if the data's representation at the time of signing is preserved.

It wouldn't make sense to sign individual words in a document; you can only meaningfully sign a full message in its original context, so any translation or rehash of that data would take the signature out of its context.
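A small sketch of why the signed representation has to be preserved. It uses an HMAC from the Python standard library as a stand-in for a real digital signature (an assumption made for brevity: an HMAC is a symmetric MAC, not a public-key signature, but it depends on the exact bytes in the same way). The key and both record formats are made up.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key"  # hypothetical key, for illustration only


def sign(message: bytes) -> str:
    # The tag is computed over the exact byte sequence of the message.
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(message), signature)


original = b'{"title": "Fix login", "state": "open"}'
signature = sign(original)

# Translate to another (made-up) format: same information, different bytes.
source = json.loads(original)
translated = json.dumps(
    {"summary": source["title"], "status": source["state"]}
).encode()

original_ok = verify(original, signature)      # checked against the signed bytes
translated_ok = verify(translated, signature)  # checked against the translation
```

Here `original_ok` is `True` and `translated_ok` is `False`: the signature can only be checked by someone who still has the original bytes, which is one argument for logging them next to any translation.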

michielbdejong commented 1 year ago

When using federation-on-read, the update notifications should still be distributed on write. But the target system could postpone fetching the new data if it's not currently in the user's view, or when live data is not required for some use case.
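As a rough sketch of that combination, assuming notifications carry no payload and the target can fetch the full resource on demand: `LazyReplica` and its method names are hypothetical, and the in-memory `remote` list stands in for the source system.

```python
from typing import Callable


class LazyReplica:
    """Target-side view of one remote resource, refreshed only when read."""

    def __init__(self, fetch: Callable[[], list]) -> None:
        self._fetch = fetch      # performs the actual remote read
        self._data: list = []
        self._stale = True       # nothing fetched yet
        self.fetch_count = 0

    def on_update_notification(self) -> None:
        # Called when the source announces a write; cheap, no data transfer.
        self._stale = True

    def read(self) -> list:
        # Fetch only if a notification arrived since the last fetch.
        if self._stale:
            self._data = self._fetch()
            self._stale = False
            self.fetch_count += 1
        return self._data


remote = ["entry 1"]                      # stand-in for the source system
replica = LazyReplica(lambda: list(remote))

replica.read()                            # first read triggers a fetch
remote.append("entry 2")                  # a write on the source ...
replica.on_update_notification()          # ... is announced immediately
remote.append("entry 3")
replica.on_update_notification()          # second notification, still no fetch
latest = replica.read()                   # one fetch picks up both writes
```

Two writes cost two notifications but only one fetch, and a resource nobody looks at is never fetched at all.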

michielbdejong commented 1 year ago

I think it's probably unavoidable that a federated bookkeeping network will (at least conceptually) have two roles for nodes: storage nodes that just store data and don't care too much about federation, and connector nodes that focus on forwarding data between storage nodes. CYB aims to be a connector node, and should probably support multiple sync paradigms:

yasharpm commented 1 year ago

> But a connector like CYB could keep a log of the original source format

There are cases where a piece of software does not provide the data in a format that can be logged. For example, in the timesheets scenario, if we can only query the timeslots in a given time interval, how would you store the log of an update on a specific timeslot?

Keeping a log of the source data in the original format in the context of CYB, can probably only be useful for test and debugging purposes.

But given the purpose of CYB as a connector, maybe it can produce its own data format with signed proofs, this time also including the source data in the original format. The output would be different gateways into the "list of updates" for each requested software/data source.

In effect, CYB will be able to encapsulate the data of other software, with no mandate on their data format or behavior, into a federation-friendly query format.

michielbdejong commented 1 year ago

> if we can only query the timeslots in a given time interval,

In that case you could log the exact query and the exact response.

> how would you store the log of an update on a specific timeslot?

The machine that initiates the update can log the exact API call and response. Other machines may not get notified of the update and have to find out the diff the next time they fetch a full list. For instance, on Monday you do a full fetch of a timesheet, and you log the query and the response. Suppose it has 17 entries. On Tuesday you repeat the same query, and you log the query and response again. Now you can compare Monday's response with Tuesday's. Suppose you notice that entry 4 differs, the others match, and two new entries were added to the end of the array. Then you know that entry 4 was updated by someone at some time between Monday's query and Tuesday's query.
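The Monday/Tuesday comparison could look roughly like this. It is a sketch that assumes the API returns entries in a stable order, so that the same position means the same entry; `diff_snapshots` and the `id`/`hours` fields are hypothetical.

```python
def diff_snapshots(older: list, newer: list) -> dict:
    """Compare two full responses to the same query, position by position."""
    shared = min(len(older), len(newer))
    return {
        # 1-based positions whose content differs between the two fetches
        "changed": [i + 1 for i in range(shared) if older[i] != newer[i]],
        # entries present only at the end of the newer response
        "added": newer[shared:],
        # entries that were at the end of the older response but are gone
        "removed": older[shared:],
    }


monday = [{"id": n, "hours": 8} for n in range(1, 18)]   # 17 entries
tuesday = [dict(entry) for entry in monday]              # same query, a day later
tuesday[3]["hours"] = 6                                   # entry 4 was edited
tuesday += [{"id": 18, "hours": 8}, {"id": 19, "hours": 4}]

result = diff_snapshots(monday, tuesday)
```

With this data, `result["changed"]` is `[4]` and `result["added"]` holds the two new entries. If the source can insert or reorder entries in the middle of the list, a positional diff would over-report changes, and matching on a stable identifier would be the safer choice.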

michielbdejong commented 1 year ago

> a format that can be logged

I don't understand what you mean by that. Can you give an example of a format that cannot be logged?

> Keeping a log of the source data in the original format in the context of CYB, can probably only be useful for test and debugging purposes.

I disagree. I think it can also be useful to enable replaying the log, which enables some convenient workflows for system administrators (including recovery from downtime, but especially "remaster" migrations to new formats).

> given the purpose of CYB as a connector, maybe it can produce its own data format with signed proofs

Yes! Signed translations then, effectively.