federatedbookkeeping / research

Research notes about Federated Bookkeeping and related topics
https://federatedbookkeeping.org
MIT License

Extract Entity #43

Open michielbdejong opened 1 month ago

michielbdejong commented 1 month ago

How would we solve the Extract Entity challenge described in section 2 of https://arxiv.org/pdf/2309.11406 ?

michielbdejong commented 1 month ago

I think it's easy to allow switching back and forth between the one-table and the two-table serialisation of this dataset, if we make a few assumptions:

Consequently, the customer name cannot be the primary key; we need to assign some other identifier. That makes the extraction hard to do in parallel on several replicas of a distributed storage system, since replicas that assign identifiers independently may not agree on them.
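To see why assigning a new identifier is awkward across replicas, here is a minimal TypeScript sketch. It assumes each replica hands out ids from its own local counter; `Replica` and `addCustomer` are hypothetical names for illustration, not part of any existing library.

```typescript
// Hypothetical replica that assigns customer ids from a purely local counter.
class Replica {
  private nextId = 1;
  readonly customers = new Map<number, string>(); // id -> customer name

  addCustomer(name: string): number {
    const id = this.nextId++;
    this.customers.set(id, name);
    return id;
  }
}

// Two replicas extract a customer concurrently, without coordinating.
const replicaA = new Replica();
const replicaB = new Replica();
const idA = replicaA.addCustomer("Wile E Coyote");
const idB = replicaB.addCustomer("Daffy Duck");

// Both picked id 1 for different customers, so naively merging the two
// customer tables would make them collide.
console.log(idA === idB); // true
```

Avoiding this needs some form of coordination or globally unique identifiers; which of those fits best here is still open.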

Also, changing a customer's address in the two-table view has a different effect from changing the address on an order in the one-table view: the former applies to all of that customer's orders, the latter only to that single order.

But I think that's not impossible to represent; it reminds me of the "halve this recipe" operation from https://mattweidner.com/2023/09/26/crdt-survey-1.html
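A minimal TypeScript sketch of the difference, using two of Wile E Coyote's orders from the example below. The new address "1 Mesa Road" is made up for illustration.

```typescript
// One-table view: every order row carries its own copy of the customer data.
interface OrderRow {
  item: string;
  customer_name: string;
  customer_address: string;
}

// Two-table view: the address lives once, on the customer record.
interface Customer {
  id: number;
  name: string;
  address: string;
}
interface Order {
  item: string;
  customer: number; // foreign key into the customers table
}

const rows: OrderRow[] = [
  { item: "anvil", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
  { item: "bird seed", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
];
const customers: Customer[] = [{ id: 1, name: "Wile E Coyote", address: "123 Desert Station" }];
const orders: Order[] = [
  { item: "anvil", customer: 1 },
  { item: "bird seed", customer: 1 },
];

// Editing the address on one order row only touches that row...
rows[0].customer_address = "1 Mesa Road";
// ...whereas editing the customer record affects every order referencing it.
customers[0].address = "1 Mesa Road";

// The address each order ends up with, in each view.
const oneTableAddresses = rows.map((r) => r.customer_address);
const twoTableAddresses = orders.map((o) => customers.find((c) => c.id === o.customer)!.address);
console.log(JSON.stringify(oneTableAddresses)); // ["1 Mesa Road","123 Desert Station"]
console.log(JSON.stringify(twoTableAddresses)); // ["1 Mesa Road","1 Mesa Road"]
```

After the one-table edit, Wile E Coyote has two different addresses, so it is no longer obvious how that state maps back onto a single customer record.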

michielbdejong commented 1 month ago

A customer with zero orders would have to be represented in the one-table view with item and quantity set to null.
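A minimal TypeScript sketch of going from the two-table view to the one-table view under that rule. It assumes the id-based customer and order shapes of the two-table example below, leaves out `ship_date` for brevity, and uses a hypothetical customer "Road Runner" (address "Desert Highway", also made up) who has no orders.

```typescript
interface Customer {
  id: number;
  name: string;
  address: string;
}
interface Order {
  item: string;
  quantity: number;
  customer: number; // foreign key into the customers table
}

// A row in the one-table view; item and quantity are null for a
// customer without any orders.
interface Row {
  item: string | null;
  quantity: number | null;
  customer_name: string;
  customer_address: string;
}

function toOneTable(customers: Customer[], orders: Order[]): Row[] {
  const out: Row[] = [];
  for (const c of customers) {
    const own = orders.filter((o) => o.customer === c.id);
    const base = { customer_name: c.name, customer_address: c.address };
    if (own.length === 0) {
      // Emit a placeholder row so the customer does not disappear.
      out.push({ item: null, quantity: null, ...base });
    }
    for (const o of own) {
      out.push({ item: o.item, quantity: o.quantity, ...base });
    }
  }
  return out;
}

const rows = toOneTable(
  [
    { id: 1, name: "Wile E Coyote", address: "123 Desert Station" },
    { id: 2, name: "Road Runner", address: "Desert Highway" },
  ],
  [{ item: "anvil", quantity: 1, customer: 1 }],
);
console.log(JSON.stringify(rows[1]));
// {"item":null,"quantity":null,"customer_name":"Road Runner","customer_address":"Desert Highway"}
```

This sketch groups rows by customer rather than keeping the original order sequence; preserving that sequence would need extra bookkeeping.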

So:

michielbdejong commented 1 month ago

To generalise the solution away from the example:

michielbdejong commented 1 month ago

The implementation should be more efficient than just storing the whole two-dimensional array, but its behaviour should be equivalent.
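One way to generalise the extraction, sketched in TypeScript under the assumption that rows with equal values in the extracted columns refer to the same entity. `extractEntity` is a hypothetical helper: it moves the given columns into a separate table where each entity is stored only once, and replaces them in each row with a numeric foreign key. Unlike the example output below, it keeps the original column names instead of renaming them.

```typescript
// Move the columns in `fields` out of each row into a table of unique
// entities, replacing them with a numeric foreign key stored under `refField`.
function extractEntity<R extends Record<string, unknown>>(
  rows: R[],
  fields: (keyof R & string)[],
  refField: string,
): { rows: Record<string, unknown>[]; entities: Record<string, unknown>[] } {
  const extracted = new Set<string>(fields);
  const idByKey = new Map<string, number>();
  const entities: Record<string, unknown>[] = [];

  const outRows = rows.map((row) => {
    // The tuple of extracted values identifies the entity (assumption).
    const values = fields.map((f) => row[f]);
    const key = JSON.stringify(values);
    let id = idByKey.get(key);
    if (id === undefined) {
      // First time we see this entity: store it once, with a new id.
      id = entities.length + 1;
      idByKey.set(key, id);
      const entity: Record<string, unknown> = { id };
      fields.forEach((f, i) => (entity[f] = values[i]));
      entities.push(entity);
    }
    // Keep the remaining columns and add the foreign key.
    const rest: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(row)) {
      if (!extracted.has(k)) rest[k] = v;
    }
    rest[refField] = id;
    return rest;
  });

  return { rows: outRows, entities };
}

const { rows: orderRows, entities: customerTable } = extractEntity(
  [
    { item: "anvil", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
    { item: "dynamite", customer_name: "Daffy Duck", customer_address: "White Rock Lake" },
    { item: "bird seed", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
  ],
  ["customer_name", "customer_address"],
  "customer",
);
console.log(customerTable.length); // 2
console.log(JSON.stringify(orderRows[2])); // {"item":"bird seed","customer":1}
```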

michielbdejong commented 1 month ago

I guess I could just define a Lens between the one-table and the two-table views.
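A minimal TypeScript sketch of such a Lens as a pair of functions. It is simplified in that `put` does not need the original source, and it assumes name plus address identify a customer; `Lens`, `extractCustomers` and the field names are illustrative, not Cambria's API. The last lines check the round trip starting from the one-table view.

```typescript
// Source (one-table) and view (two-table) shapes for the orders example.
interface FlatOrder {
  item: string;
  customer_name: string;
  customer_address: string;
}
interface TwoTables {
  orders: { item: string; customer: number }[];
  customers: { id: number; name: string; address: string }[];
}

// Simplified lens: `get` maps source to view, `put` maps a view back.
interface Lens<S, V> {
  get(source: S): V;
  put(view: V): S;
}

const extractCustomers: Lens<FlatOrder[], TwoTables> = {
  get(flat) {
    const customers: TwoTables["customers"] = [];
    const orders = flat.map((o) => {
      // Assumption: name + address together identify a customer.
      let c = customers.find((x) => x.name === o.customer_name && x.address === o.customer_address);
      if (!c) {
        c = { id: customers.length + 1, name: o.customer_name, address: o.customer_address };
        customers.push(c);
      }
      return { item: o.item, customer: c.id };
    });
    return { orders, customers };
  },
  put(view) {
    // Inline the customer data back into each order row.
    return view.orders.map((o) => {
      const c = view.customers.find((x) => x.id === o.customer);
      if (!c) throw new Error(`unknown customer id ${o.customer}`);
      return { item: o.item, customer_name: c.name, customer_address: c.address };
    });
  },
};

const flat: FlatOrder[] = [
  { item: "anvil", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
  { item: "dynamite", customer_name: "Daffy Duck", customer_address: "White Rock Lake" },
];
const roundTrip = extractCustomers.put(extractCustomers.get(flat));
console.log(JSON.stringify(roundTrip) === JSON.stringify(flat)); // true
```

A customer with zero orders in the two-table view would be dropped by this `put`, so the round trip starting from the two-table side only holds if such customers get the null-item rows mentioned above.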

michielbdejong commented 1 month ago

All replicas should switch to the new schema, and the code that lists unshipped orders should change at the same time. This can probably be done with a Lens?
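To make the coupling between schema and code concrete, here is a TypeScript sketch of the unshipped-orders query written once against each schema, using the example data below (quantities and addresses left out). The function names are hypothetical. Both versions return the same list, but only the second works on data in the new schema.

```typescript
interface FlatOrder { item: string; ship_date: string | null; customer_name: string; }
interface Order { item: string; ship_date: string | null; customer: number; }
interface Customer { id: number; name: string; }

// Old code, written against the one-table schema.
function unshippedFlat(orders: FlatOrder[]): string[] {
  return orders
    .filter((o) => o.ship_date === null)
    .map((o) => `${o.item} for ${o.customer_name}`);
}

// New code, written against the two-table schema: it has to follow the
// foreign key to find the customer's name.
function unshippedTwoTable(orders: Order[], customers: Customer[]): string[] {
  const byId = new Map(customers.map((c) => [c.id, c] as const));
  return orders
    .filter((o) => o.ship_date === null)
    .map((o) => `${o.item} for ${byId.get(o.customer)?.name ?? "unknown customer"}`);
}

const flat: FlatOrder[] = [
  { item: "anvil", ship_date: "2/3/23", customer_name: "Wile E Coyote" },
  { item: "dynamite", ship_date: null, customer_name: "Daffy Duck" },
  { item: "bird seed", ship_date: null, customer_name: "Wile E Coyote" },
];
const customers: Customer[] = [
  { id: 1, name: "Wile E Coyote" },
  { id: 2, name: "Daffy Duck" },
];
const orders: Order[] = [
  { item: "anvil", ship_date: "2/3/23", customer: 1 },
  { item: "dynamite", ship_date: null, customer: 2 },
  { item: "bird seed", ship_date: null, customer: 1 },
];

console.log(JSON.stringify(unshippedFlat(flat)) === JSON.stringify(unshippedTwoTable(orders, customers))); // true
```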

michielbdejong commented 1 month ago

I looked into Cambria and its existing lenses. An 'extract' lens should be possible.

Input document:

{
  "orders": [
    {
      "item": "anvil",
      "quantity": 1,
      "ship_date": "2/3/23",
      "customer_name": "Wile E Coyote",
      "customer_address": "123 Desert Station"
    },
    {
      "item": "dynamite",
      "quantity": 2,
      "ship_date": null,
      "customer_name": "Daffy Duck",
      "customer_address": "White Rock Lake"
    },
    {
      "item": "bird seed",
      "quantity": 1,
      "ship_date": null,
      "customer_name": "Wile E Coyote",
      "customer_address": "123 Desert Station"
    }
  ]
}

Lens:

lens:
  - extract
  [...]

Output document:

{
  "orders": [
    {
      "item": "anvil",
      "quantity": 1,
      "ship_date": "2/3/23",
      "customer": 1
    },
    {
      "item": "dynamite",
      "quantity": 2,
      "ship_date": null,
      "customer": 2
    },
    {
      "item": "bird seed",
      "quantity": 1,
      "ship_date": null,
      "customer": 1
    }
  ],
  "customers": [
    {
      "id": 1,
      "name": "Wile E Coyote",
      "address": "123 Desert Station"
    },
    {
      "id": 2,
      "name": "Daffy Duck",
      "address": "White Rock Lake"
    }
  ]
}

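A TypeScript sketch of what the proposed extract step could compute on this example. It is a plain function, not Cambria's lens API, and it assumes customers are identified by name plus address and numbered 1, 2, and so on in order of first appearance, which reproduces the output document above.

```typescript
interface InputOrder {
  item: string;
  quantity: number;
  ship_date: string | null;
  customer_name: string;
  customer_address: string;
}
interface OutputOrder {
  item: string;
  quantity: number;
  ship_date: string | null;
  customer: number;
}
interface OutputCustomer {
  id: number;
  name: string;
  address: string;
}

// Hypothetical 'extract' transformation: pull the customer fields out of
// each order into a customers table and leave a numeric reference behind.
function extract(input: { orders: InputOrder[] }): {
  orders: OutputOrder[];
  customers: OutputCustomer[];
} {
  const customers: OutputCustomer[] = [];
  const idByKey = new Map<string, number>();
  const orders = input.orders.map((o) => {
    const key = JSON.stringify([o.customer_name, o.customer_address]);
    let id = idByKey.get(key);
    if (id === undefined) {
      // New customer: number it by order of first appearance.
      id = customers.length + 1;
      idByKey.set(key, id);
      customers.push({ id, name: o.customer_name, address: o.customer_address });
    }
    return { item: o.item, quantity: o.quantity, ship_date: o.ship_date, customer: id };
  });
  return { orders, customers };
}

const input: { orders: InputOrder[] } = {
  orders: [
    { item: "anvil", quantity: 1, ship_date: "2/3/23", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
    { item: "dynamite", quantity: 2, ship_date: null, customer_name: "Daffy Duck", customer_address: "White Rock Lake" },
    { item: "bird seed", quantity: 1, ship_date: null, customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
  ],
};
const output = extract(input);
console.log(JSON.stringify(output, null, 2));
```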
michielbdejong commented 1 month ago

Ah, but to make it deterministic, you need to know the LRI mapping!
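A TypeScript sketch of the non-determinism: numbering customers by first appearance gives different ids on replicas that receive the orders in a different order, while looking ids up in a shared mapping does not. The `agreed` map is only a hypothetical stand-in for the LRI mapping, whose actual structure isn't described here, and identifying customers by name plus address is an assumption.

```typescript
interface Order {
  item: string;
  customer_name: string;
  customer_address: string;
}

// Assumption: name + address together identify a customer.
const customerKey = (o: Order): string => JSON.stringify([o.customer_name, o.customer_address]);

// Numbering by first appearance: ids depend on the order in which a
// replica happens to receive the orders.
function idsByFirstAppearance(orders: Order[]): Map<string, number> {
  const ids = new Map<string, number>();
  for (const o of orders) {
    const key = customerKey(o);
    if (!ids.has(key)) ids.set(key, ids.size + 1);
  }
  return ids;
}

// Looking ids up in an agreed mapping (hypothetical stand-in for the LRI
// mapping) gives every replica the same ids, whatever the arrival order.
function idsFromMapping(orders: Order[], mapping: ReadonlyMap<string, number>): Map<string, number> {
  const ids = new Map<string, number>();
  for (const o of orders) {
    const key = customerKey(o);
    const id = mapping.get(key);
    if (id === undefined) throw new Error(`no agreed identifier for ${key}`);
    ids.set(key, id);
  }
  return ids;
}

const seenByA: Order[] = [
  { item: "anvil", customer_name: "Wile E Coyote", customer_address: "123 Desert Station" },
  { item: "dynamite", customer_name: "Daffy Duck", customer_address: "White Rock Lake" },
];
const seenByB: Order[] = [...seenByA].reverse(); // same orders, different arrival order

const daffy = customerKey(seenByA[1]);
console.log(idsByFirstAppearance(seenByA).get(daffy)); // 2
console.log(idsByFirstAppearance(seenByB).get(daffy)); // 1

const agreed = new Map<string, number>(seenByA.map((o, i): [string, number] => [customerKey(o), i + 1]));
console.log(idsFromMapping(seenByA, agreed).get(daffy) === idsFromMapping(seenByB, agreed).get(daffy)); // true
```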