Open michielbdejong opened 1 month ago
I think it's easy to allow switching back and forth between the one-table and the two-table serialisation of this dataset, if we make a few assumptions:
Consequently, customer name cannot be a primary key; we need to assign some other identifier. Assigning such identifiers is hard to do in parallel across the replicas of a distributed storage system.
Also, if you change the address of a customer while in two-table view, it has a different effect than if you change the address on an order in one-table view.
But I don't think that's impossible to represent; it reminds me of the "halve this recipe" operation from https://mattweidner.com/2023/09/26/crdt-survey-1.html
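To make the difference between the two edits concrete, here is a minimal Python sketch. The helper names (`set_order_address`, `set_customer_address`, `addresses_seen_by_orders`) and the new address "1 Canyon Road" are hypothetical, chosen only for illustration; the field names follow the example documents in this issue.

```python
import copy

# One-table view: the address is repeated on every order row.
one_table = [
    {"item": "anvil", "customer_name": "Wile E Coyote",
     "customer_address": "123 Desert Station"},
    {"item": "bird seed", "customer_name": "Wile E Coyote",
     "customer_address": "123 Desert Station"},
]

# Two-table view: the address is stored once, on the customer record.
two_table = {
    "orders": [{"item": "anvil", "customer": 1},
               {"item": "bird seed", "customer": 1}],
    "customers": [{"id": 1, "name": "Wile E Coyote",
                   "address": "123 Desert Station"}],
}

def set_order_address(rows, index, address):
    """One-table edit: only the row at `index` changes."""
    rows = copy.deepcopy(rows)
    rows[index]["customer_address"] = address
    return rows

def set_customer_address(doc, customer_id, address):
    """Two-table edit: the single customer record changes."""
    doc = copy.deepcopy(doc)
    for customer in doc["customers"]:
        if customer["id"] == customer_id:
            customer["address"] = address
    return doc

def addresses_seen_by_orders(doc):
    """Resolve each order's customer reference to an address."""
    by_id = {c["id"]: c for c in doc["customers"]}
    return [by_id[o["customer"]]["address"] for o in doc["orders"]]

edited_rows = set_order_address(one_table, 0, "1 Canyon Road")
edited_doc = set_customer_address(two_table, 1, "1 Canyon Road")

one_table_addresses = [r["customer_address"] for r in edited_rows]
two_table_addresses = addresses_seen_by_orders(edited_doc)
```

In this sketch the one-table edit leaves the second order at the old address, while the two-table edit moves both orders at once, so the two operations cannot simply be treated as the same edit when switching views.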
A customer with zero orders would have to be represented in the one-table view with `item` and `quantity` set to null.
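A rough sketch of that flattening, assuming the two-table shape used in the example documents in this issue. The function name `to_one_table` is hypothetical, and setting `ship_date` to null on the placeholder row is my own assumption (only `item` and `quantity` are named above). The sample data, with Daffy Duck having no orders, is made up for illustration.

```python
def to_one_table(doc):
    """Flatten a two-table document into one row per order.

    This is a left join starting from customers, so a customer with no
    orders still yields one placeholder row with null order fields.
    """
    rows = []
    for customer in doc["customers"]:
        orders = [o for o in doc["orders"] if o["customer"] == customer["id"]]
        if not orders:
            # Placeholder for a customer without orders (assumption:
            # ship_date is nulled along with item and quantity).
            orders = [{"item": None, "quantity": None, "ship_date": None}]
        for order in orders:
            rows.append({
                "item": order["item"],
                "quantity": order["quantity"],
                "ship_date": order["ship_date"],
                "customer_name": customer["name"],
                "customer_address": customer["address"],
            })
    return rows

doc = {
    "orders": [
        {"item": "anvil", "quantity": 1, "ship_date": "2/3/23", "customer": 1},
    ],
    "customers": [
        {"id": 1, "name": "Wile E Coyote", "address": "123 Desert Station"},
        {"id": 2, "name": "Daffy Duck", "address": "White Rock Lake"},
    ],
}

rows = to_one_table(doc)
```

Note that this sketch groups rows by customer, so it does not preserve the original order of the orders list; a real implementation would need to decide how ordering is carried across views.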
So:
To generalise the solution away from the example:
The implementation should be more efficient than just storing the whole two-dimensional array, but its behaviour should be equivalent.
I guess I could just define a Lens between the one-table and the two-table view.
All replicas should switch to the new schema and the code that lists unshipped orders should change at the same time. This can be done with a Lens, probably?
I looked into Cambria and its existing lenses. An 'extract' lens should be possible.
Input document:
{
  "orders": [
    {
      "item": "anvil",
      "quantity": 1,
      "ship_date": "2/3/23",
      "customer_name": "Wile E Coyote",
      "customer_address": "123 Desert Station"
    },
    {
      "item": "dynamite",
      "quantity": 2,
      "ship_date": null,
      "customer_name": "Daffy Duck",
      "customer_address": "White Rock Lake"
    },
    {
      "item": "bird seed",
      "quantity": 1,
      "ship_date": null,
      "customer_name": "Wile E Coyote",
      "customer_address": "123 Desert Station"
    }
  ]
}
Lens:
lens:
- extract
[...]
Output document:
{
  "orders": [
    {
      "item": "anvil",
      "quantity": 1,
      "ship_date": "2/3/23",
      "customer": 1
    },
    {
      "item": "dynamite",
      "quantity": 2,
      "ship_date": null,
      "customer": 2
    },
    {
      "item": "bird seed",
      "quantity": 1,
      "ship_date": null,
      "customer": 1
    }
  ],
  "customers": [
    {
      "id": 1,
      "name": "Wile E Coyote",
      "address": "123 Desert Station"
    },
    {
      "id": 2,
      "name": "Daffy Duck",
      "address": "White Rock Lake"
    }
  ]
}
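For reference, the transformation from the input document to the output document can be sketched in plain Python. This is not Cambria's API; `extract_customers` is a hypothetical helper. It assumes that two rows refer to the same customer when both name and address match, and it assigns ids in order of first appearance, which is only reproducible if every replica sees the rows in the same order.

```python
def extract_customers(doc):
    """Split one-table order rows into an orders list and a customers list."""
    ids = {}        # (name, address) -> assigned id (assumed identity key)
    customers = []
    orders = []
    for row in doc["orders"]:
        key = (row["customer_name"], row["customer_address"])
        if key not in ids:
            # Ids are assigned by first appearance, starting at 1.
            ids[key] = len(ids) + 1
            customers.append({"id": ids[key], "name": key[0], "address": key[1]})
        orders.append({
            "item": row["item"],
            "quantity": row["quantity"],
            "ship_date": row["ship_date"],
            "customer": ids[key],
        })
    return {"orders": orders, "customers": customers}

input_doc = {
    "orders": [
        {"item": "anvil", "quantity": 1, "ship_date": "2/3/23",
         "customer_name": "Wile E Coyote",
         "customer_address": "123 Desert Station"},
        {"item": "dynamite", "quantity": 2, "ship_date": None,
         "customer_name": "Daffy Duck",
         "customer_address": "White Rock Lake"},
        {"item": "bird seed", "quantity": 1, "ship_date": None,
         "customer_name": "Wile E Coyote",
         "customer_address": "123 Desert Station"},
    ]
}

result = extract_customers(input_doc)
```

On the input document above, this sketch gives Wile E Coyote id 1 and Daffy Duck id 2, matching the output document. If two replicas processed the rows in a different order, they would assign different ids, which is the parallelism problem mentioned at the top of this issue.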
Ah, but to make it deterministic you need to know the LRI mapping!
How would we solve the Extract Entity challenge described in section 2 of https://arxiv.org/pdf/2309.11406 ?