WICG / indexed-db-observers

Prototyping and discussion around indexeddb observers.

Limitation in creating the same scope of the bound transaction when observing a change #31

Closed bevis-tseng closed 8 years ago

bevis-tseng commented 8 years ago

There is a design issue to provide a read-only transaction immediately to the IDBObserverCallback in the following scenario:

  1. Observer A starts after Transaction A with scope to stores[X, Y, Z] is complete.
  2. Transaction B updates records in stores[X, Y], and Transaction C updates records in store[Z] in parallel. (This is allowed as long as the scopes do not overlap.)
  3. When Transaction B completes, EXPLAINER.md says that a transaction scoped to stores[X, Y, Z] has to be created immediately for Observer A (if |transaction| in the |IDBObserverInit| option is set to true) to report the changes made in Transaction B. It can't be, because of the rule in the IDB spec in [1]: "Generally speaking, the above requirements mean that any transaction which has an overlapping scope with a read/write transaction and which was created after that read/write transaction, can’t run in parallel with that read/write transaction."

[1] http://w3c.github.io/IndexedDB/#transaction-lifetime-concept
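
The conflict in step 3 can be sketched with a small model of transaction scopes. This is only an illustration under assumptions: the `Txn` type and the `scopesOverlap` and `mustWaitFor` helpers are hypothetical names, not part of the IndexedDB API, and the rule encoded here is just the sentence quoted from [1].

```typescript
// Hypothetical model of transaction scopes; not the real IndexedDB API.
type TxnMode = "readonly" | "readwrite";

interface Txn {
  label: string;
  mode: TxnMode;
  scope: string[];   // object store names
  createdAt: number; // creation order; lower means created earlier
}

// True when the two scopes share at least one object store.
function scopesOverlap(a: string[], b: string[]): boolean {
  return a.some((store) => b.indexOf(store) !== -1);
}

// The quoted rule: a transaction created after a read/write transaction
// with an overlapping scope cannot run in parallel with it.
function mustWaitFor(later: Txn, earlier: Txn): boolean {
  return (
    earlier.createdAt < later.createdAt &&
    earlier.mode === "readwrite" &&
    scopesOverlap(later.scope, earlier.scope)
  );
}

const txnB: Txn = { label: "B", mode: "readwrite", scope: ["X", "Y"], createdAt: 1 };
const txnC: Txn = { label: "C", mode: "readwrite", scope: ["Z"], createdAt: 2 };
// Read-only transaction for Observer A, created when B completes.
const observerTxn: Txn = {
  label: "observer A",
  mode: "readonly",
  scope: ["X", "Y", "Z"],
  createdAt: 3,
};

console.log(mustWaitFor(txnC, txnB));        // false: B and C do not overlap
console.log(mustWaitFor(observerTxn, txnC)); // true: the observer txn overlaps C on Z
```

In this model B and C can run side by side, but the observer's transaction has to wait for C, which is exactly why it cannot be handed to the callback immediately.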

dmurph commented 8 years ago

Ah crap. Yeah, you're right. Hm........ what would the users want here? This is a bit of an edge case so I'm not sure how often this would get hit, but we definitely need to figure out what to do.

Options:

  1. We bundle transaction B and C's changes together. More specifically, the readonly observer transaction pushes itself to the very next in line for the readonly lock of its object stores (X,Y,Z), and then when it can get all of them it sends all relevant changes.
  2. We don't include 'z' in the readonly transaction. I believe this makes the data consistency use case much harder though, as you can't read in the 'Z' store.
  3. Any thoughts?

dmurph commented 8 years ago

@inexorabletash, @cmumford, @sicking for thoughts

dmurph commented 8 years ago

Actually, I think we can solve this in the following way:

The transaction that is given back is scheduled in the NEXT available spot in the queue, after all exclusive lock (write) transactions have finished. This will push back any other transactions that are waiting for an exclusive lock on any of the transaction's object stores.

So in the example above, the transaction B will not be scheduled until after C is finished.

Does that make sense?

bevis-tseng commented 8 years ago

So in the example above, the transaction B will not be scheduled until after C is finished.

So, do we still apply the rule of invoking ObserverCallback on a per-transaction basis without bundling the changes together, and have 2 readonly transactions (let's say 'post-B txn', 'post-C txn') created in sequence but are delivered to the callback in the same order until txn C is finished?

If yes, it makes sense to me!

dmurph commented 8 years ago

I think so, yes, as long as you forgot a 'not' in that sentence:

So, do we still apply the rule of invoking ObserverCallback on a per-transaction basis without bundling the changes together, and have 2 readonly transactions (let's say 'post-B txn', 'post-C txn') created in sequence but are NOT delivered to the callback in the same order until txn C is finished?

Scenario clarification:

  1. Register observer on transaction A w/ transaction support, which encompasses stores X, Y, Z
  2. Transactions B and C happen at same time: [Transaction B writes to X, Y] [Transaction C writes to Z]
  3. Transaction B commits.
  4. (Callback for observer scheduled after transaction C commits (due to write lock on Z) for transaction B.)
  5. Transaction C commits.
  6. (Callback for observer scheduled - after one scheduled in step 4 - for transaction C.)
  7. Observer callback for B, and transaction's readonly state includes all changes from transaction C.
  8. Observer callback for C, and transaction's readonly state includes all changes from transaction C.
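
The ordering in steps 3 to 8 can be sketched as a tiny simulation. It is a sketch under assumptions, not the proposed implementation: `PendingCallback`, `activeWriters`, and `commit` are hypothetical names, and the model only tracks which write transactions still block the observer's X, Y, Z transaction.

```typescript
// Hypothetical simulation of the scenario above; names are illustrative only.
interface PendingCallback {
  forTxn: string;      // the write transaction whose changes are reported
  blockedOn: string[]; // writers still holding a lock inside the observer's scope
}

const observedStores: string[] = ["X", "Y", "Z"];
// Writers that are currently running, with their scopes.
const activeWriters: { [label: string]: string[] } = { B: ["X", "Y"], C: ["Z"] };
const pendingCallbacks: PendingCallback[] = [];
const delivered: string[] = [];

function commit(label: string): void {
  delete activeWriters[label];
  // The new callback is blocked by every writer still active on an observed store.
  const blockers = Object.keys(activeWriters).filter((w) =>
    activeWriters[w].some((s) => observedStores.indexOf(s) !== -1)
  );
  pendingCallbacks.push({ forTxn: label, blockedOn: blockers });
  // The committed writer no longer blocks anything.
  pendingCallbacks.forEach((p) => {
    p.blockedOn = p.blockedOn.filter((w) => w !== label);
  });
  // Deliver in FIFO order, stopping at the first callback that is still blocked.
  while (pendingCallbacks.length > 0 && pendingCallbacks[0].blockedOn.length === 0) {
    delivered.push(pendingCallbacks.shift()!.forTxn);
  }
}

commit("B");
const deliveredAfterB = delivered.slice();
console.log(deliveredAfterB); // []: the callback for B waits on C's write lock on Z
commit("C");
console.log(delivered);       // ["B", "C"]: both callbacks run, in commit order
```

The simulation does not model what each readonly transaction can see, which is the open question in steps 7 and 8.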

Hm.... I'm going to check w/ people to make sure that would be OK

Hey @arthurhsu, would this mess you guys up? Or is this OK, expected behavior?

If not, then we need to:

  1. Merge changes in this case, where the observer gets changes from B and C, or
  2. Make X, Y, Z share an exclusive/readonly lock after the observer is created, so that if any transaction works on one then they work on all.

dmurph commented 8 years ago

@nolanlawson Can you PTAL at this issue? What would you expect/want in this scenario?

dmurph commented 8 years ago

In Chrome, we can actually have step 7 above provide the transaction WITHOUT C's changes, as LevelDB lets us grab a snapshot of the world from the B transaction (as in the end everything is written serially in a log database). So this isn't a problem for us, but it might be for FF.
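
A rough way to picture the snapshot idea is a toy append-only log where a read is pinned to a sequence number. This is an assumption-laden sketch: `writeLog`, `appendWrite`, and `readAt` are made-up names and do not reflect LevelDB's actual API or Chrome's implementation.

```typescript
// Toy append-only log; an illustration only, not LevelDB's API.
interface LogEntry {
  seq: number;
  store: string;
  key: string;
  value: string;
}

const writeLog: LogEntry[] = [];
let nextSeq = 1;

// Append a write and return its sequence number.
function appendWrite(store: string, key: string, value: string): number {
  const seq = nextSeq++;
  writeLog.push({ seq: seq, store: store, key: key, value: value });
  return seq;
}

// Read the latest value written at or before snapshotSeq; later writes are invisible.
function readAt(snapshotSeq: number, store: string, key: string): string | undefined {
  let result: string | undefined = undefined;
  for (let i = 0; i < writeLog.length; i++) {
    const e = writeLog[i];
    if (e.seq <= snapshotSeq && e.store === store && e.key === key) {
      result = e.value;
    }
  }
  return result;
}

const snapshotAfterB = appendWrite("X", "k1", "from B"); // B commits; snapshot taken here
const seqAfterC = appendWrite("Z", "k2", "from C");      // C commits later

console.log(readAt(snapshotAfterB, "X", "k1")); // "from B"
console.log(readAt(snapshotAfterB, "Z", "k2")); // undefined: C's write is after the snapshot
console.log(readAt(seqAfterC, "Z", "k2"));      // "from C"
```

A reader pinned to the snapshot taken at B's commit sees all of X, Y, and Z as they were at that point, even though C has since committed.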

arthurhsu commented 8 years ago

What's the behavior of Transaction A w.r.t. the callbacks made from B and C? If Transaction A sees the changes of B and C and then decides to commit, what happens?

dmurph commented 8 years ago

@arthurhsu so transaction A already committed, sorry, that was just the transaction that we start the observer with. It could have read the world state in for the observer; pretend it read in some of X, Y, and Z.

Ok, so I think we have the following options/behaviors:

  1. We change the transaction object stores on observation to only be the ones that have changed. This means that the change observation for transaction B would only be over object stores X and Y, and we could schedule it right away while transaction C is happening. Pros: simplest change, fits in our spec model still. Cons: a little inconsistent, and would require devs to save database state in case they can't read it in from the given transaction.
  2. We accumulate all changes for observed object stores for all transactions. So this means that when B ends, we grab readonly locks on X and Y and wait until we have the Z lock to send changes along to the observer. This would call the observer once from the B and C transactions, and include all changes. Pros: we maintain the full X,Y,Z transaction in the observer, fits our spec model. Cons: it's a weird edge case that people wouldn't expect. Weirder spec language.
  3. We maintain the current behavior (I think this is the way it would work now): the observation call for B would happen after C committed, but the transaction (over X,Y,Z) wouldn't have C's changes. This would mean the implementation would have to know how to snapshot the database state, which is a bit more limiting. Pros: Provides all object stores to observer event, maintains current lock structure. Cons: Requires limiting implementation, or making things less efficient (one could implement this by merging object store locks when an observer is registered for multiple to prevent this scenario from happening). Also requires being more specific in spec - defining transaction ordering (C commits after B).
  4. Create a new transaction type - observer transaction - that can read from the database independently from other transactions (no locks needed). Pros: more efficient, and contains X,Y,Z readability. Cons: limits the implementations to needing some sort of snapshot support for database.
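
Option 1 can be sketched as a scope computation: the observer's transaction covers only the observed stores that the committed transaction actually modified. This is a minimal sketch, and `observerTxnScope` and `overlaps` are hypothetical helper names.

```typescript
// Option 1 (sketch): scope = observed stores that the committed transaction modified.
function observerTxnScope(observed: string[], modified: string[]): string[] {
  return observed.filter((store) => modified.indexOf(store) !== -1);
}

// True when the two scopes share at least one object store.
function overlaps(a: string[], b: string[]): boolean {
  return a.some((store) => b.indexOf(store) !== -1);
}

const observedByA = ["X", "Y", "Z"];
const modifiedByB = ["X", "Y"];
const scopeOfC = ["Z"]; // C is still running

const option1Scope = observerTxnScope(observedByA, modifiedByB);
console.log(option1Scope);                     // ["X", "Y"]
console.log(overlaps(option1Scope, scopeOfC)); // false: can be scheduled while C runs
console.log(overlaps(observedByA, scopeOfC));  // true: the full X, Y, Z scope must wait for C
```

The trade-off is visible in the last two lines: the narrower scope avoids the lock conflict, but the callback can no longer read Z.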

WDYT? That's all I can think of right now. @aliams can you also look at this?

arthurhsu commented 8 years ago

For 2, the accumulation is tricky and I'm not sure how this will scale if there's a half-baked transaction happening in between B and C and install another observer.

For 4 we still need to know B/C is changing DB so A can know to read XYZ after B/C committed. This seems to be back to square 0.

I think 1 and 3 can be accommodated by Lovefield since our onChange event can deal with either.

bevis-tseng commented 8 years ago

I think so, yes, as long as you forgot a 'not' in that sentence:

Yes, I forgot a 'not' here. :|

nolanlawson commented 8 years ago

FWIW this case should not affect PouchDB. We have no readwrite operations that can be parallelized; we always update the same stores.

@daleharvey may want to comment on whether or not this would affect our upcoming "idb-next" adapter. (Speaking of which, Observers seems like a killer feature for idb-next.)

dmurph commented 8 years ago

@bevis-tseng and @aliams, What do you think about option 1 above? Where we only include the object stores in the observer transaction if they were modified?

bevis-tseng commented 8 years ago

For option#1, there is no concern from an implementation perspective. My concern is whether this change meets or limits any real use cases that devs want.

dmurph commented 8 years ago

@daleharvey, does option #1 (the transaction given to the observer only includes object stores that were modified, not necessarily all object stores it's observing) conflict w/ your feature?

I'll find a Docs contact to weigh in as well.

sicking commented 8 years ago

I think Option 1 does not meet some use cases.

Consider for example the UI in github which shows the list of participants in this issue.

It would observe two objectStores:

  A) An objectStore which caches metadata about individual issues, one entry per issue. The entries have a property which lists the participants' GitHub user ids.
  B) An objectStore which caches icons for GitHub users, keyed on GitHub user id.

Changes to either store might result in the list of participants UI needing to be updated. If a participant is added/removed, you need to update the list. Likewise if the picture used by a participant is updated.

If you get a transaction which only contains either of the two stores, you won't be able to update the UI.

Another way to think of this is that any time that you are observing multiple objectStores in order to do a "join" between them, you need a transaction which spans all those object stores. Using such joins is likely generally the reason to observe multiple stores.
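
The "join" can be sketched with two in-memory stand-ins for the object stores. Everything here is invented for illustration: the `ObserverTxnView` shape, the `renderParticipants` helper, and the sample data are assumptions, and a missing field stands for a store that is outside the transaction's scope.

```typescript
// Hypothetical stand-ins for the two object stores; sample data is made up.
interface IssueMeta {
  participantIds: string[];
}

// What the observer's transaction lets the callback read.
// A missing field models a store that is not in the transaction's scope.
interface ObserverTxnView {
  issues?: { [issueId: string]: IssueMeta };
  icons?: { [userId: string]: string };
}

// Joins issue metadata with user icons; returns null when either store is unreadable.
function renderParticipants(view: ObserverTxnView, issueId: string): string[] | null {
  if (!view.issues || !view.icons) {
    return null; // cannot do the join with only one of the stores
  }
  const issue = view.issues[issueId];
  if (!issue) {
    return [];
  }
  const icons = view.icons;
  return issue.participantIds.map(
    (userId) => userId + " (" + (icons[userId] || "no icon") + ")"
  );
}

const issuesData = { "31": { participantIds: ["dmurph", "sicking"] } };
const iconsData = { dmurph: "dmurph.png", sicking: "sicking.png" };

const fullView = renderParticipants({ issues: issuesData, icons: iconsData }, "31");
const partialView = renderParticipants({ issues: issuesData }, "31");

console.log(fullView);    // ["dmurph (dmurph.png)", "sicking (sicking.png)"]
console.log(partialView); // null: the icons store is not in scope
```

With a transaction limited to the modified store, as in option 1, the callback lands in the `null` branch whenever only one of the two stores changed.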

sicking commented 8 years ago

There are also the following two options:

  5. State that two transactions are only parallelizable if they don't have overlapping scopes and there are no observers which have scopes which overlap both transactions.
  6. Same as option 3, but for implementations that aren't able to keep a snapshot, require that the implementation doesn't parallelize the two transactions.

Note that parallelizing non-overlapping transactions is not something that implementations are required to do. So option 6 is essentially the same as option 3 from a normative point of view.
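
Options 5 and 6 can be sketched as a single predicate. This is an illustrative sketch only: `canParallelize`, `scopesIntersect`, and the `supportsSnapshots` flag are hypothetical names, with the flag standing in for whether an implementation can keep a snapshot as option 3 requires.

```typescript
// True when the two scopes share at least one object store.
function scopesIntersect(a: string[], b: string[]): boolean {
  return a.some((store) => b.indexOf(store) !== -1);
}

// Sketch of options 5 and 6. With snapshots, only the ordinary scope rule applies
// (option 3 behavior); without them, an observer spanning both scopes serializes
// the two transactions (option 5 rule).
function canParallelize(
  scope1: string[],
  scope2: string[],
  observerScopes: string[][],
  supportsSnapshots: boolean
): boolean {
  if (scopesIntersect(scope1, scope2)) {
    return false;
  }
  if (supportsSnapshots) {
    return true;
  }
  return !observerScopes.some(
    (obs) => scopesIntersect(obs, scope1) && scopesIntersect(obs, scope2)
  );
}

const scopeB = ["X", "Y"];
const scopeC = ["Z"];
const observerA = [["X", "Y", "Z"]];

console.log(canParallelize(scopeB, scopeC, observerA, false)); // false: observer A spans both
console.log(canParallelize(scopeB, scopeC, observerA, true));  // true: a snapshot covers the gap
console.log(canParallelize(scopeB, scopeC, [["X"]], false));   // true: the observer touches only B's scope
console.log(canParallelize(scopeB, scopeC, [], false));        // true: no observers registered
```

Since implementations are never required to parallelize, returning `false` in the no-snapshot case stays within what the spec already permits.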

dmurph commented 8 years ago

Hm. Yeah, I think option 6 gives the most flexibility for implementation, and also keeps the use cases you mentioned above non-complex.

@bevis-tseng, how about option 6 there?

bevis-tseng commented 8 years ago

If my understanding is correct that option#6 falls back to option#5 when snapshots are not supported by the implementation, then it's true that it will be more flexible for implementations.

dmurph commented 8 years ago

The docs folks said they don't mind. I think I'm going to shoot for option 6 from Jonas unless anyone has strong objections.

This also eliminates the ambiguity issue brought up in #33 about option#1, which is nice.