Open radex opened 3 years ago
@kokusGr @michalpopek @rozPierog & all outside contributors -- feedback would be appreciated! ;)
Seem like a reasonable changes :) I think it will be easier for new (and old) users to know when to use writer/reader.
As for the subAction I was going to vote for callWriter
, but I am worrying that users might think it's necessary to call Model.callWriter
every time they use a writer and not only when calling from another writer. Because of this I think that withWriter
or subWriter
might be better, but I am not 100% convinced.
it's necessary to call Model.callWriter every time they use a writer and not only when calling from another writer
would be easy to throw an error when using this API improperly like that (if callWriter is called when no writer is active, we know it's a programmer error). I'm also curious just how cursed it would be to change this
here: https://github.com/Nozbe/WatermelonDB/blob/master/src/decorators/action/index.js#L34
@sidferreira @fahrinh @diegolmello @kilbot @henrymoulton
Still, I think one of the problems of action/@action/subAction API is just how confusing the naming is.
Definitely agree with this - naming is a bit confusing and I agree that this proposal seems like a more beginner friendly API.
@radex The proposal seems like a big improvement, I like it. As for the naming, how about writer.writeMore
and (writer || reader).readMore
?
I liked the write/read
proposal :)
I might be missing a feature, but I've never found a use case for subAction
🙈
@radex I must admit I'm feeling like losing a detail... I agree with @diegolmello, I never understood the idea behind subAction
properly.
When I read this example:
const actionC = () => someOtherAction();
const actionB = () => someAction();
const actionA = async writer => {
// performing write (and read) operations
await writer.batch(...)
await model.update(...)
await writer.andWrite(actionB)
}
await database.write(actionA)
// in some other part of the app
await database.write(actionC)
I fail to understand why not just use
const actionC = () => someOtherAction();
const actionB = () => someAction();
const actionA = async writer => {
// performing write (and read) operations
await writer.batch(...)
await model.update(...)
await actionB()
}
await database.write(actionA)
// in some other part of the app
await database.write(actionC)
If I guess right, the idea would be that, if actionC
was called before actionB
, it would run first, not possible in the 2nd case.
In the end, it feels like the action/subAction/writer/reader
is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.
Also, in the example below:
const newPost = await database.action(async action => {
// Note: function passed to `database.action()` MUST be asynchronous
const posts = database.collections.get('posts')
const post = await posts.create( /* configure Post here */ )
// Note: to call an action from an inline action, call `action.subAction`:
await action.subAction(() => post.markAsPromoted())
// Note: Value returned from the wrapped function will be returned to `database.action` caller
return post
})
I can't understand why the subAction
is even needed...
Again, I believe I'm missing detail here and I'm happy to be enlightened
is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.
queueing is one way of looking at what db.action (.read/.write) does. What do you mean though "having things a bit more transparent"? I don't know what that means to you.
I can't understand why the subAction is even needed...
const newPost = await database.action(async action => {
// Note: function passed to `database.action()` MUST be asynchronous
const posts = database.collections.get('posts')
const post = await posts.create( /* configure Post here */ )
// Note: to call an action from an inline action, call `action.subAction`:
await action.subAction(() => post.markAsPromoted())
// Note: Value returned from the wrapped function will be returned to `database.action` caller
return post
})
if post.markAsPromoted()
is an action (calls db.action or is decorated with @action), then this deadlocks. By definition no two actions can execute at once, markAsPromoted waits until it has its turn to execute, and the caller waits until markAsPromoted finishes.
I fail to understand why not just use...
If @action
modifier does not exist and your application methods don't actually call db.action, and instead just call batch/create/update (which demand that they be called inside an action/writer block), then indeed, subAction would not be necessary. And it would be a legitimate way to approach this problem…
However, this would move the responsibility for wrapping work in writer blocks from these db work methods onto callers. So, for example, your React components would have to be concerned with the fact that they can't just do onClick={() => post.markAsPromoted()}
, but instead, must do onClick={() => db.write(() => post.markAsPromoted())}
. In my mind, while I dislike subActions for creating visual clutter, if writer/reader blocks are necessary (and they are because of synchronicity), I prefer to concentrate as much complexity related to databases as possible inside model/database code, and callers in the UI part of an app should not have to know too many details about it.
Breaking into parts to make it easier:
is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.
queueing is one way of looking at what db.action (.read/.write) does. What do you mean though "having things a bit more transparent"? I don't know what that means to you.
Please, to all my comments, add a pinch of salt. As I said, I may be losing a detail here, but:
So, reviewing the example mentioned, I think the idea around await action.subAction(() => post.markAsPromoted())
is that markAsPromoted
has it's own action and, as it is being called inside an action, you need subAction
to avoid an exception.
When I see this example, why not having an implementation like:
model.save = () => return this.collection.updateOrCreate(this);
collection.updateOrCreate = (model) => model._raw.id ? this.update(model) : this.create(model);
collection.create = async (model) => {
const record = await database.create(model)
}
database.create = model => {
const createQuery = /* something with model */
return database.query(createQuery);
}
database.query = query => {
const deferredPromise = new DeferredPromise();
database.addQueue(async () => {
const results = database._runQuery(query)
deferredPromise.resolve(results)
});
database.runQueue();
return deferredPromise.promise;
}
database.runQueue = () => {
if (this.isRunningQueue) {
return;
}
this.isRunningQueue = this.queue[0];
delete this.queue[0];
try {
this.isRunningQueue()
} catch (e) {
// whatever is needed
} finally() {
setTimeout(database.runQueue, 1)
}
}
which would allow us to simply:
const postCollection = database.collections.get('posts')
const post = await postCollection.create(/*config*/);
// or const post = new Post(rawObjectOrConfigFunction)
await post.markAsPromoted() // that has a await this.save()
the idea, in this case, is to have each query as its own promise, making it a bit more transparent to the developer on how things run (aka we would just strip off the concept of the action).
I wrote this example in a few minutes, so LMK if something doesn't make sense.
I can't understand why the subAction is even needed...
const newPost = await database.action(async action => { // Note: function passed to `database.action()` MUST be asynchronous const posts = database.collections.get('posts') const post = await posts.create( /* configure Post here */ ) // Note: to call an action from an inline action, call `action.subAction`: await action.subAction(() => post.markAsPromoted()) // Note: Value returned from the wrapped function will be returned to `database.action` caller return post })
if
post.markAsPromoted()
is an action (calls db.action or is decorated with @action), then this deadlocks. By definition no two actions can execute at once, markAsPromoted waits until it has its turn to execute, and the caller waits until markAsPromoted finishes.
my problem here is that post.markAsPromoted()
is inside of action already... so why not just run it?!
but based on your comments, I believe the whole problem is about post.markAsPromoted()
having @action
, causing an action to run inside another action, without proper control.
I fail to understand why not just use...
If
@action
modifier does not exist and your application methods don't actually call db.action, and instead just call batch/create/update (which demand that they be called inside an action/writer block), then indeed, subAction would not be necessary. And it would be a legitimate way to approach this problem…However, this would move the responsibility for wrapping work in writer blocks from these db work methods onto callers. So, for example, your React components would have to be concerned with the fact that they can't just do
onClick={() => post.markAsPromoted()}
, but instead, must doonClick={() => db.write(() => post.markAsPromoted())}
. In my mind, while I dislike subActions for creating visual clutter, if writer/reader blocks are necessary (and they are because of synchronicity), I prefer to concentrate as much complexity related to databases as possible inside model/database code, and callers in the UI part of an app should not have to know too many details about it.
So, basically, the need for subAction
is related to the use of @action
decorators and avoid things blowing up?!
Someone who doesn't uses @action
(I don't) won't need it?!
the idea, in this case, is to have each query as its own promise, making it a bit more transparent to the developer on how things run (aka we would just strip off the concept of the action).
I don't think you're grasping the concept of actions/writers. The internal implementation detail of queueing work for the database is not at all a problem. The problem is of mutual exclusion - making sure that only one thing is modifying the database at the same time.
Here's a bit from documentation:
Consider a function `markCommentsAsSpam` that fetches a list of comments on a post, and then marks them all as spam. The two operations (fetching, and then updating) are asynchronous, and some other operation that modifies the database could run in between. And it could just happen to be a function that adds a new comment on this post. Even though the function completes *successfully*, it wasn't *actually* successful at its job.
This example is trivial. But others may be far more dangerous. If a function fetches a record to perform an update on, this very record could be deleted midway through, making the action fail (and potentially causing the app to crash, if not handled properly). Or a function could have invariants determining whether the user is allowed to perform an action, that would be invalidated during action's execution. Or, in a collaborative app where access permissions are represented by another object, parallel execution of different actions could cause those access relations to be left in an inconsistent state.
The worst part is that analyzing all *possible* interactions for dangers is very hard, and having sync that runs automatically makes them very likely.
Solution? Group together related reads and writes together in an Writer, enforce that all writes MUST occur in a Writer, and only allow one Writer to run at the time. This way, it's guaranteed that in a Writer, you're looking at a consistent view of the world. Most simple reads are safe to do without groupping them, however if you have multiple related reads, you also need to wrap them in a Reader.
here's another example, pseudo-code simplified from sync:
applyChangesFromServer = async records => {
// 1:
const ids = records.map(r => r.id)
const currentRecords = await db....query(Q.where('id', Q.oneOf(ids))
// 2:
await db.batch(
...records.created.map(record => collection.prepareCreate(record)),
...records.updated.map(record => currentRecords.find(r => r.id === record.id).resolveConflictsWith(record)),
)
}
now imagine that while this is starting to execute, a user clicks on something, that, say, deletes one of the records. If this happens after sync's (1) has been called, but before it asynchronously returns, then we'll have inconsistent data, and (2) might try to update a record that no longer exists.
if you're familiar with concurrent programming principles, this is exactly the same problem. asynchronicity is similar to concurrency, even though only one thread is involved, and to maintain consistency in data access, we need to have mutual exclusion of writes. but because locking and unlocking a mutex is very error-prone and inconvenient, we abstract it away by serializing access - making a queue you access with a special method
The point is that as long as WatermelonDB is asynchronous (and there are pretty good reasons to want to keep it asynchronous), multi-step actions that change the database must have an explicit Writer block. There's no way around it. Either all the APIs are synchronous, there's an explicit action/writer API, or Watermelon only "works" most of the time, but then breaks and crashes in utterly undebuggable ways.
And a user doesn't really have to know or understand this fully, it only has to make sense to them that whenever they want to do some work that modifes the database, they need to wrap it in database.write(() => ...)
my problem here is that post.markAsPromoted() is inside of action already... so why not just run it?!
I'm eagerly waiting for suggestions, because I hate this too, but again:
@writer
(@writer
is just a convenience)As far as I can see, either we have to have something like subAction/callWriter, or (as stated before), we have to get rid of the @writer decorator and move the entire responsibility of wrapping work in database.write
from functions that do database work to callers of these functions.
I'd absolutely love to be wrong and discover that there's a third option!
BTW:
I implemented the first draft of the changes I suggested here: https://github.com/Nozbe/WatermelonDB/pull/1031
I thought about @michalpopek's proposal to call writer nesting writeMore
but after thinking about it some more, I decided to stick to callWriter
- it's a bland, technical, but hopefully unambiguous name. I wasn't sure how writeMore
would be interpreted by a new user.
I have second thoughts about whether we want to have "Readers" in addition to "Writers". Throughout both Watermelon's and NozbeTeams' codebases, I found only a few cases where mutual exclusion is necessary but writing isn't. Having "Readers" in addition to "Writers" makes for kinda nice symmetry syntactically, but there's little practical benefit to the distinction. Using a Reader just protects you from unintentionally modifying the database, and in theory multiple Readers could run simultaneously (but there's probably little benefit from that). The downside is that there's another concept and the diagnostic warnings and documentation that have to keep repeating "writer or reader". We could just have Writers, and explain to the (very) advanced users that have a use case that requires consistent view of the data to wrap the reads in a Writer. OTOH, I already implemented it :P @michalpopek thoughts?
I don't think you're grasping the concept of actions/writers. The internal implementation detail of queueing work for the database is not at all a problem. The problem is of mutual exclusion - making sure that only one thing is modifying the database at the same time.
I totally agree with that. I'm sorry if I'm cluttering the discussion.
Async Thread 1 | Async Thread 2 | |
---|---|---|
Step 1 | started fetching changes | |
Step 2 | const list = posts.fetchAll() | |
Step 3 | Apply changes (adding some posts, removing others) | |
step 4 | list.map(item => item.markAsRead()) |
It seems that the problem is a scenario like this one: Step 3 may add items that are not in list
or remove existing items, maybe causing an exception, but probably keeping some items as unread
.
The actions
sounds like worker threads
(actually it is called "worker" in the code), and if I'm right, I would suggest worker
.
The main issue for that seems to be:
database.collections.get('posts').query().set{{read: true}}.exec()
could generate an update query, for example ( yes, totally based on http://knexjs.org/#Builder-update ))
And now I finally got the idea of the action: groups database interactions together to avoid other interactions to modify the data currently in use. In some ways, acting like a transaction (without the rollback feature).
Assuming I got everything right, I would suggest:
a) everything is automatically added to an action. If you use: database.collection.get('posts').query().fetch()
it will automatically create an action/worker/whatever
.
b) you still can use "actions" for complex scenarios:
database.getAction(async (action) => {
const posts = database.collections.get('posts')
const post = await posts.create(/* configure Post here */, { action })
await post.markAsPromoted({action})
await post.appendToBody(this.body, {action})
return post
})
runWorker = () => {
const currentWorker = ...;
currentWorker.start();
callbackPromise(currentWorker).finally(() => currentWorker.done());
}
In short, I look for making Watermelon as simple as possible for most uses, but keeping options for more complex scenarios.
I hope it make sense
Also, worth mentioning:
https://github.com/ujjwalguptaofficial/sqlweb https://jsstore.net
(just found them, not my use)
I decided to stick to callWriter - it's a bland, technical, but hopefully unambiguous name. I wasn't sure how writeMore would be interpreted by a new user.
Sure, sounds like a reasonable decision. We could also be even more explicit and extend the name to callChildWriter
/callChildReader
or something in that spirit, but I'm not sure, if it is necessary.
The downside is that there's another concept and the diagnostic warnings and documentation that have to keep repeating "writer or reader".
IMO the reader API makes the whole read/write concept complete and rules out any potential doubts that could arise, if we just introduced the writer API alone. From what I saw in your implementation PR, the reader API is already marked as an advanced topic, so I don't think it will bother newcomers that much. And the maintainance cost of being aware of the API when editing docs or writing warnings sounds relatively low for me.
We could just have Writers, and explain to the (very) advanced users that have a use case that requires consistent view of the data to wrap the reads in a Writer.
We could do that, but TBH it would feel like we're serving our users workarounds instead of providing them with a proper solution.
OTOH, I already implemented it :P
Yeah, and from what I saw it looks like the reader implementation isn't extensive, so it shouldn't create any serious debt
RFC: Improved Reader/Writer API
Because of WatermelonDB's primarily asynchronous API, we need a built-in primitive to block concurrent database writes. Otherwise, multiple concurrent asynchronous chains of operations would break consistency within a chain. A more thorough explanation.
This is currently achieved by the
await database.action(() => ...)
API and the@action
decorator shortcut for Model methods. A complication occurs when we have an action method/block and want to call it from another action method/block. JavaScript/WatermelonDB does not have sufficient magic to understand that the intention is to call the contents of the action as part of another action - therefore, by definition, the "sub-action" is blocked and queued - and our action queue deadlocks. So we explicitly allow this bysubAction(() => ...)
All of this complicates Watermelon's API, makes the syntax uglier, and DX worse, especially for beginners, but I don't know of a better way to achieve mutal exclusion / get rid of the need for mutual exclusion without getting rid of Watermelon's asynchronicity.
Still, I think one of the problems of action/@action/subAction API is just how confusing the naming is. What's even an "Action"? It's so generic as to mean anything, or it might be confused to mean precisely "database transaction" -- and actions are not transactions (despite some similarities). I also dislike "subAction", but couldn't think of anything better.
Proposed changes
"Actions" are now "Writers" and "Readers".
We explicitly tell database we want to
write
to it. In the writer block, we get awriter
object, which we can use to explicitly call other readers/writers. It also has abatch
method, which is short fordatabase.batch(...)
:"Readers" also exist. Why? If you want to compute some information that requires multiple fetches, you also need mutual exclusion to ensure no write happens in between your fetches. But if we change "Actions" to "Writers", then we'd wrongly suggest to the user that you should write to the database in the block. In fact, we should teach users, that they need to wrap not only related reads+writes in a writer block, but also a read+read. In addition to being dx-friendly, the separation also allows us to prevent unintended mistakes and improve performance. In a reader block, we'd only enforce queueing, but forbid database writes and "sub-action" calling of another writers. Also, we could allow multiple reader blocks to operate concurrently, since we know that's safe as long as no writes occur.
We keep the decorator-based shortcut API for defining Model methods. Yeah, it pollutes the Model namespace, which I'm not a great fan of, but it makes Model methods (which should make up the vast majority of writers/readers in a idiomatic WatermelonDB codebase) so much cleaner, I think it's worth it.
As you've noticed, the part I struggle with is coming up with a nice, reasonably self-explanatory, non-confusing, unambiguous name for calling readers/writers from readers/writers. All the alternatives considered:
I'd appreciate feedback especially on this part.
Deprecations & Removals
new Database({ actionsEnabled: false })
is no longer supported and an errornew Database({ actionsEnabled: true })
raises a deprecation warning (no longer necessary)@action
,database.action
,action.subAction
,Model.subAction
are deprecated and will be removed laterdatabase.batch()
is now deprecated, it's expected that all writes happen in writers viawriter.batch()
(orModel.batch()
when using@writer
). This improves safety - it's less likely that an illegal write triggered outside a writer is accepted (because a writer happens to be active), since you need a handle to callwriter.batch()
on an active writer.