Nozbe / WatermelonDB

🍉 Reactive & asynchronous database for powerful React and React Native apps ⚡️
https://watermelondb.dev
MIT License
10.62k stars 600 forks source link

RFC: Improved Reader/Writer API #1015

Open radex opened 3 years ago

radex commented 3 years ago

RFC: Improved Reader/Writer API

Because of WatermelonDB's primarily asynchronous API, we need a built-in primitive to block concurrent database writes. Otherwise, multiple concurrent asynchronous chains of operations would break consistency within a chain. A more thorough explanation.

This is currently achieved by the await database.action(() => ...) API and the @action decorator shortcut for Model methods. A complication occurs when we have an action method/block and want to call it from another action method/block. JavaScript/WatermelonDB does not have sufficient magic to understand that the intention is to call the contents of the action as part of another action - therefore, by definition, the "sub-action" is blocked and queued - and our action queue deadlocks. So we explicitly allow this by subAction(() => ...)

All of this complicates Watermelon's API, makes the syntax uglier, and DX worse, especially for beginners, but I don't know of a better way to achieve mutal exclusion / get rid of the need for mutual exclusion without getting rid of Watermelon's asynchronicity.

Still, I think one of the problems of action/@action/subAction API is just how confusing the naming is. What's even an "Action"? It's so generic as to mean anything, or it might be confused to mean precisely "database transaction" -- and actions are not transactions (despite some similarities). I also dislike "subAction", but couldn't think of anything better.

Proposed changes

"Actions" are now "Writers" and "Readers".

We explicitly tell database we want to write to it. In the writer block, we get a writer object, which we can use to explicitly call other readers/writers. It also has a batch method, which is short for database.batch(...):

await database.write(async writer => {
  // performing write (and read) operations
  await writer.batch(...)
  await model.update(...)

  // calling other writers (and readers) -- ideas for the API name?
  await writer.andWrite(() => someAction())
  await writer.andRead(() => someAction())
  await writer.call(() => someAction())
  await writer.also(() => someAction())
  await writer.withWriter(() => someAction())
  await writer.useWriter(() => someAction())
  await writer.callWriter(() => someAction())
  await writer.with(() => someAction())
})

"Readers" also exist. Why? If you want to compute some information that requires multiple fetches, you also need mutual exclusion to ensure no write happens in between your fetches. But if we change "Actions" to "Writers", then we'd wrongly suggest to the user that you should write to the database in the block. In fact, we should teach users, that they need to wrap not only related reads+writes in a writer block, but also a read+read. In addition to being dx-friendly, the separation also allows us to prevent unintended mistakes and improve performance. In a reader block, we'd only enforce queueing, but forbid database writes and "sub-action" calling of another writers. Also, we could allow multiple reader blocks to operate concurrently, since we know that's safe as long as no writes occur.

await database.read(async reader => {
  // performing read-only operations
  await table.fetch(...)

  // calling other readers:
  await reader.andRead(() => someAction())
  await reader.call(() => someAction())
  await reader.also(() => someAction())
  await reader.with(() => someAction())
})

We keep the decorator-based shortcut API for defining Model methods. Yeah, it pollutes the Model namespace, which I'm not a great fan of, but it makes Model methods (which should make up the vast majority of writers/readers in a idiomatic WatermelonDB codebase) so much cleaner, I think it's worth it.

@writer async deleteComment() {
  await this.update(...)
  await this.batch(...)

  // calling other writers/readers
  await this.andWrite(() => this.togglePin())
  await this.callWriter(() => this.togglePin())
  await this.writeWith(() => this.togglePin())
}
@reader async getAllMembers() {
  // perform reads
  await collection.query(...)

  // calling other readers
  await this.andRead(() => this.getAllAdmins())
  await this.callReader(() => this.getAllAdmins())
  await this.readWith(() => this.getAllAdmins())
}

As you've noticed, the part I struggle with is coming up with a nice, reasonably self-explanatory, non-confusing, unambiguous name for calling readers/writers from readers/writers. All the alternatives considered:

  await writer.andThen(() => someOtherWriter()) // sounds like it's related to Promises...
  await writer.thenWrite(() => someOtherWriter() // ditto + sounds like it would be executed *after* current action
  await writer.andWrite(() => someOtherWriter())
  await writer.withWriter(() => someOtherWriter()) // withX, useX can create confusion for React users (HOCs/Hooks convention)
  await writer.useWriter(() => someOtherWriter())
  await writer.callWriter(() => someOtherWriter()) // seems most unambiguous?
  await writer.writeWith(() => someOtherWriter()) // also seems quite nice
  await writer.then(() => someOtherWriter()) // one-word methods might be confusing when defined on Model?
  await writer.call(() => someOtherWriter())
  await writer.also(() => someOtherWriter())
  await writer.with(() => someOtherWriter())
  await writer.use(() => someOtherWriter())
  await writer.sub(() => someOtherWriter())
  await writer.subAction(() => someOtherWriter()) // would be good to get rid of the "action" word once and for all
  await writer.subWriter(() => someOtherWriter())

I'd appreciate feedback especially on this part.

Deprecations & Removals

  1. Legacy new Database({ actionsEnabled: false }) is no longer supported and an error
  2. new Database({ actionsEnabled: true }) raises a deprecation warning (no longer necessary)
  3. @action, database.action, action.subAction, Model.subAction are deprecated and will be removed later
  4. (This may be controversial!) database.batch() is now deprecated, it's expected that all writes happen in writers via writer.batch() (or Model.batch() when using @writer). This improves safety - it's less likely that an illegal write triggered outside a writer is accepted (because a writer happens to be active), since you need a handle to call writer.batch() on an active writer.
radex commented 3 years ago

@kokusGr @michalpopek @rozPierog & all outside contributors -- feedback would be appreciated! ;)

kokusGr commented 3 years ago

Seem like a reasonable changes :) I think it will be easier for new (and old) users to know when to use writer/reader.

As for the subAction I was going to vote for callWriter, but I am worrying that users might think it's necessary to call Model.callWriter every time they use a writer and not only when calling from another writer. Because of this I think that withWriter or subWriter might be better, but I am not 100% convinced.

radex commented 3 years ago

it's necessary to call Model.callWriter every time they use a writer and not only when calling from another writer

would be easy to throw an error when using this API improperly like that (if callWriter is called when no writer is active, we know it's a programmer error). I'm also curious just how cursed it would be to change this here: https://github.com/Nozbe/WatermelonDB/blob/master/src/decorators/action/index.js#L34

radex commented 3 years ago

@sidferreira @fahrinh @diegolmello @kilbot @henrymoulton

henrymoulton commented 3 years ago

Still, I think one of the problems of action/@action/subAction API is just how confusing the naming is.

Definitely agree with this - naming is a bit confusing and I agree that this proposal seems like a more beginner friendly API.

michalpopek commented 3 years ago

@radex The proposal seems like a big improvement, I like it. As for the naming, how about writer.writeMore and (writer || reader).readMore?

diegolmello commented 3 years ago

I liked the write/read proposal :)

I might be missing a feature, but I've never found a use case for subAction 🙈

sidferreira commented 3 years ago

@radex I must admit I'm feeling like losing a detail... I agree with @diegolmello, I never understood the idea behind subAction properly.

When I read this example:

const actionC = () => someOtherAction();
const actionB = () => someAction();
const actionA = async writer => {
  // performing write (and read) operations
  await writer.batch(...)
  await model.update(...)

  await writer.andWrite(actionB)
}
await database.write(actionA)
// in some other part of the app
await database.write(actionC)

I fail to understand why not just use

const actionC = () => someOtherAction();
const actionB = () => someAction();
const actionA = async writer => {
  // performing write (and read) operations
  await writer.batch(...)
  await model.update(...)

  await actionB()
}
await database.write(actionA)
// in some other part of the app
await database.write(actionC)

If I guess right, the idea would be that, if actionC was called before actionB, it would run first, not possible in the 2nd case.

In the end, it feels like the action/subAction/writer/reader is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.

Also, in the example below:

const newPost = await database.action(async action => {
  // Note: function passed to `database.action()` MUST be asynchronous
  const posts = database.collections.get('posts')
  const post = await posts.create( /* configure Post here */ )

  // Note: to call an action from an inline action, call `action.subAction`:
  await action.subAction(() => post.markAsPromoted())

  // Note: Value returned from the wrapped function will be returned to `database.action` caller
  return post
})

I can't understand why the subAction is even needed...

Again, I believe I'm missing detail here and I'm happy to be enlightened

radex commented 3 years ago

is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.

queueing is one way of looking at what db.action (.read/.write) does. What do you mean though "having things a bit more transparent"? I don't know what that means to you.

I can't understand why the subAction is even needed...

const newPost = await database.action(async action => {
  // Note: function passed to `database.action()` MUST be asynchronous
  const posts = database.collections.get('posts')
  const post = await posts.create( /* configure Post here */ )

  // Note: to call an action from an inline action, call `action.subAction`:
  await action.subAction(() => post.markAsPromoted())

  // Note: Value returned from the wrapped function will be returned to `database.action` caller
  return post
})

if post.markAsPromoted() is an action (calls db.action or is decorated with @action), then this deadlocks. By definition no two actions can execute at once, markAsPromoted waits until it has its turn to execute, and the caller waits until markAsPromoted finishes.

I fail to understand why not just use...

If @action modifier does not exist and your application methods don't actually call db.action, and instead just call batch/create/update (which demand that they be called inside an action/writer block), then indeed, subAction would not be necessary. And it would be a legitimate way to approach this problem…

However, this would move the responsibility for wrapping work in writer blocks from these db work methods onto callers. So, for example, your React components would have to be concerned with the fact that they can't just do onClick={() => post.markAsPromoted()}, but instead, must do onClick={() => db.write(() => post.markAsPromoted())}. In my mind, while I dislike subActions for creating visual clutter, if writer/reader blocks are necessary (and they are because of synchronicity), I prefer to concentrate as much complexity related to databases as possible inside model/database code, and callers in the UI part of an app should not have to know too many details about it.

sidferreira commented 3 years ago

Breaking into parts to make it easier:

is kinda like queue management... And if that's the case, why not use a queue and having things a bit more transparent.

queueing is one way of looking at what db.action (.read/.write) does. What do you mean though "having things a bit more transparent"? I don't know what that means to you.

Please, to all my comments, add a pinch of salt. As I said, I may be losing a detail here, but:

So, reviewing the example mentioned, I think the idea around await action.subAction(() => post.markAsPromoted()) is that markAsPromoted has it's own action and, as it is being called inside an action, you need subAction to avoid an exception.

When I see this example, why not having an implementation like:

model.save = () => return this.collection.updateOrCreate(this);

collection.updateOrCreate = (model) => model._raw.id ? this.update(model) : this.create(model);

collection.create = async (model) => {
  const record = await database.create(model)
}

database.create = model => {
  const createQuery = /* something with model */
  return database.query(createQuery);
}

database.query = query => {
  const deferredPromise = new DeferredPromise();
  database.addQueue(async () => {
    const results = database._runQuery(query)
    deferredPromise.resolve(results)
  });
  database.runQueue();
  return deferredPromise.promise;
}
database.runQueue = () => {
  if (this.isRunningQueue) {
    return;
  }
  this.isRunningQueue = this.queue[0];
  delete this.queue[0];
  try {
    this.isRunningQueue()
  } catch (e) {
    // whatever is needed
  } finally() {
    setTimeout(database.runQueue, 1)
  }
}

which would allow us to simply:

const postCollection = database.collections.get('posts')
const post = await postCollection.create(/*config*/);
// or const post = new Post(rawObjectOrConfigFunction)
await post.markAsPromoted() // that has a await this.save()

the idea, in this case, is to have each query as its own promise, making it a bit more transparent to the developer on how things run (aka we would just strip off the concept of the action).

I wrote this example in a few minutes, so LMK if something doesn't make sense.

sidferreira commented 3 years ago

I can't understand why the subAction is even needed...

const newPost = await database.action(async action => {
  // Note: function passed to `database.action()` MUST be asynchronous
  const posts = database.collections.get('posts')
  const post = await posts.create( /* configure Post here */ )

  // Note: to call an action from an inline action, call `action.subAction`:
  await action.subAction(() => post.markAsPromoted())

  // Note: Value returned from the wrapped function will be returned to `database.action` caller
  return post
})

if post.markAsPromoted() is an action (calls db.action or is decorated with @action), then this deadlocks. By definition no two actions can execute at once, markAsPromoted waits until it has its turn to execute, and the caller waits until markAsPromoted finishes.

my problem here is that post.markAsPromoted() is inside of action already... so why not just run it?!

but based on your comments, I believe the whole problem is about post.markAsPromoted() having @action, causing an action to run inside another action, without proper control.

sidferreira commented 3 years ago

I fail to understand why not just use...

If @action modifier does not exist and your application methods don't actually call db.action, and instead just call batch/create/update (which demand that they be called inside an action/writer block), then indeed, subAction would not be necessary. And it would be a legitimate way to approach this problem…

However, this would move the responsibility for wrapping work in writer blocks from these db work methods onto callers. So, for example, your React components would have to be concerned with the fact that they can't just do onClick={() => post.markAsPromoted()}, but instead, must do onClick={() => db.write(() => post.markAsPromoted())}. In my mind, while I dislike subActions for creating visual clutter, if writer/reader blocks are necessary (and they are because of synchronicity), I prefer to concentrate as much complexity related to databases as possible inside model/database code, and callers in the UI part of an app should not have to know too many details about it.

So, basically, the need for subAction is related to the use of @action decorators and avoid things blowing up?! Someone who doesn't uses @action (I don't) won't need it?!

radex commented 3 years ago

the idea, in this case, is to have each query as its own promise, making it a bit more transparent to the developer on how things run (aka we would just strip off the concept of the action).

I don't think you're grasping the concept of actions/writers. The internal implementation detail of queueing work for the database is not at all a problem. The problem is of mutual exclusion - making sure that only one thing is modifying the database at the same time.

Here's a bit from documentation:

  Consider a function `markCommentsAsSpam` that fetches a list of comments on a post, and then marks them all as spam. The two operations (fetching, and then updating) are asynchronous, and some other operation that modifies the database could run in between. And it could just happen to be a function that adds a new comment on this post. Even though the function completes *successfully*, it wasn't *actually* successful at its job.

  This example is trivial. But others may be far more dangerous. If a function fetches a record to perform an update on, this very record could be deleted midway through, making the action fail (and potentially causing the app to crash, if not handled properly). Or a function could have invariants determining whether the user is allowed to perform an action, that would be invalidated during action's execution. Or, in a collaborative app where access permissions are represented by another object, parallel execution of different actions could cause those access relations to be left in an inconsistent state.

  The worst part is that analyzing all *possible* interactions for dangers is very hard, and having sync that runs automatically makes them very likely.

  Solution? Group together related reads and writes together in an Writer, enforce that all writes MUST occur in a Writer, and only allow one Writer to run at the time. This way, it's guaranteed that in a Writer, you're looking at a consistent view of the world. Most simple reads are safe to do without groupping them, however if you have multiple related reads, you also need to wrap them in a Reader.

here's another example, pseudo-code simplified from sync:

applyChangesFromServer = async records => {
  // 1:
  const ids = records.map(r => r.id)
  const currentRecords = await db....query(Q.where('id', Q.oneOf(ids))

  // 2:
  await db.batch(
    ...records.created.map(record => collection.prepareCreate(record)),
    ...records.updated.map(record => currentRecords.find(r => r.id === record.id).resolveConflictsWith(record)),
  )
}

now imagine that while this is starting to execute, a user clicks on something, that, say, deletes one of the records. If this happens after sync's (1) has been called, but before it asynchronously returns, then we'll have inconsistent data, and (2) might try to update a record that no longer exists.

if you're familiar with concurrent programming principles, this is exactly the same problem. asynchronicity is similar to concurrency, even though only one thread is involved, and to maintain consistency in data access, we need to have mutual exclusion of writes. but because locking and unlocking a mutex is very error-prone and inconvenient, we abstract it away by serializing access - making a queue you access with a special method

The point is that as long as WatermelonDB is asynchronous (and there are pretty good reasons to want to keep it asynchronous), multi-step actions that change the database must have an explicit Writer block. There's no way around it. Either all the APIs are synchronous, there's an explicit action/writer API, or Watermelon only "works" most of the time, but then breaks and crashes in utterly undebuggable ways.

And a user doesn't really have to know or understand this fully, it only has to make sense to them that whenever they want to do some work that modifes the database, they need to wrap it in database.write(() => ...)

radex commented 3 years ago

my problem here is that post.markAsPromoted() is inside of action already... so why not just run it?!

I'm eagerly waiting for suggestions, because I hate this too, but again:

  1. we must enforce that there can only be one thing that modifies the database
  2. if it makes sense to have a method on Post that, when called, marks the post as promoted, we have to wrap its contents in a writer block, either by using database.write or by using @writer (@writer is just a convenience)
  3. if there's another writer that, as part of its work, wants to reuse existing code that is also a writer, we have a problem - because of (1) we can't just run it (then it would be pointless as we wouldn't achieve mutual exclusion), so we deadlock. To get out of the deadlock, we must explicitly tell Watermelon that we want the called writer to proceed and not queue.

As far as I can see, either we have to have something like subAction/callWriter, or (as stated before), we have to get rid of the @writer decorator and move the entire responsibility of wrapping work in database.write from functions that do database work to callers of these functions.

I'd absolutely love to be wrong and discover that there's a third option!

radex commented 3 years ago

BTW:

I implemented the first draft of the changes I suggested here: https://github.com/Nozbe/WatermelonDB/pull/1031

I thought about @michalpopek's proposal to call writer nesting writeMore but after thinking about it some more, I decided to stick to callWriter - it's a bland, technical, but hopefully unambiguous name. I wasn't sure how writeMore would be interpreted by a new user.

I have second thoughts about whether we want to have "Readers" in addition to "Writers". Throughout both Watermelon's and NozbeTeams' codebases, I found only a few cases where mutual exclusion is necessary but writing isn't. Having "Readers" in addition to "Writers" makes for kinda nice symmetry syntactically, but there's little practical benefit to the distinction. Using a Reader just protects you from unintentionally modifying the database, and in theory multiple Readers could run simultaneously (but there's probably little benefit from that). The downside is that there's another concept and the diagnostic warnings and documentation that have to keep repeating "writer or reader". We could just have Writers, and explain to the (very) advanced users that have a use case that requires consistent view of the data to wrap the reads in a Writer. OTOH, I already implemented it :P @michalpopek thoughts?

sidferreira commented 3 years ago

I don't think you're grasping the concept of actions/writers. The internal implementation detail of queueing work for the database is not at all a problem. The problem is of mutual exclusion - making sure that only one thing is modifying the database at the same time.

I totally agree with that. I'm sorry if I'm cluttering the discussion.

Async Thread 1 Async Thread 2
Step 1 started fetching changes
Step 2 const list = posts.fetchAll()
Step 3 Apply changes (adding some posts, removing others)
step 4 list.map(item => item.markAsRead())

It seems that the problem is a scenario like this one: Step 3 may add items that are not in list or remove existing items, maybe causing an exception, but probably keeping some items as unread.

The actions sounds like worker threads (actually it is called "worker" in the code), and if I'm right, I would suggest worker.

The main issue for that seems to be:

And now I finally got the idea of the action: groups database interactions together to avoid other interactions to modify the data currently in use. In some ways, acting like a transaction (without the rollback feature).

Assuming I got everything right, I would suggest:

a) everything is automatically added to an action. If you use: database.collection.get('posts').query().fetch() it will automatically create an action/worker/whatever. b) you still can use "actions" for complex scenarios:

database.getAction(async (action) => {
  const posts = database.collections.get('posts')
  const post = await posts.create(/* configure Post here */, { action })
  await post.markAsPromoted({action})
  await post.appendToBody(this.body, {action})
  return post
})

runWorker = () => {
  const currentWorker = ...;
  currentWorker.start();
  callbackPromise(currentWorker).finally(() => currentWorker.done());
}

In short, I look for making Watermelon as simple as possible for most uses, but keeping options for more complex scenarios.

I hope it make sense

sidferreira commented 3 years ago

Also, worth mentioning:

https://github.com/ujjwalguptaofficial/sqlweb https://jsstore.net

(just found them, not my use)

michalpopek commented 3 years ago

I decided to stick to callWriter - it's a bland, technical, but hopefully unambiguous name. I wasn't sure how writeMore would be interpreted by a new user.

Sure, sounds like a reasonable decision. We could also be even more explicit and extend the name to callChildWriter/callChildReader or something in that spirit, but I'm not sure, if it is necessary.

The downside is that there's another concept and the diagnostic warnings and documentation that have to keep repeating "writer or reader".

IMO the reader API makes the whole read/write concept complete and rules out any potential doubts that could arise, if we just introduced the writer API alone. From what I saw in your implementation PR, the reader API is already marked as an advanced topic, so I don't think it will bother newcomers that much. And the maintainance cost of being aware of the API when editing docs or writing warnings sounds relatively low for me.

We could just have Writers, and explain to the (very) advanced users that have a use case that requires consistent view of the data to wrap the reads in a Writer.

We could do that, but TBH it would feel like we're serving our users workarounds instead of providing them with a proper solution.

OTOH, I already implemented it :P

Yeah, and from what I saw it looks like the reader implementation isn't extensive, so it shouldn't create any serious debt