Closed rileylark closed 7 years ago
It's a requirement for all our users and of course something that deepstream takes care of. Generally, Roland is right when saying "Clients are never notified of write success." - but that doesn't mean there are no write guarantees. Deepstream employs an eventual consistency model with negative acknowledgement. Every write / record state is assigned a version number. If a version number is skipped, the client automatically reconciles. If two versions clash, a merge function is triggered. Errors while writing to either the cache or the database are relayed back to the client that executed the write. If the deepstream server goes down at the exact moment a write is underway and later reconnects, the version conflict will be detected.
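The version-number model described above can be sketched roughly as follows. This is an illustrative sketch, not deepstream's actual internals - the function names, record shape, and merge strategy are all assumptions:

```javascript
// Hypothetical sketch of version-based consistency: updates carry a
// version number; a skipped version triggers reconciliation, a clash
// triggers a user-supplied merge function.

function applyUpdate(record, update, merge) {
  // Normal case: the update is exactly one version ahead.
  if (update.version === record.version + 1) {
    return { ...update };
  }
  // A version was skipped: the client is out of date and must
  // re-fetch the latest record state from the server (reconcile).
  if (update.version > record.version + 1) {
    return { ...update, reconciled: true };
  }
  // Version clash: two writers produced the same version number,
  // so a merge function resolves the conflict.
  return merge(record, update);
}

// Example merge strategy (an assumption): the remote state wins and
// the version counter moves past both conflicting versions.
const remoteWins = (local, remote) =>
  ({ ...remote, version: Math.max(local.version, remote.version) + 1 });

const record = { version: 3, data: { name: 'a' } };
const clash = applyUpdate(record, { version: 3, data: { name: 'b' } }, remoteWins);
// clash.version is now 4 and clash.data comes from the remote update
```

The point of the sketch is that no positive acknowledgement is needed for the happy path; only gaps and clashes in the version chain require action.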
Ok, this sounds great! I don't understand this particular case yet, though:
Deepstream starts. Failed writes are not persisted to the database, but can still be read from the cache if a client requests them. If Redis dies or restarts, that data is lost.
At what point is the client notified that the data is lost?
Good question: Redis saves its data to disk as well and thus can serve as a secondary backup to the database. Other caches like Memcached or Hazelcast, however, are in-memory only and lose data as soon as they shut down.
If the write to the cache succeeds but the write to the database fails, the client will receive either a RECORD_CREATE_ERROR for new records or a RECORD_UPDATE_ERROR for existing records.
If deepstream fails the exact millisecond after a write to the cache, but during a write to the database that also fails, this data will be in cache only and thus an inconsistency will occur.
Ok, so it's that inconsistency I'm trying to narrow in on - the case where DS successfully writes to the cache and then crashes, so now the cache & database disagree.
Am I overreacting? Is this an insignificant window for failure?
Maybe, for our specific case, we could alter the persistence logic from:

1. Write to cache (if this fails, send RECORD_UPDATE_ERROR)
2. Write to database (if this fails, send RECORD_UPDATE_ERROR)

to:

1. Write to cache (if this fails, send RECORD_UPDATE_ERROR)
2b. Message guaranteed to be delivered to database (if this fails, send RECORD_UPDATE_ERROR)

For us, adding 50ms of latency would be a fine exchange for a guarantee that we would be notified of inconsistencies.
Maybe I'm misunderstanding the exact scenario. In your description above, the deepstream server shuts down the exact moment between a write to cache and a write to the database - without deepstream, how would these various errors/ guaranteed writes be sent?
It's probably also worth mentioning that the cache solely exists to improve speed. The cache and database connector interfaces are interchangeable - if you wouldn't mind a slightly higher latency (as in the additional amount of time it takes to write to / read from Mongo or Postgres as opposed to Redis or Memcached) you could simply use deepstream with a database connector as cache connector and leave the database connector disabled.
Does a deepstream cluster provide redundancy, or only horizontal scale for performance? If one server shuts down before it writes to the database, will another server in the cluster complete the pending work of the server that shut down?
Does a deepstream cluster provide redundancy, or only horizontal scale for performance? If one server shuts down before it writes to the database, will another server in the cluster complete the pending work of the server that shut down?
I believe the answer is that there is no sense of "pending work" and the data will just be lost / inconsistent.
What I want is a guarantee that clients will never be sent data that's permanently inconsistent with the database. I'm fine with eventual consistency, but I can't have the case where my cache is cleared and suddenly new reads are getting different data than clients that have old data from the old pre-clear cache. That state of inconsistency sounds so scary to me because we'd never be able to debug it and would probably not even notice it.
Firebase sends acknowledgements of writes back to clients, which is nice for two reasons:
In your description above, the deepstream server shuts down the exact moment between a write to cache and a write to the database - without deepstream, how would these various errors/ guaranteed writes be sent?
By writing all requests to pubsub before the cache or the database, you could reduce the requests to a single atomic operation: post to pubsub. From there, if either writing to the cache or the database failed, pubsub could retry. After x retries some process can send RECORD_UPDATE_ERROR (and maybe clear the cache?)
(But still, in the absence of client notification of success, the client would have no way of knowing the write failed if DS fails before writing to pubsub)
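The pubsub-first idea above could look something like this. This is a sketch of the proposal, not anything deepstream does - the function names and retry policy are assumptions, and it presumes the cache and database writes are idempotent so they can be safely retried:

```javascript
// Sketch: the only "atomic" step is enqueueing the message; a consumer
// then retries the cache and database writes, and reports
// RECORD_UPDATE_ERROR back to the client after maxRetries failures,
// so cache and database never silently disagree.

async function consume(message, writeCache, writeDb, notifyError, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await writeCache(message); // assumed idempotent, may be retried
      await writeDb(message);
      return true;               // both targets are now consistent
    } catch (err) {
      if (attempt === maxRetries) {
        // Give up and make the failure visible to the client.
        notifyError('RECORD_UPDATE_ERROR', message, err);
        return false;
      }
      // otherwise: loop and retry
    }
  }
}
```

As noted above, this only moves the failure window: a crash before the enqueue still loses the write silently unless the client gets a positive acknowledgement.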
@rileylark I'm highly committed to making deepstream work for you and I understand your concern. I know that the problem described, whilst theoretically possible, doesn't exist in practice, as it would require both the deepstream server and the database to crash in a precise sequence: after receiving a message and after storing it to the cache, yet before the database - which is also failing - responds. As you said, using "pub-sub" or a buffered message queue would only move this theoretical scenario forward to "what happens if deepstream fails the exact moment before pushing a message on the queue".
deepstream takes two steps to prevent inconsistencies.
Having said that, I am starting to accept that I am being too stubborn. I know that it works and that inconsistencies are both extremely rare and eventually reconciled, creating a self-healing system which is significantly more than most storage solutions provide by themselves. However, I do acknowledge how uneasy this makes a large part of our user base, and the great lengths some have gone to in order to implement something that resembles per-write acknowledgements. I'll therefore discuss with the team adding them as a feature, disabled by default. Going forward, there would be two new aspects. On the server, a new config option
confirmWrites: true
on the client a callback
myRecord.set( 'somePath', 'someValue', error => { /* null for successful writes */ });
Curious about your thoughts
Why have a setting at all? If a callback is provided then confirm otherwise don't...
Write acknowledgement is not a client operation. The server has to send out an ack message, effectively doubling traffic for write-heavy apps and potentially blocking update/patch messages in a different sequence. This will slow the system down by multiples of its current performance and is unnecessary for a vast number of use cases, especially with one-directional dataflows (price grids, trading, realtime dashboards etc.), hence the explicit option.
If what you're hinting at is to make it dynamic by adding a confirm flag to the update/patch message and doing it on a per-transaction basis, that's an interesting idea though. It would, however, introduce a lot of complexity, especially if multiple clients write to the same record, some requiring confirmation and some not.
I'm hinting at a flag.
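The per-write flag being hinted at might look like this on the client side. This is purely illustrative - the message shape, field names, and `confirm` flag are hypothetical, not deepstream's actual wire protocol:

```javascript
// Hypothetical per-write confirmation flag: the client sets
// confirm: true on an update message only when the caller supplied a
// callback, so the server only acks writes that someone is waiting on.

function buildUpdateMessage(recordName, version, path, value, callback) {
  const msg = {
    action: 'PATCH',
    record: recordName,
    version,
    path,
    value
  };
  if (typeof callback === 'function') {
    msg.confirm = true; // ask the server for an explicit ack
  }
  return msg;
}

const silent = buildUpdateMessage('user/1', 2, 'firstname', 'Homer');
const acked = buildUpdateMessage('user/1', 3, 'firstname', 'Marge', () => {});
// silent.confirm is undefined; acked.confirm is true
```

This keeps the "doubled traffic" cost confined to the writes that actually opted in, which is the trade-off discussed above.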
Wow, this response is amazing! I hope I do not come off as pushy on this - I really appreciate deepstream even without client acknowledgment! But, yes, client acknowledgement would be a requirement for me to use DS in production.
Please let me know if you'd like to talk more about my reliability requirements or use cases! FYI, we have an app deployed to schools that currently uses Firebase to communicate between teachers and students. In this case the current DS behavior would not be "self-healing" - if a student writes an answer to a test today, and then tomorrow the teacher sees no response (or a different response), that's not something the student can quickly correct. I hear you saying that this will probably never happen, and I agree with you, but when a teacher comes to me saying "hey my students' responses aren't showing up," I don't want to even have this possibility on my mental landscape while I'm trying to understand the problem.
Specific feedback:
I think the current behavior can lose data and become inconsistent. I am super-ok with things being out of sync for a while, but the possibility of the client, cache, and database permanently disagreeing is what is unacceptable. I cannot depend on there being one last write that sets the record straight everywhere.
A config option would be great. I'd set it to true all the time and I would not want it to change if I don't pass a callback. Even if I don't set an error callback, I certainly want my clients notified if some data didn't get updated. Firebase handles this by acting as if the data has changed back to its previous value, which is fine for me - the student won't think their response has been submitted because it would flick back to being blank.
e.g.:

myRecord.set( 'somePath', 'someValue' ) // no error callback provided

If this write failed, the record would flick back from 'someValue' to its previous value, so no client is left believing the write succeeded.
If this were how it worked, I could live a care-free life setting data on the server and never polling to see whether I have up-to-date data.
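The Firebase-style revert behaviour described above can be sketched like this. This is an assumed client-side pattern, not deepstream's API - `optimisticSet` and `serverWrite` are hypothetical names:

```javascript
// Sketch of optimistic update with revert-on-failure: apply the write
// locally right away, then flick the value back to its previous state
// if the server reports that the write failed.

function optimisticSet(record, path, value, serverWrite) {
  const previous = record[path];
  record[path] = value; // optimistic local update
  return serverWrite(path, value).catch(() => {
    // The write failed: revert, so subscribers never keep stale
    // optimistic state (the student's answer "flicks back to blank").
    record[path] = previous;
  });
}
```

With this pattern a failed write is visible in the UI without the app ever polling for consistency.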
I do also love the acknowledgement callback with error parameter. In our app we have some quickly-updating values where we don't really care if some of the set operations fail in the middle, but we have other values, like user preferences, where it's really important to be sure they are permanently persisted before optimistically updating the UI. An optional callback lets us choose whether or not to be optimistic very easily. Note: if the presence of the callback changed the reliability of the set, then we would probably pass a no-op callback to every single set call.
the very next write would detect the broken version chain and reconcile the error
I see what you're saying and this is often how we use firebase, but we need to be guaranteed that the last write is permanently reconciled (eventually) as well.
This has been amazing! Please take all this with a grain of salt - I would like to be involved even if you do not prioritize my concerns. Also, please let me know if this is something you would be interested in my help with. Our company is currently dropping $10k/mo on Firebase and so it makes some financial sense for us to lend a hand with this, as well as the super-fun of working on such an exciting project. If you'd like to talk more about how we can help, I'd be happy to jump on a call.
Thanks again!
I think this could be solved in a simple (?) and performant manner using fencing, e.g.
<<FENCE_START 0
<<UPDATE 1
<<UPDATE 2
<<UPDATE 3
>>ERROR 1
<<FENCE_END 0
>>ERROR 2
>>FENCE_ACK 0
i.e. the client batches and surrounds UPDATE messages with FENCE_START & FENCE_END. After FENCE_END the server sends any failures followed by a FENCE_ACK. If no errors are received before FENCE_ACK, it can assume that everything is ok and persisted to storage. Of course this would only happen if any of the updates have a corresponding error listener.
This would of course mean that the result callback will not be called as fast as possible, but with some small latency, which I believe is ok in the cases that actually require confirmation.
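The client side of the fencing proposal could be sketched as follows, matching the message transcript above. The `Fence` class and the space-separated message format are assumptions made for illustration:

```javascript
// Sketch of client-side fencing: bracket a batch of updates between
// FENCE_START and FENCE_END; the batch counts as persisted once
// FENCE_ACK arrives, with any interleaved ERROR messages collected
// per update.

class Fence {
  constructor(id, send) {
    this.id = id;
    this.send = send;      // transport function, e.g. ws.send
    this.errors = [];
    this.send(`FENCE_START ${id}`);
  }
  update(n, payload) {
    this.send(`UPDATE ${n} ${payload}`);
  }
  // Server responses are fed back in; "ERROR n" marks update n as failed.
  onMessage(msg) {
    const [type, n] = msg.split(' ');
    if (type === 'ERROR') this.errors.push(Number(n));
    if (type === 'FENCE_ACK' && Number(n) === this.id && this.onDone) {
      this.onDone(this.errors); // empty array => everything persisted
    }
  }
  close(onDone) {
    this.onDone = onDone;
    this.send(`FENCE_END ${this.id}`);
  }
}
```

Replaying the transcript above (three updates, errors for 1 and 2, then FENCE_ACK 0) would invoke the completion callback with `[1, 2]`, telling the caller that update 3 was persisted.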
FWIW, our team would not get any value out of the per-write preference for acknowledgement - we'd rather be confident that every write was successful.
Re. batching acknowledgements: a delay of under 50ms would be fine for our use case and I think you are right that any UI waiting for an acknowledgement probably wouldn't mind a small delay. OTOH the bandwidth saved seems small and if batching acknowledgements requires additional complexity on the server it might not be worth it. Bandwidth is cheaper than RAM for us.
we'd rather be confident that every write was successful.
Then make sure to provide an error listener with every write...
bandwidth saved seems small
Actually, it's not entirely negligible in many cases... I think an all or nothing solution is not a step forwards...
I think there's a bit of a misunderstanding: We already notify clients if a write failed; the feature we'll add is an explicit confirmation that it succeeded. Therefore (I'm aware that I originally suggested them) explicit "always on" confirmations aren't needed, as this is already covered.
For technical reasons, fencing / commit blocks from the client don't make too much sense. Multiple clients write to the same record and the server already maintains short-lived update queues to deal with that. The write confirmation will be sent as soon as the current update queue completes successfully (most of the time this queue will only contain a single update; only at high write concurrency will there be more).
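The server-side behaviour described - a short-lived per-record queue that confirms each write once its persist step completes - might look roughly like this. The class and method names are assumptions, not deepstream source:

```javascript
// Sketch: updates to one record share a queue; each write is confirmed
// only after its persist call has completed, preserving write order.

class RecordUpdateQueue {
  constructor(persist) {
    this.persist = persist;   // async function writing one update
    this.pending = [];
    this.running = false;
  }
  add(update, onConfirm) {
    this.pending.push({ update, onConfirm });
    if (!this.running) this._drain(); // fire and forget
  }
  async _drain() {
    this.running = true;
    while (this.pending.length) {
      const { update, onConfirm } = this.pending.shift();
      await this.persist(update);
      // Confirm only once this queued write has actually completed.
      if (onConfirm) onConfirm(null);
    }
    this.running = false; // queue is short-lived: it dies when empty
  }
}
```

At low write concurrency the queue holds a single update and the confirmation fires as soon as that one persist finishes, which matches the "most of the time this queue will only contain a single update" point above.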
Imagine if you read a section of the postgres docs and it said in small type, "by the way, if you don't check the return value of the insert command, we don't guarantee the insert succeeded." Or imagine if you were saving a file on your hard drive, but unless you checked to see that it was there, there's a chance the file would disappear after you restart - it was saved in the cache but not in the DB.
I think there's a bit of a misunderstanding: We already notify clients if a write failed, the feature we'll add is an explicit confirmation that it succeeded.
Yes, I don't understand this. Writes can fail without the clients being notified! That's the whole reason I posted this issue.
Not only can writes fail after getting to DS, as we have discussed, but there are many other scenarios where a message can leave a client and never even get to DS. My product is deployed inside school buildings, which have the craziest sets of browser extensions and network filter rules you can imagine. In the case of a network failure after a write, the client will think the write has succeeded even though it failed, and DS will never notify it otherwise.
Then make sure to provide an error listener with every write...
👍 will do - again, I love DS in its current form and we have the resources to make it work like we need it to! Thank you for all your work!
We do provide mechanisms to trace write errors, and with the future release we'll provide callbacks for guaranteed writes, fully enabling your use case / fulfilling the requirements of a project management app. Please don't underestimate, though, that deepstream is a very broad church that caters for a lot of use cases where write consistency is less of an issue - or no issue at all - such as multiplayer gaming, financial trading or realtime dashboards.
Great! Let me know if there's anything we can do to help, either with this issue or with another area! The enthusiasm for deepstream is spreading across our office :D
@rileylark We are looking to release this functionality next Friday as part of 2.1; we'll give you a heads up as soon as there's an RC for feedback.
There are a few small quick wins we want to get out first in the next few days, mainly feedback we got from the large number of clients the community is writing.
In regards to help, we have a waffle board for the current roadmap and bugs. If you're using a specific front-end tech like React, Vue or something else, it would be interesting to see how and whether we can create any plugins to make things integrate better.
The best help though would be if you could write a blogpost about how you use deepstream, your use case is quite interesting and it would be great to spread the word!
@rileylark This has been released as part of 2.1. Please find the updated documentation here
Let's assume there is a record {"firstname": "Homer"}
... when running the following code, nothing appears in the console:
record.set('firstname', 'Homer', err => {
if (err) {
console.log('Record set with error:', err)
} else {
console.log('Record set without error')
}
})
So the callback is not executed, is this expected behaviour?
@snc this won't be written to cache/storage as you're not making any changes to the data. This occurs with or without write acknowledgement. The example on the website could take this into account though and we'll get that changed
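The behaviour explained here - no write and no callback when the value is unchanged - can be sketched like this. This is an assumed implementation for illustration, not the actual deepstream source:

```javascript
// Sketch: a set() that compares against the current data and skips
// both the write and the write-acknowledgement callback when the
// value is identical, which is why the example above logs nothing.

function setWithAck(record, path, value, write, callback) {
  if (record[path] === value) {
    return false; // identical value: no write, callback never fires
  }
  record[path] = value;
  write(record);            // persist to cache/storage
  if (callback) callback(null); // ack the completed write
  return true;
}
```

So setting `firstname` to `'Homer'` on a record that already contains `{"firstname": "Homer"}` is a no-op, while setting a different value would both write and invoke the callback.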
OK thanks for clarification.
In https://github.com/deepstreamIO/deepstream.io/issues/195#issuecomment-263007723 @ronag mentions that
For our use case it's important that clients can depend on their writes succeeding. It's fine for us if "succeeding" only means "will definitely be permanently stored at some point" but it's not OK for us to assume writes have succeeded when they may not.
Is deepstream still a match for our case? How hard would it be for us to add behavior like notifying clients when their writes have failed to make it into durable storage?