deepstreamIO / deepstream.io

deepstream.io server

List updates are not sent as deltas #1120

Closed: romanzy313 closed this issue 1 year ago

romanzy313 commented 2 years ago

While working with lists I've noticed that on every change the whole list is sent over the wire, and other clients receive the update in full as well.

I modify the list with addEntry and removeEntry. This is what I see in my browser Network tab, minus some binary around it.

['red', 'green']
['red', 'green', 'blue']

I was hoping to use deepstream to replace socket.io and greatly simplify the real-time part of my application. Using a list to store the message records of a chat room will slow the application down significantly as more messages are added.

Is this a bug? The only related issue I could find, #106, is marked as resolved. I am using the latest versions of the client and the server.

Thanks!

jaime-ez commented 2 years ago

Hi, I haven't checked this specifically, but list.subscribe should send the complete list when it is updated. If you want to receive just the changes, you should use list events: https://deepstreamio.github.io/docs/docs/client-js/datasync-list#events list.onEntryAdded should give you only the added element.

Which one are you using?
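
For reference, a rough sketch of both styles (assuming the @deepstream/client API as documented at the link above; names are made up and this is untested here):

```ts
import { DeepstreamClient } from '@deepstream/client'

async function main () {
  const client = new DeepstreamClient('localhost:6020')
  await client.login()

  const list = client.record.getList('chat/room-1')
  await list.whenReady()

  // Style 1: subscribe – the callback always receives the full entries array
  list.subscribe((entries: string[]) => {
    console.log('full list:', entries)
  })

  // Style 2: events – callbacks receive only the affected entry and its position
  list.on('entry-added', (entry: string, position: number) => {
    console.log('added', entry, 'at', position)
  })
  list.on('entry-removed', (entry: string, position: number) => {
    console.log('removed', entry, 'from', position)
  })

  list.addEntry('chat/room-1/message-1')
}

main().catch(console.error)
```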

romanzy313 commented 2 years ago

Hello,

I have been using the subscribe method, and have now tried the method you suggested. Network-wise the two ways of "subscribing" are identical: the issue is that the entire list is sent over the network whenever a change is made.

For example, suppose a list is 10000 entries long. When calling list.addEntry, the client uploads an array of 10001 items, and every other client then downloads all 10001 items. That is not very scalable in my opinion. I had assumed that lists work the same way as records, but that does not seem to be the case: with records, executing record.set("some.nested.prop", 34) sends only a partial update to the server.
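
To illustrate the asymmetry, a minimal sketch (record and list names are made up; assumes a connected client as in the docs):

```ts
import { DeepstreamClient } from '@deepstream/client'

async function compare (client: DeepstreamClient) {
  // Record: setting one path goes out as a patch-style update –
  // roughly just the path plus the new value.
  const rec = client.record.getRecord('user/alice')
  await rec.whenReady()
  rec.set('some.nested.prop', 34)

  // List: addEntry rewrites the entire entries array, so all 10001 entries
  // travel to the server and then out to every subscribed client.
  const list = client.record.getList('chat/room-1')
  await list.whenReady()
  list.addEntry('chat/room-1/message-10001')
}
```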

You can find my implementation here

Here is a screenshot of the Network tab when I call list.addEntry and list.removeEntry one item at a time.

(screenshot: Network tab showing the full list payload sent on each addEntry/removeEntry)

jaime-ez commented 2 years ago

Hi, thanks for the thorough report. Looking at the source code, it seems you are right. Lists are in fact a wrapper over a record with some extra methods. Take a look here: https://github.com/deepstreamIO/deepstream.io-client-js/blob/master/src/record/list.ts. As you say, it is clear that for every operation the full set of entries is handled. I agree that is sub-optimal. Personally I don't use lists much, and none of mine has more than 100 items, so I haven't run into performance issues.
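
For anyone following along, the behaviour boils down to something like this (a simplified paraphrase, not the verbatim list.ts source):

```ts
// Simplified paraphrase of the list wrapper, NOT the verbatim list.ts source:
// every mutation rebuilds the complete entries array and writes it back to the
// underlying record, so the whole array crosses the wire on each change.
class ListSketch {
  constructor (private record: { get (): string[]; set (data: string[]): void }) {}

  addEntry (entry: string, index?: number): void {
    const entries = this.record.get().slice()   // copy of the full current list
    if (index === undefined) {
      entries.push(entry)
    } else {
      entries.splice(index, 0, entry)
    }
    this.record.set(entries)                    // full-array write, not a path patch
  }

  removeEntry (entry: string): void {
    const entries = this.record.get().filter((e) => e !== entry)
    this.record.set(entries)                    // again the whole array goes out
  }
}
```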

Feel free to comment on a possible implementation in order to optimize this behaviour and I'll be happy to discuss it. Unfortunately I don't have time to focus on this at the moment.

Best

romanzy313 commented 2 years ago

Since regular objects (records) can be sent as deltas, the quick and dirty fix would be to send entries as keys of an object, with empty values. Of course, that would completely break the protocol and it is not very efficient. Maybe with compression the data payloads would be okay (uWebSockets is really good at compression).
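
A rough illustration of that workaround (record name and keys are hypothetical, and whether setting a path to undefined deletes it may depend on the client version):

```ts
import { DeepstreamClient } from '@deepstream/client'

// Hypothetical record name and keys, purely to illustrate the idea.
async function demo (client: DeepstreamClient) {
  const room = client.record.getRecord('chat/room-1/entries')
  await room.whenReady()

  // Adding an "entry" touches a single path, so only that delta is sent.
  room.set('entries.message-10001', true)

  // Removing it is also a single-path update (setting a path to undefined
  // should delete it; otherwise rewrite the parent object).
  room.set('entries.message-10001', undefined)

  // Trade-offs: ordering is lost, entry names must be valid path segments
  // (no dots), and this sidesteps the list API entirely.
  const entries = Object.keys(room.get('entries') ?? {})
  console.log(entries)
}
```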

This library is amazing in the way it architects a "serverless" database, where Valve permissions are the only configuration required. The server is fast, scalable, and well tested (the e2e tests are great!). But I found some issues while integrating it with MobX.

It would be best to rewrite the internals with something like Immer. It would greatly simplify them, expose a better API to library consumers, and allow sending batched JSON patches for any data type. Of course it is a huge undertaking, and everyone would need to be on board with breaking changes left and right 😅
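
For context on the Immer idea, its patch API looks roughly like this (this is plain Immer, not anything deepstream exposes today):

```ts
import { enablePatches, produceWithPatches } from 'immer'

enablePatches()

const state = { entries: ['red', 'green'] }

// produceWithPatches returns the next state plus JSON-patch-like objects
// describing only what changed – in principle these could be batched and
// sent over the wire instead of the full document.
const [next, patches, inversePatches] = produceWithPatches(state, (draft) => {
  draft.entries.push('blue')
})

console.log(patches)
// e.g. [ { op: 'add', path: ['entries', 2], value: 'blue' } ]
```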

jaime-ez commented 2 years ago

I'm glad you see all the plus sides this project has; there is a ton of work behind it and it has very good features, which is why I've been using and maintaining it as much as possible, focusing mostly on bugs.

Regarding lists, I don't think using keys of an object for representing an array would be very efficient. However nothing stops you from using records that way if you want to.

An option could be to create a new custom list message that informs clients that value(s) at index(es) have changed, this means creating a new set of protocol messages for lists, which is what should have been done from the beginning I guess.
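
Purely as a strawman for discussion, such a message might only need to carry the affected entry and index, for example (nothing like this exists in the current protocol):

```ts
// Hypothetical list-delta message shape – NOT part of the current deepstream protocol.
interface ListEntryDelta {
  topic: 'LIST'                                      // a new topic alongside RECORD
  action: 'ENTRY_ADDED' | 'ENTRY_REMOVED' | 'ENTRY_MOVED'
  name: string       // list name, e.g. 'chat/room-1'
  entry: string      // the record name that was added, removed, or moved
  index: number      // the position affected
  version: number    // list version, for conflict resolution
}
```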

I agree with your comments in general, if you can give me some examples of:

And please comment on:

Yes, using Immer would be great, but one of the core propositions of the project is to keep external dependencies to a bare minimum, and I adhere to that, both for bundle size and project maintainability.

romanzy313 commented 2 years ago

Non-standard path implementation.

No patch implementation

Race conditions

Checked the docs, async/await is everywhere indeed.

Anyway, thank you for your time and for maintaining this project. I would like to help sort these issues out, but I don't have spare time at the moment. Cheers!

jaime-ez commented 1 year ago

Hi, I was thinking about the lists issue (sending the complete list over the network when something changes) and I find it hard to improve. The deepstream protocol only sends patches when a path of a record is updated, so the only way to do it is to map the array to an object, either with the index as the key or with the entry as the key holding its position in the array. Either way, the problem is that when you remove or insert an element you must send as many patch messages as there are entries that need to be shifted, which, given the per-message overhead, would in many cases amount to more network traffic than sending the whole list. The only clear win is appending an element at the last index; the overhead for all other use cases makes mapping the array into an object non-viable in my opinion.
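
To make the trade-off concrete, here is a small illustration (hypothetical index-keyed representation, not actual protocol messages):

```ts
// Count the path patches a removal would need if the list were stored as an
// index-keyed object like { "0": "a", "1": "b", ... }.
function removalPatches (entries: string[], removeAt: number): string[] {
  const patches: string[] = []
  // every entry after the removed one shifts down by one index...
  for (let i = removeAt; i < entries.length - 1; i++) {
    patches.push(`set("${i}", "${entries[i + 1]}")`)
  }
  // ...and the last index must be deleted
  patches.push(`delete("${entries.length - 1}")`)
  return patches
}

console.log(removalPatches(['a', 'b', 'c', 'd', 'e'], 0))
// [ 'set("0", "b")', 'set("1", "c")', 'set("2", "d")', 'set("3", "e")', 'delete("4")' ]
// Removing near the front of an n-entry list costs roughly n patch messages,
// so the per-message overhead usually outweighs resending the whole array once.
```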

jaime-ez commented 1 year ago

I'm going to close this and reference it in the discussions, so the other suggestions you made are kept in mind.

Feel free to reopen if required, best

jaime-ez commented 1 year ago

Hi @romanzy313, check out the latest release, v7.0.2: I implemented a double-ended queue to address this issue. I hope you can take a look and give some feedback. Best
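
For anyone curious about the idea, conceptually it looks something like the following (a sketch of the approach only, not the actual v7.0.2 code):

```ts
// Conceptual sketch only – NOT the actual v7.0.2 implementation.
// The idea: most list usage (e.g. chat) adds or removes at the ends, and an
// end-operation can be described by a tiny delta instead of the full array.
type ListDelta<T> =
  | { op: 'push'; entry: T }      // add at the tail
  | { op: 'unshift'; entry: T }   // add at the head
  | { op: 'pop' }                 // remove from the tail
  | { op: 'shift' }               // remove from the head

function applyDelta<T> (entries: T[], delta: ListDelta<T>): T[] {
  switch (delta.op) {
    case 'push':    return [...entries, delta.entry]
    case 'unshift': return [delta.entry, ...entries]
    case 'pop':     return entries.slice(0, -1)
    case 'shift':   return entries.slice(1)
  }
}
```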