folbricht / desync

Alternative casync implementation
BSD 3-Clause "New" or "Revised" License

about performance issue #13

Closed: limbo127 closed this issue 6 years ago

limbo127 commented 6 years ago

Hello, we are continuing our tests of desync, now looking at performance in comparison with our current solution for deploying images. Our tests show that reading image blocks (from qemu's point of view) is slower with desync than with nbd, for example. This is expected behaviour, but we would like to know where, in your experience, we can improve desync's performance for our use case:

1 - Is HTTP/HTTP2 too inefficient for transferring small data blocks? Should we try to develop an "nbd chunk server"? (With -c localdisk and mount-index, performance is better than with -s httplocalstore, so HTTP header overhead and latency seem to be the issue in this case; see the sketch after this list.)
2 - desync mount-index and chunk-server seem to create 10 threads by default. Is that necessary? (We would like to reduce CPU usage.)
3 - Is desync optimized to find chunks in a very big store? (500 GB of chunks at the default ~64k chunk size.)
4 - Anything else?
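For context, the two setups compared in question 1 look roughly like this (a sketch only; the store path, URL, index name, and mountpoint are placeholders, not our real configuration):

```
# chunks read from a store on local disk
desync mount-index -s /path/to/local-store image.caibx /mnt/image

# same index, but chunks fetched over HTTP from a chunk-server
desync mount-index -s http://chunkserver:8080/ image.caibx /mnt/image
```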

Regards, Nicolas

charles-dyfis-net commented 6 years ago

@folbricht, ...what are your thoughts on setting up a mailing list, so we stop getting things that aren't bugs or feature requests filed as tickets?

This really isn't an appropriate general-discussion forum.

Perhaps gitter.im is the right answer here?

limbo127 commented 6 years ago

Hello, for your information, here are the results of a test on a 10G blob:

desync cat: ~100 s
desync mount-index + cat on the FUSE-mounted file: ~135 s

Do you think the FUSE mount can be optimized? (The sketch below shows roughly what was measured.)
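Roughly what was measured, as a sketch (store path, index name, and mountpoint are placeholders):

```
# read the blob straight from the store
time desync cat -s /path/to/store blob.caibx > /dev/null    # ~100 s

# mount the index, then read the file through FUSE
desync mount-index -s /path/to/store blob.caibx /mnt/blob &
time cat /mnt/blob/* > /dev/null                            # ~135 s
```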

*I did not test extract because it also involves writing the file out to disk.

folbricht commented 6 years ago

To first answer your original questions:

1) HTTP & HTTP/2 does add a certain amount of overhead for requests, encoding etc, it's always going to be slower than a local store.

2) desync mount-index should only create goroutines if used with an SSH store and -n > 1, but there are also a few goroutines used in the FUSE library that I can't do anything about.

3) The size of the chunk store doesn't really have any impact. The chunks are indexed by the first 4 characters of their ID, so it should take the same time to get a chunk from a large store as from a small one (see the layout sketch below).
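To illustrate, a casync-format store keeps each chunk in a subdirectory named after the first 4 hex characters of the chunk ID, so a lookup is a direct path construction rather than a search over the whole store (the IDs below are made up):

```
store/
├── 0a1b/
│   ├── 0a1b2c3d...e4f5.cacnk
│   └── 0a1bffe0...9912.cacnk
└── 0a1c/
    └── 0a1c0457...77ab.cacnk
```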

The issue with using mount-index is the serial nature of it: since desync can't predict which block is going to be read next, the reads can't really be parallelized. If we knew beforehand that the whole file would be read sequentially, there are a few things we could do, but that's too specific a use case. Every bit of additional latency slows down this serialized process (as you noticed yourself). That's why HTTP stores are slower than local stores, and why the FUSE mount is slower than the cat command (which is serial too, but without the FUSE overhead).
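A back-of-the-envelope sketch of how that latency adds up (the chunk count follows from the 10G blob and ~64k default chunk size mentioned above; the 0.2 ms per-chunk figure is an arbitrary example):

```
# a 10 GiB blob at a ~64 KiB average chunk size:
echo $((10 * 1024 * 1024 / 64))    # = 163840 serial chunk reads

# each extra 0.2 ms of per-chunk latency then adds roughly:
echo "163840 * 0.0002" | bc        # = 32.768 seconds to a full read
```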

For further questions, I suggest you ask in https://gitter.im/desync-casync-client/Lobby as GitHub issues aren't the right medium to discuss these things.

limbo127 commented 6 years ago

ok. thanks