gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

[RFE] Unified performance xlator #218

Open amarts opened 7 years ago

amarts commented 7 years ago

Today gluster has multiple performance translators. This is good because each of them performs just one task, and most of the time does a good job of it.

Now, in most technical cases, it helps the user to have all of this implemented in one layer. The main reason is that it would reduce many memory usage issues, and we could also achieve much better cache coherency by doing so.

I am for maintaining the current xlator stack too, so users/developers have the flexibility to use just one feature at a time, but this new xlator could solve many of the performance issues we face.

@poornimag @soumyakoduri @raghavendrahg @csabahenk

rafikc30 commented 7 years ago

We need to think about performance xlators like readdir-ahead, which are loaded below dht. I'm interested in this.

poornimag commented 7 years ago

Even if we have different xlators, the cache can be common per inode, just a thought.

amarts commented 7 years ago

I am not just looking at the duplication of cache, but at more control over correctness. If all of these are together, we can make better decisions on how to handle the cache.

raghavendrahg commented 7 years ago

I am not just looking at the duplication of cache, but at more control over correctness.

I am not sure whether we gain on the correctness attribute over the existing model. The existing model treats the backend data as the source of truth and always keeps the cache updated with it.

However, unifying might help in performance. For example, if there is a read after a write, the current model reaches the backend for the correct data (unlike the kernel page cache, which writes to a page and serves reads from that same page). But the question is whether treating the backend as the source of truth is more beneficial than treating the unified cache as the source of truth, in a model where the unified cache is on the client side and there are multiple clients accessing the fs. Note that with multiple clients there are multiple unified caches and they might not be in sync. However, a similar argument can be made for the non-unified cache too (a read on client1 won't flush the write cached in write-behind of client2).

xhernandez commented 6 years ago

@raghavendrahg unifying caches in a way that makes it possible for any xlator to use it would have a big advantage in some cases (for example it could be used by EC to avoid some read-modify-write cycles). The cache coherency needs to be addressed by using leases or some other backward notification (upcall) to invalidate things when modified by other clients. This is not easy to implement, but if we unify all caches in a single place, we only need to implement it once. And if done robustly, we can take full advantage of it, avoiding a lot of network requests.

Adding the concept of exclusive and shared access requests may help in reducing cache invalidations when not really needed.

It's not exactly the same thing, but modern CPUs use local private caches for reads and writes without accessing main memory (once the data is cached). There exist protocols and special instructions to guarantee that all the data being used is up to date. We need something similar to improve performance in gluster.
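
To make that invalidation idea a bit more concrete, here is a tiny sketch (hypothetical names, not the real gluster upcall API) of a client-side cache whose entries stay valid only until the server notifies that another client has modified the inode:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define CACHE_SLOTS 1024

    /* One very simplified cache slot per inode (identified here by a gfid hash). */
    typedef struct {
        uint64_t gfid_hash;   /* which inode this slot caches              */
        bool     valid;       /* cleared when an invalidation is received  */
        char     data[4096];  /* cached content, reduced to a single page  */
    } cache_slot_t;

    static cache_slot_t cache[CACHE_SLOTS];

    /* Server notification (upcall / lease recall): another client touched gfid. */
    void on_invalidate(uint64_t gfid_hash)
    {
        cache_slot_t *slot = &cache[gfid_hash % CACHE_SLOTS];
        if (slot->valid && slot->gfid_hash == gfid_hash)
            slot->valid = false;   /* next access must go back to the bricks */
    }

    /* Reads are answered locally only while the entry is still valid. */
    bool cache_read(uint64_t gfid_hash, char *buf, size_t len)
    {
        cache_slot_t *slot = &cache[gfid_hash % CACHE_SLOTS];
        if (!slot->valid || slot->gfid_hash != gfid_hash)
            return false;          /* miss: caller issues a network read */
        memcpy(buf, slot->data, len < sizeof(slot->data) ? len : sizeof(slot->data));
        return true;
    }

Doing this once in a single module is what avoids re-solving the coherency problem in every performance xlator.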

ShyamsundarR commented 6 years ago

unifying caches in a way that makes it possible for any xlator to use it would have a big advantage

This is key as I see it. The ability for any xlator to access the cache, or update it as the case may be, can help in a few cases. The current xlator model for caches does not allow such access into the cache; as we design a unified cache, we should address this need, and hence possibly make it a non-xlator(?).

Further, I can see this as a set of methods hanging off the inode for an object, thus providing the required access to use and update the cache for an inode as appropriate.

If such a model is evolved, then leveraging leases, or detecting that only a single client is accessing a file, can largely help across the xlators that need to identify the same.
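
For illustration, such a set of per-inode cache methods might look roughly like this (purely hypothetical names, not an existing gluster API):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/types.h>

    typedef struct inode inode_t;   /* opaque inode handle                   */
    typedef struct iatt  iatt_t;    /* cached stat-like attributes (opaque)  */

    /* Methods "hanging off" the inode, callable from any xlator in the stack. */
    typedef struct {
        /* data cache: look up / store file contents */
        bool (*get_data)(inode_t *inode, off_t off, size_t len, void *buf);
        void (*put_data)(inode_t *inode, off_t off, size_t len, const void *buf);

        /* metadata cache: look up / store attributes */
        bool (*get_attrs)(inode_t *inode, iatt_t *out);
        void (*put_attrs)(inode_t *inode, const iatt_t *attrs);

        /* state useful to many xlators: is this client the only one using it? */
        bool (*is_exclusive)(inode_t *inode);

        /* drop everything cached for this inode (lease recall, upcall, ...) */
        void (*invalidate)(inode_t *inode);
    } inode_cache_ops_t;

With something like this in place, any xlator could query cached data, or ask whether a lease guarantees exclusive access, without keeping a private copy of its own.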

rafikc30 commented 6 years ago

It will also be easier to provide more information about the size of the cache that gluster is using, and to provide a way to flush it.

jdarcy commented 6 years ago

On Thu, May 25, 2017, at 01:59 AM, Raghavendra G wrote:

I am not sure whether we gain on the correctness attribute over the existing model. The existing model treats the backend data as the source of truth and always keeps the cache updated with it.

However, unifying might help in performance. For example, if there is a read after a write, the current model reaches the backend for the correct data (unlike the kernel page cache, which writes to a page and serves reads from that same page). But the question is whether treating the backend as the source of truth is more beneficial than treating the unified cache as the source of truth, in a model where the unified cache is on the client side and there are multiple clients accessing the fs. Note that with multiple clients there are multiple unified caches and they might not be in sync. However, a similar argument can be made for the non-unified cache too (a read on client1 won't flush the write cached in write-behind of client2).

I think we're talking about two different issues here. The algorithmic issue of caching more aggressively on clients (which requires mechanisms to ensure consistency) is mostly separate from the implementation issue of each translator caching separately vs. a unified cache. We can make either change without the other. However, the unified-cache model makes it easier to implement aggressive caching correctly, because consistency stuff is hard. Having to deal with the same complex issues, including failures, over and over again would be a nightmare. Even if each implementation is correct in itself, the fact that they might be slightly different would result in many more combined states to consider for verification or debugging. We could implement it, but we could never trust it. By centralizing that logic into one module (used by many translators) we only need to solve those problems and verify/debug that solution one way, one time.

My vote would be to implement a unified caching framework first in a way that mostly preserves existing behavior, test the heck out of it, maybe let it "settle" for a release cycle, then start incrementally adding changes to cache more.

amarts commented 6 years ago

I agree with @jdarcy above.

Let's prioritize getting all the different implementations in one place, and then look into other improvements for the cache.

ShyamsundarR commented 6 years ago

I would like to hear more about how we would achieve "makes it possible for any xlator to use", which I also agree with, as posted in this comment: https://github.com/gluster/glusterfs/issues/218#issuecomment-332526909. Or is this not a criterion?

As an example, in RIO (issue #243) it would be beneficial to know that a single client is acting on a file (from an IO perspective), and thus not bother with a multitude of problems, decreasing the runtime algorithmic complexity and also improving runtime IO performance by reducing some network FOPs. This means the RIO client xlator (in this case) being able to query and understand that a file lease is held by this instance of the client.

poornimag commented 6 years ago

My understanding so far: retaining the existing xlators as is, but having a common location, one copy, where the cache resides. Another inode_ctx, probably? With granular locking for each type of cache. The advantages include:

  1. Cache consistency in the single-client use case
  2. An infra for common info on cache and leases that can be leveraged by other xlators
  3. Deduplication of cache
  4. Helps in easy management of the cache wrt global cache size, pruning, etc.

Different kinds of caches:

As mentioned, the consistency mechanisms (across multiple clients: leases, upcall) and aggressive caching algorithms are further enhancements that can be added irrespective of the cache unification.
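
A rough sketch of what such a shared per-inode context could look like (hypothetical layout, not actual gluster code), with one lock per kind of cached data:

    #include <pthread.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* a cached region of file data, io-cache / read-ahead style */
    struct cached_page {
        off_t               offset;
        size_t              length;
        char               *data;
        struct cached_page *next;
    };

    /* one shared cache context stored in the inode, usable by every xlator */
    struct inode_cache_ctx {
        /* metadata cache (what md-cache keeps today) */
        pthread_mutex_t     md_lock;
        struct stat         stbuf;

        /* data cache */
        pthread_mutex_t     data_lock;
        struct cached_page *pages;

        /* readdir cache for directories (readdir-ahead style) */
        pthread_mutex_t     dir_lock;
        void               *dirents;

        /* global accounting, so total cache size can be reported and pruned */
        size_t              bytes_used;
    };

Because there is exactly one copy per inode, the consistency, deduplication and global size-accounting points above come almost for free.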

xhernandez commented 6 years ago

One thing to consider is that different xlators might need different data in the caches. Some xlators modify data and/or metadata that passes through them. This means that if the cache is at an upper level and a lower level xlator needs to access cached data, it won't get what it desires if the cache is common to all xlators.

For example, suppose a volume with encryption enabled and using ec. The order of the xlators will be:

    io-cache
       |
     crypt
       |
      ec

In this case data cached by io-cache will be unencrypted. However, in some cases ec needs to read data to do some unaligned writes. It would be very interesting to have this data cached, but io-cache won't be able to provide it because ec needs encrypted data (as ec is working below crypt). Even if we place cache in a common place where all xlators could query it, ec wouldn't benefit in this case.

I'm sure there are other similar cases, for both data and metadata.

I think that each xlator should be able to add data to the cache, otherwise we are limiting the possible advantages of caching.

jdarcy commented 6 years ago

That's a good point, Xavier. Definitely the caching mechanisms should be implemented once as a library instead of many times. Should the cached data be shared between translators? Often yes, but sometimes - as in your example - no. One way to think of this is that there may exist many separate namespaces within the cache module (kind of like we do for locks). If two or more translators can share data, then they use the same namespace. If they need something more private, they use a private namespace. Make sense?
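
To illustrate, the namespace could simply become part of the cache key, so translators that agree on a namespace share entries while others keep a private view (hypothetical sketch, not real gluster code):

    #include <stdint.h>
    #include <string.h>

    /* example namespaces for the crypt/ec scenario discussed above */
    enum cache_ns {
        CACHE_NS_PLAINTEXT  = 0,   /* shared by xlators above crypt */
        CACHE_NS_CIPHERTEXT = 1,   /* shared by xlators below crypt */
    };

    struct cache_key {
        uint8_t  gfid[16];   /* which file      */
        uint64_t offset;     /* which block     */
        uint32_t ns;         /* which namespace */
    };

    /* Two translators hit the same entry only if they build identical keys. */
    static int cache_key_equal(const struct cache_key *a, const struct cache_key *b)
    {
        return a->ns == b->ns && a->offset == b->offset &&
               memcmp(a->gfid, b->gfid, sizeof(a->gfid)) == 0;
    }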

xhernandez commented 6 years ago

I think the namespace approach is not valid in this case because each xlator doesn't have enough knowledge to determine whether its cached data should be in the same namespace as another xlator's or not. What makes data shareable between xlators doesn't depend on the xlators themselves but on the intermediate ones (whether they modify data/metadata or not).

One alternative would be to have some kind of "generation" number attached to each data block or piece of metadata. If some xlator modifies the data, it should increase this number on requests and decrease it on answers. When data is added to or requested from the cache, this number will determine whether the data can be shared or not.
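
Sketching that idea (again hypothetical, not existing gluster code): transforming xlators adjust a counter on the way down and back up, and the counter becomes part of the cache key, so only xlators seeing the same representation of the data share entries.

    #include <stdint.h>

    /* carried along with each file operation as it traverses the xlator stack */
    struct fop_frame {
        uint32_t data_generation;
    };

    /* called by a transforming xlator (e.g. crypt) when passing a request down */
    static void generation_enter(struct fop_frame *frame)
    {
        frame->data_generation++;   /* below this point the data looks different */
    }

    /* called by the same xlator when unwinding the answer back up */
    static void generation_leave(struct fop_frame *frame)
    {
        frame->data_generation--;
    }

The cache lookup key would then be something like (gfid, offset, data_generation), so io-cache above crypt and ec below it would never mix their cached blocks.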

amarts commented 6 years ago

However, unifying might help in performance. For example, if there is a read after a write, the current model reaches the backend for the correct data (unlike the kernel page cache, which writes to a page and serves reads from that same page).

@raghavendrahg this is the very reason we want this feature. Especially when the network latency is high (think cloud), this would benefit users the most.

But the question is whether treating the backend as the source of truth is more beneficial than treating the unified cache as the source of truth, in a model where the unified cache is on the client side and there are multiple clients accessing the fs. Note that with multiple clients there are multiple unified caches and they might not be in sync.

We need metrics to see how often we are hitting this condition, i.e., the same file being accessed from more than one client, because my guess is that it happens only a small fraction of the time. We should handle the case of multiple clients accessing the file, but that can now be dealt with more easily using upcall.

Anyway, before writing any code, I guess we need clear metrics for each translator to tell whether it is doing what is expected of it. Instead of only looking at the 'application' performance over the mount point, we should also get insight into all the internal metrics that we use to make decisions in each translator.

xhernandez commented 6 years ago

Shyam and I have created an initial document with a high level approach to the caching solution and some other related things to start discussing its design. Feel free to provide comments in the document.

atinmu commented 6 years ago

Seems like this doc doesn't have access rights for everyone, which includes me.

On Tue, 25 Sep 2018 at 18:37, Xavi Hernandez notifications@github.com wrote:

Shyam and I have created an initial document https://docs.google.com/document/d/11dArAfe4KJscE7aH_ybefKtqiLmZY7txIjn-3ZGywj4/edit?usp=sharing with a high level approach to the caching solution and some other related things to start discussing its design. Feel free to provide comments in the document.

xhernandez commented 6 years ago

It should be working now. I edited the message to modify the link. Make sure you refresh the cache to get the new link.

xhernandez commented 4 years ago

Given that the proposed solution is very ambitious, I think we could start by implementing a simpler caching layer but with consistency guarantees so that we can actively use it for authoritative replies. As a first approach, we could create a new xlator that uses an inodelk to guarantee that cached data is not modified from elsewhere. We can use contention notifications to detect when another client tries to access the data. In this case the current owner can flush all pending data and release the lock.
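
A very rough sketch of that first approach, with hypothetical helpers standing in for the real inodelk and contention-notification machinery:

    #include <stdbool.h>

    struct cached_inode {
        bool have_lock;   /* do we currently hold the inodelk?        */
        bool dirty;       /* cached writes not yet flushed to bricks  */
    };

    /* stubs standing in for the real cluster-wide operations */
    static bool acquire_inodelk(struct cached_inode *ci) { (void)ci; return true; }
    static void release_inodelk(struct cached_inode *ci) { (void)ci; }
    static void flush_cached_writes(struct cached_inode *ci) { (void)ci; }

    /* before serving anything from the local cache, make sure we own the lock */
    static bool ensure_cacheable(struct cached_inode *ci)
    {
        if (!ci->have_lock)
            ci->have_lock = acquire_inodelk(ci);
        return ci->have_lock;
    }

    /* contention notification: another client wants to access the same inode */
    static void on_lock_contention(struct cached_inode *ci)
    {
        if (ci->dirty)
            flush_cached_writes(ci);   /* push pending data to the bricks    */
        release_inodelk(ci);           /* let the other client make progress */
        ci->have_lock = false;
        ci->dirty = false;
    }

As long as the lock is held, the cache can answer reads and buffer writes locally; as soon as another client shows up, pending data is flushed and authority returns to the bricks.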

amarts commented 4 years ago

@xhernandez I am in full support of this improvement. Starting with something small always gives results quicker, and lets us know whether we are on the right path.

stale[bot] commented 4 years ago

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

xhernandez commented 4 years ago

This is being worked on

xhernandez commented 3 years ago

The current implementation of this feature will provide consistent access to data by properly synchronizing multiple clients accessing the same data. However, to be able to fully serve requests entirely from the cache, we'll need to implement several additional features. The most relevant ones are:

  • Posix rights and ACL management
  • Posix locks management

Without these, the cache could reply to an open() request with success, but the request would actually fail once executed on the backend if, for example, permissions are not valid (note that this is the same thing that currently happens with open-behind).

These additional features to improve caching capabilities and performance will be incrementally added.
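
To make the first point concrete, here is a minimal sketch of the kind of local permission check the cache would need before answering an open() by itself (simplified to mode bits only; real code would also have to honour ACLs and supplementary groups):

    #include <stdbool.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    /* Can (uid, gid) open a file with these flags, judging only from cached
     * attributes? */
    static bool cached_open_allowed(const struct stat *st, uid_t uid, gid_t gid,
                                    int flags)
    {
        int    acc    = flags & O_ACCMODE;
        bool   need_r = (acc == O_RDONLY || acc == O_RDWR);
        bool   need_w = (acc == O_WRONLY || acc == O_RDWR);
        mode_t m      = st->st_mode;

        if (uid == 0)
            return true;                    /* root bypasses the mode bits */
        if (uid == st->st_uid)
            return (!need_r || (m & S_IRUSR)) && (!need_w || (m & S_IWUSR));
        if (gid == st->st_gid)
            return (!need_r || (m & S_IRGRP)) && (!need_w || (m & S_IWGRP));
        return (!need_r || (m & S_IROTH)) && (!need_w || (m & S_IWOTH));
    }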

JiandongLu commented 3 years ago

The current implementation of this feature will provide consistent access to data by properly synchronizing multiple clients accessing the same data. However, to be able to fully serve requests entirely from the cache, we'll need to implement several additional features. The most relevant ones are:

  • Posix rights and ACL management
  • Posix locks management

Without these, the cache could reply to an open() request with success, but the request would actually fail once executed on the backend if, for example, permissions are not valid (note that this is the same thing that currently happens with open-behind).

These additional features to improve caching capabilities and performance will be incrementally added.

I'm sorry, but I did not find the source code in https://github.com/gluster/glusterfs/archive/v9.0.zip or https://github.com/gluster/glusterfs/archive/v10dev.zip. Is the code open sourced?

xhernandez commented 3 years ago

@JiandongLu the feature couldn't be implemented for release 9. It will be present in the next release.

JiandongLu commented 3 years ago

@xhernandez the feature couldn't be implemented for release 9. It will be present in the next release.

Thanks.

There may be two possibilities:

  1. There is a cache in each glusterfs client node, so each client has its own cache.
  2. In the glusterfs cluster, there is only one global cache, and each client (directly or indirectly) accesses the global cache.

Which one is the case?

xhernandez commented 3 years ago

The cache will be present on each client. The second approach creates a SPOF and requires more work to keep it up to date and healthy. It also requires a network round-trip, while the local client cache doesn't need network at all when the data is present.
