open-behind should be disabled by default

xhernandez commented 2 years ago

Description of problem:

When open-behind is enabled, bricks don't really see the open request until another request that requires the fd is sent by the user. At that time, open-behind first opens the file and then executes the related request.

This is bad when the file is deleted before the actual open.

For example, an application doing an open() followed by an fstat() will see an ENOENT or ESTALE error in fstat() if someone else has deleted the file between open() and fstat(), but this should never happen from the application point of view because a successful open should guarantee that the file is not gone.

When bricks receive a delete operation, they check if there are open file descriptors. If so, the file is removed from the directory but not physically deleted from the brick, so that other operations depending on the fd can work as expected. However, since open-behind "hides" the open file descriptor from the bricks, they think the file is not used anymore and delete it upon receiving the unlink request.

Additionally, the useful use cases for open-behind are very limited, so I would say we should have it disabled by default and only enable it if explicitly requested by the user.

Another possibility is to completely remove the open-behind xlator. It doesn't provide much benefit in many real use cases, and completely breaks posix compliance.

The exact command to reproduce the issue:

Create a volume
Mount it in 2 mount points M1 and M2
Create a file in M1
Open the file from M1
Delete the file from M2
Wait 5 seconds (to avoid using cached data)
fstat the file on M1

The last fstat will fail with ENOENT or ESTALE.

xhernandez commented 2 years ago

@amarts @pranithk @mohit84 what do you think ?

I would say that we could just remove open-behind, but at least we need to disable it by default.

amarts commented 2 years ago

the open file descriptor usecase is very common. My take is, we should disable open-behind by default (Actually I thought its disabled by default already).

mohit84 commented 2 years ago

@amarts @pranithk @mohit84 what do you think ?

I would say that we could just remove open-behind, but at least we need to disable it by default.

I think before disabling it we can generate the perf number also then we can take a decision.

xhernandez commented 2 years ago

the open file descriptor usecase is very common.

What's this common use case ? anything that requires almost any other operation after open, won't avoid the open, so the latency we are saving in the opening is being lost in the next operation.

If I'm not wrong, it's only useful for small-file read-only workloads when quick-read is also enabled and the size of the file is, at most, the size of the quick-read buffer (64 KiB).

However quick-read is based on lookup requests, which are intrinsically unsafe for data access because they are executed without locks nor any protection against concurrent writes. So I'm not really sure if the workload where open-behind could be useful is actually safe to use.

My take is, we should disable open-behind by default (Actually I thought its disabled by default already).

I also thought it was disabled because the "open-behind" option in the xlator is set to "off": https://github.com/gluster/glusterfs/blob/b147e3a2825b02ab983c089476638817b070ca88/xlators/performance/open-behind/src/open-behind.c#L1037.

However the option in glusterd-volume-set.c is enabled:

https://github.com/gluster/glusterfs/blob/b147e3a2825b02ab983c089476638817b070ca88/xlators/mgmt/glusterd/src/glusterd-volume-set.c#L1742

xhernandez commented 2 years ago

@amarts @pranithk @mohit84 what do you think ? I would say that we could just remove open-behind, but at least we need to disable it by default.

I think before disabling it we can generate the perf number also then we can take a decision.

I would say that's not an option when the behavior is not correct. Even if performance regresses, keeping open-behind enabled is not posix compliant.

However it could be interesting to see the differences.

xhernandez commented 2 years ago

the open file descriptor usecase is very common. My take is, we should disable open-behind by default (Actually I thought its disabled by default already).

Now that I re-read it, I think I misinterpreted your comment. The common use case you refer to is to access open file descriptors after having deleted the original file, right ?

amarts commented 2 years ago

Now that I re-read it, I think I misinterpreted your comment. The common use case you refer to is to access open file descriptors after having deleted the original file, right ?

yes. It is used in tests, it is used by some applications etc.

mohit84 commented 2 years ago

@amarts @pranithk @mohit84 what do you think ? I would say that we could just remove open-behind, but at least we need to disable it by default.

I think before disabling it we can generate the perf number also then we can take a decision.

I would say that's not an option when the behavior is not correct. Even if performance regresses, keeping open-behind enabled is not posix compliant.

However it could be interesting to see the differences.

We are seeing regression for read fop

read                 29017                    19692                     -32

xhernandez commented 2 years ago

We are seeing regression for read fop
read                 29017                    19692                     -32   

This is with 64 KiB files, right ? can it be tested with bigger files ?

It's certainly a significant regression. However with this option enabled applications may not work correctly. I think correct operation is top priority, even above performance.

As an alternative we may try to optimize this case in another way without breaking functionality. Some ideas that come to my mind:

Modify quick-read to prefetch first 64 KiB in the background just after open() instead of doing it in lookup (this also has the benefit of guaranteeing that the data is consistent).
Maybe read-ahead could also help, also prefetching some data after open.
Identify if there's some unneeded latency (like the readdir operation during rmdir in DHT fixed recently).
Rework the handling of inode's lifecycle to not depend on open fd's and so not needing the open call (this is a huge change but it will also fix many other consistency issues).

These options may not remove all latency, but the gap shouldn't be so big.

mohit84 commented 2 years ago

We are seeing regression for read fop
read                 29017                    19692                     -32   
This is with 64 KiB files, right ? can it be tested with bigger files ?

It's certainly a significant regression. However with this option enabled applications may not work correctly. I think correct operation is top priority, even above performance.

As an alternative we may try to optimize this case in another way without breaking functionality. Some ideas that come to my mind:

Modify quick-read to prefetch first 64 KiB in the background just after open() instead of doing it in lookup (this also has the benefit of guaranteeing that the data is consistent).

Maybe read-ahead could also help, also prefetching some data after open.

Identify if there's some unneeded latency (like the readdir operation during rmdir in DHT fixed recently).

Rework the handling of inode's lifecycle to not depend on open fd's and so not needing the open call (this is a huge change but it will also fix many other consistency issues).

These options may not remove all latency, but the gap shouldn't be so big.

Yes the file size is 64KB.

Ratio2 commented 1 year ago

Open-behind useful for second read same file. Fuse/kernel cache file data, md-cache cache metadata and open-behind remove big latency to server for fopen (unnecessary for cached data).

xhernandez commented 1 year ago

@Ratio2 the problem here is not so much the performance but the correct behavior. The current implementation of cache invalidation is not strict enough to allow caching data/metadata for long periods of time, so we need to flush caches quite frequently to minimize the risk of returning stale data/metadata that could generate inconsistencies. Without this, open-behind is of little help here.

There's a proposal (#3812) that should help avoid the problem described in the initial comment. Additionally, once we have it, consistency could be made more strict, allowing caching things for longer periods of time without having to suffer about consistency. But right now having open-behind enabled for some workloads is dangerous, so we can't keep it enabled by default. Only if the users are sure that their workload is safe with open-behind, they should manually enable the option if they really want.

stale[bot] commented 1 year ago

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

gluster / glusterfs

open-behind should be disabled by default #3785