cloudius-systems / osv

OSv, a new operating system for the cloud.
osv.io
Other
4.05k stars 602 forks source link

VFS: I think lseek() shouldn't really be a VFS operation. #451

Open nyh opened 9 years ago

nyh commented 9 years ago

lseek() is a per-fd, not per-vnode operation. Yet, our sys_lseek() (inherited from Prex, I believe) calls a vnode-specific VOP_SEEK, and holds the vnode lock while doing it.

If we look at implementations of these VOP_SEEK we see that zfs_seek() does some trivial checking and ramfs_seek is a null op. I think we should get rid of this VOP_SEEK. (TODO: check what Linux's VFS does regarding seek).

Currently, we're seeing a slowdown in Cassandra because several threads are doing lseek()/read() concurrently. But I think the problem is the long read() blocking (see issue #450), not the vnode lock in sys_lseek() which should be fairly harmless because it is held for such short durations. I'm not sure whether, even with the change suggested here, we can avoid the (very short-term) vnode lock in sys_lseek(): We may need this lock if we find ourselves needing to check the validity of the file type for example. But in most cases, it is better to avoid the vnode layer at all. E.g.,., I think after someone opens "/dev/console", we don't need to remember this name - the fd can have all the file-operations of the console device - it doesn't need the vfs file-operations and doesn't need "vnode"

slivne commented 9 years ago

@raphaelsc did we update the seek operation ?

raphaelsc commented 9 years ago

On Mon, Sep 29, 2014 at 5:08 AM, slivne notifications@github.com wrote:

@raphaelsc https://github.com/raphaelsc did we update the seek operation ?

No, but it shouldn't be a performance issue anymore given that seek will proceed quickly when there is a concurrent read/write on the same node.

— Reply to this email directly or view it on GitHub https://github.com/cloudius-systems/osv/issues/451#issuecomment-57128770 .

Raphael S. Carvalho

slivne commented 9 years ago

@raphaelsc - ok, did we do any tests to prove that its performance is now good - so we can close this ?

raphaelsc commented 9 years ago

On Mon, Sep 29, 2014 at 9:29 AM, slivne notifications@github.com wrote:

@raphaelsc https://github.com/raphaelsc - ok, did we do any tests to prove that its performance is now good - so we can close this ?

No performance test to back up my claim, but seek will not block anymore waiting for either read/write to release the node lock. It would be good to run that workload Gleb ran (that reproduced the contention on seek; I don't remember which one exactly), and see if the contention on seek is indeed gone.

— Reply to this email directly or view it on GitHub https://github.com/cloudius-systems/osv/issues/451#issuecomment-57153259 .

Raphael S. Carvalho

gleb-cloudius commented 9 years ago

On Mon, Sep 29, 2014 at 05:44:09AM -0700, Raphael S.Carvalho wrote:

On Mon, Sep 29, 2014 at 9:29 AM, slivne notifications@github.com wrote:

@raphaelsc https://github.com/raphaelsc - ok, did we do any tests to prove that its performance is now good - so we can close this ?

No performance test to back up my claim, but seek will not block anymore waiting for either read/write to release the node lock. It would be good to run that workload Gleb ran (that reproduced the contention on seek; I don't remember which one exactly), and see if the contention on seek is indeed gone.

It was cassandra with compressed tables, but I am not sure how relevant all this now since the patch that removes locking created deadlock.

        Gleb.
raphaelsc commented 9 years ago

On Mon, Sep 29, 2014 at 9:51 AM, Gleb Natapov notifications@github.com wrote:

On Mon, Sep 29, 2014 at 05:44:09AM -0700, Raphael S.Carvalho wrote:

On Mon, Sep 29, 2014 at 9:29 AM, slivne notifications@github.com wrote:

@raphaelsc https://github.com/raphaelsc - ok, did we do any tests to prove that its performance is now good - so we can close this ?

No performance test to back up my claim, but seek will not block anymore waiting for either read/write to release the node lock. It would be good to run that workload Gleb ran (that reproduced the contention on seek; I don't remember which one exactly), and see if the contention on seek is indeed gone.

It was cassandra with compressed tables, but I am not sure how relevant all this now since the patch that removes locking created deadlock.

Gleb, it happens when taking the vnode lock to update the vnode size when the file is being extended, on zfs_write. I'm trying to think about a solution. How about adding a function to get the node size from the file system instead? It's not the best of the solutions, but it would fix the problem well.

Gleb.

— Reply to this email directly or view it on GitHub https://github.com/cloudius-systems/osv/issues/451#issuecomment-57155525 .

Raphael S. Carvalho

gleb-cloudius commented 9 years ago

On Mon, Sep 29, 2014 at 05:54:52AM -0700, Raphael S.Carvalho wrote:

On Mon, Sep 29, 2014 at 9:51 AM, Gleb Natapov notifications@github.com wrote:

On Mon, Sep 29, 2014 at 05:44:09AM -0700, Raphael S.Carvalho wrote:

On Mon, Sep 29, 2014 at 9:29 AM, slivne notifications@github.com wrote:

@raphaelsc https://github.com/raphaelsc - ok, did we do any tests to prove that its performance is now good - so we can close this ?

No performance test to back up my claim, but seek will not block anymore waiting for either read/write to release the node lock. It would be good to run that workload Gleb ran (that reproduced the contention on seek; I don't remember which one exactly), and see if the contention on seek is indeed gone.

It was cassandra with compressed tables, but I am not sure how relevant all this now since the patch that removes locking created deadlock.

Gleb, it happens when taking the vnode lock to update the vnode size when the file is being extended, on zfs_write. I'm trying to think about a solution. How about adding a function to get the node size from the file system instead? It's not the best of the solutions, but it would fix the problem well.

If this does not break anything else... You will have to implement this for all other FSes too.

        Gleb.