dCache / nfs4j

Pure Java NFSv3 and NFSv4.2 implementation

seqid mismatch during CLOSE #96

Closed sraillard closed 1 year ago

sraillard commented 4 years ago

When multiple OPEN and CLOSE operations are interleaved, it looks like the wrong seqid is used in this test: https://github.com/dCache/nfs4j/blob/b4af4c33051c3d150ed94cbb503730c977fee310/core/src/main/java/org/dcache/nfs/v4/OperationCLOSE.java#L58

Before this test, we added a log statement to check the different values:

    _log.info("[contextCurrentStateId = " + context.currentStateid() + " , _args.opclose.open_stateid = " + _args.opclose.open_stateid + " , nfsState.stateid() = " + nfsState.stateid() + " , stateid = " + stateid + "]");

The seqid returned by nfsState.stateid() seems to be the one from the last OPEN operation, so the test fails.

Here are the capture and the log:

2020-11-23 22:32:53.007 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (6)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007a, seq: 1] , nfsState.stateid() = [5fbc217e000000020000007a, seq: 1] , stateid = [5fbc217e000000020000007a, seq: 1]]
2020-11-23 22:32:53.010 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (3)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 2] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 3] , stateid = [5fbc217e000000020000007b, seq: 2]]
2020-11-23 22:32:53.011 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (10)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 5] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 7] , stateid = [5fbc217e000000020000007b, seq: 5]]
2020-11-23 22:32:53.012 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (12)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 7] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 7] , stateid = [5fbc217e000000020000007b, seq: 7]]
2020-11-23 22:32:53.012 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (14)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 7] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 8] , stateid = [5fbc217e000000020000007b, seq: 7]]
2020-11-23 22:32:53.013 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (22)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 8] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 8] , stateid = [5fbc217e000000020000007b, seq: 8]]
2020-11-23 22:32:53.013 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (13)] INFO  o.d.n.v.OperationCLOSE - [contextCurrentStateId = [000000000000000000000000, seq: 0] , _args.opclose.open_stateid = [5fbc217e000000020000007b, seq: 8] , nfsState.stateid() = [5fbc217e000000020000007b, seq: 8] , stateid = [5fbc217e000000020000007b, seq: 8]]
2020-11-23 22:32:53.013 [minikube.mshome.net/172.24.45.130:735] [nfs4j@2049 (13)] INFO  o.d.n.v.AbstractOperationExecutor - Lost client state: op: CLOSE : NFS4ERR_BAD_STATEID : State not known to the client: [5fbc217e000000020000007b, seq: 8]

bug-nfs-multiple-open-close.zip

In the log, you'll notice three entries where the seqid isn't the expected one. And in the capture, we can see that it's the seqid of the previous OPEN. The NFS client is a Linux VM (minikube).
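For context, RFC 5661 §8.2.2 (NFSv4.1 and later) spells out how a server is supposed to validate the seqid inside a client-presented stateid: an older-but-previously-issued seqid (e.g. the client raced with its own later OPEN) should yield NFS4ERR_OLD_STATEID so the client can retry, not NFS4ERR_BAD_STATEID. The following is a minimal standalone sketch of that rule; the class and method names are hypothetical, not nfs4j's actual API:

```java
// Hypothetical sketch of the RFC 5661 §8.2.2 seqid check; not nfs4j code.
public class StateidSeqidCheck {

    static final int NFS4_OK = 0;
    static final int NFS4ERR_OLD_STATEID = 10024; // per RFC 5661/7530 error tables
    static final int NFS4ERR_BAD_STATEID = 10025;

    /**
     * currentSeqid: the server's latest seqid for this open state
     *               (bumped by each OPEN / open upgrade / downgrade).
     * clientSeqid:  the seqid inside the stateid the client sent with CLOSE.
     */
    static int validateSeqid(int currentSeqid, int clientSeqid) {
        if (clientSeqid == 0) {
            // Special stateid: seqid 0 means "use the current seqid".
            return NFS4_OK;
        }
        if (clientSeqid == currentSeqid) {
            return NFS4_OK;
        }
        return clientSeqid < currentSeqid
                ? NFS4ERR_OLD_STATEID   // client raced with a newer OPEN; retryable
                : NFS4ERR_BAD_STATEID;  // seqid the server never issued
    }

    public static void main(String[] args) {
        // Mirrors the second log line above: server is at seq 3,
        // the client closes with seq 2.
        System.out.println(validateSeqid(3, 2)); // 10024 (OLD_STATEID)
    }
}
```

Under this reading, the interleaved-OPEN case in the log should produce a retryable NFS4ERR_OLD_STATEID rather than the fatal NFS4ERR_BAD_STATEID the client reports.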

kofemann commented 4 years ago

Thanks for the info. I will try to reproduce it.

kofemann commented 3 years ago

I'm not able to reproduce this. Do you run any special test, or do you see errors on the application side?

sraillard commented 3 years ago

As we have experienced a lot of issues, I was able to write a small piece of PHP code to reproduce it. It's quite straightforward: just do some reads or writes from different processes on the same file at the same time and you can reproduce the issue.

The code forks 4 processes, and each process writes a line to the same log file 4 times. So, that is a total of 16 write operations with concurrent access to the same file.

Here is the test code (can do a write or a read test):

<?php

    echo ("Start of PARENT\n");

    // Reading or writing test
    $WriteTest=TRUE;

    // Test file management
    $Filename="test-file.txt";
    if ($WriteTest) {
        // Write test, remove the file before testing
        if (file_exists($Filename)) unlink($Filename);
    } else {
        // Read test, create a test file
        file_put_contents($Filename,"Reading test");
    }

    // Loop for forking child processes
    $pid=-1;
    for ($i=0; $i<4; $i++) {
        $pid=pcntl_fork();
        if ($pid == -1) {
            die("ERROR fork #".($i+1)." !\n");
        } else if ($pid==0) {
            // pcntl_fork() returned 0: we are in the newly forked process,
            // which keeps looping here and forks the next worker
            echo("Still in PARENT\n");
        } else {
            // pcntl_fork() returned the new process's PID: jump to the worker
            // section (note that $pid holds the other process's PID)
            goto child;
        }
    }
    die("End of PARENT\n");

    // Child operations
child:
    echo("Start of CHILD #".($i+1)." PID=".$pid."\n");

    $ts1=microtime(TRUE);
    for ($j=0; $j<4; $j++) {
        $ts2=microtime(TRUE);
        if ($WriteTest) {
            // Concurrent write test
            file_put_contents($Filename,"LOG - ".date("H:i:s.v")." - ".intval(($ts2-$ts1)*1000)." ms - CHILD #".($i+1).".".($j+1)." PID=".$pid."\n",FILE_APPEND );
        } else {
            // Concurrent read test
            $contents=file_get_contents($Filename);
        }
        $ts1=$ts2;
    }

    die("End of CHILD #".($i+1)." PID=".$pid."\n");

And the results:

time php fork.php
Start of PARENT
Start of CHILD #1 PID=77
Still in PARENT
Start of CHILD #2 PID=78
Still in PARENT
Start of CHILD #3 PID=79
Still in PARENT
Still in PARENT
End of PARENT
Start of CHILD #4 PID=80
End of CHILD #4 PID=80
End of CHILD #3 PID=79
End of CHILD #1 PID=77
End of CHILD #2 PID=78

real    0m16.801s
user    0m0.008s
sys     0m0.007s

Of course, the total time of almost 17 seconds shows a blocking issue here!

And the content of the log file written:

LOG - 10:36:04.000 - 0 ms - CHILD #1.1 PID=77
LOG - 10:36:04.000 - 0 ms - CHILD #3.1 PID=79
LOG - 10:36:04.000 - 0 ms - CHILD #4.1 PID=80
LOG - 10:36:04.000 - 0 ms - CHILD #2.1 PID=78
LOG - 10:36:09.000 - 5590 ms - CHILD #4.2 PID=80
LOG - 10:36:09.000 - 5591 ms - CHILD #2.2 PID=78
LOG - 10:36:09.000 - 5441 ms - CHILD #3.2 PID=79
LOG - 10:36:09.000 - 5416 ms - CHILD #1.2 PID=77
LOG - 10:36:15.000 - 5551 ms - CHILD #4.3 PID=80
LOG - 10:36:15.000 - 5875 ms - CHILD #3.3 PID=79
LOG - 10:36:15.000 - 5637 ms - CHILD #2.3 PID=78
LOG - 10:36:15.000 - 323 ms - CHILD #4.4 PID=80
LOG - 10:36:15.000 - 149 ms - CHILD #3.4 PID=79
LOG - 10:36:15.000 - 6050 ms - CHILD #1.3 PID=77
LOG - 10:36:21.000 - 5312 ms - CHILD #1.4 PID=77
LOG - 10:36:20.000 - 5451 ms - CHILD #2.4 PID=78

You can see a blocking time of around 5 seconds on some operations.
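For readers without PHP, a roughly equivalent concurrent-writer test can be sketched in plain POSIX shell (this is my own illustration, not part of the original report): 4 background writers each append 4 lines to the same shared file, giving the same 16 concurrent write operations.

```shell
#!/bin/sh
# Spawn 4 background writers; each appends 4 lines to the same shared file,
# mimicking the 16 concurrent writes of the PHP reproducer above.
FILE=test-file.txt
rm -f "$FILE"
for i in 1 2 3 4; do
  (
    for j in 1 2 3 4; do
      echo "LOG - $(date +%H:%M:%S) - WRITER #$i.$j" >> "$FILE"
    done
  ) &
done
wait            # block until all writers have finished
wc -l "$FILE"   # 16 lines if every append succeeded
```

Run it under `time` on the NFS mount; on a local filesystem it finishes nearly instantly, so any multi-second wall-clock time points at the same blocking behavior.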

dkocher commented 1 year ago

Is this possibly related to #120?

kofemann commented 1 year ago

@dkocher Sounds like it. The reproducer on my side was:

fio --name 'read-test' --description 'Simple NFS read test' \
        --rw=read  --numjobs 8 --thread --blocksize 1M --filename 10GB