cyverse / gocommands

iRODS Command-line Tools written in Go
Other
29 stars 18 forks source link

Different replica status using `gocommands` vs `icommands` #11

Closed jnimoth closed 1 year ago

jnimoth commented 1 year ago

Short:

gocommands behaves different to icommands when uploading a file to iRODS. Onn the same iRODS system/hierarchy, where the files exists as two replicas after upload, the the two replica appear as a 'good' state (& for both replicas) when using icommands for the transfer, the same process using gocommands results in one replica marked in a 'good' state (&) while the second replica is marked in a 'stale' state (X).

Using icommands, it is possible to put both replicas by hand again in the 'good' state.

Possible resource for more information about the meaning of the replica statuses: https://irods.org/uploads/2021/King-iRODS-Logical_Locking-paper.pdf

More detailed explanation:

In this case, an iRODS environment with the following resource hierarchy is used:

$ ilsresc
bucket-rdms-dev:s3
cohesity:unixfilesystem
F200Resc:unixfilesystem
mnt_irods000_cache:unixfilesystem
mnt_irods001_cache:unixfilesystem
mnt_irods002_cache:unixfilesystem
mnt_irods100_cache:unixfilesystem
mnt_irods101_cache:unixfilesystem
mnt_irods102_cache:unixfilesystem
mnt_irods200_cache:unixfilesystem
mnt_irods201_cache:unixfilesystem
mnt_irods202_cache:unixfilesystem
rootResc:passthru
└── rootRandy:random
    ├── ptA:passthru
    │   └── replA:replication
    │       ├── randy01:random
    │       │   ├── pt001:passthru
    │       │   │   └── mnt_irodsd001:unixfilesystem
    │       │   └── pt011:passthru
    │       └── randy10:random
    │           ├── pt100:passthru
    │           │   └── mnt_irodsd100:unixfilesystem
    │           └── pt110:passthru
    ├── ptB:passthru
    │   └── replB:replication
    │       ├── randy00:random
    │       │   ├── pt000:passthru
    │       │   │   └── mnt_irodsd000:unixfilesystem
    │       │   └── pt010:passthru
    │       └── randy21:random
    │           ├── pt201:passthru
    │           │   └── mnt_irodsd201:unixfilesystem
    │           └── pt211:passthru
    └── ptC:passthru
        └── replC:replication
            ├── randy11:random
            │   ├── pt101:passthru
            │   │   └── mnt_irodsd101:unixfilesystem
            │   └── pt111:passthru
            └── randy20:random
                ├── pt200:passthru
                │   └── mnt_irodsd200:unixfilesystem
                └── pt210:passthru
tapeResc:passthru
└── compoundResc:compound
    ├── tapeCache:unixfilesystem
    └── tapeStorage:univmss

The rootResc was used as default iRODS resources in the following transfers, either with icommands or gocommands.

Using icommands

$ iput 100_MB_1 icommands_tests/
$ ils -l icommands_tests
/devrugZone/home/exampleuser/icommands_tests:
  exampleuser      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodsd000    104857600 2023-03-10.13:50 & 100_MB_1
  exampleuser      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201    104857600 2023-03-10.13:50 & 100_MB_1

As can be seen, both replicas are marked &, indicating 'good' replica status for both.

Using gocommands

Without the --no_replication flag:

$ v0.6.1/gocmd --config config_files/config_dev.yaml put --progress test_files/100_MB_1 gocommands_tests
ERRO[2023-03-10 13:52:28.453] SYS_NOT_ALLOWED                               function=Start package=commons struct=ParallelJobManager
ERRO[2023-03-10 13:52:28.453] SYS_NOT_ALLOWED                               function=processPutCommand package=main
SYS_NOT_ALLOWED

With the --no_replication flag:

$ v0.6.1/gocmd --config config_files/config_dev.yaml put --progress --no_replication test_files/100_MB_2 gocommands_tests

Checking the the status of the replicas after transfer shows that independet of the --no_progress flag being used or not, one replica is always marked as 'stale'

$ v0.6.1/gocmd --config config_files/config_dev.yaml ls -l gocommands_tests/
  exampleuser   0   rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101    104857600   2023-03-10.13:52    X   100_MB_1
  exampleuser   1   rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200    104857600   2023-03-10.13:52    &   100_MB_1
  exampleuser   0   rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodsd001    104857600   2023-03-10.13:54    X   100_MB_2
  exampleuser   1   rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodsd100    104857600   2023-03-10.13:54    &   100_MB_2

(Using ls -l from icommands on the same files displays the same replica status.)

It is possible to update the replica status using the irepl subcommmand from icommands

$ irepl -aU gocommands_tests/100_MB_1
$ irepl -aU gocommands_tests/100_MB_2

Checking again with gocommands (but also icommands) shows now all replicas in a 'good' status:

$ v0.6.1/gocmd --config config_files/config_dev.yaml ls -l gocommands_tests/
  exampleuser   0   rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101    104857600   2023-03-10.13:57    &   100_MB_1
  exampleuser   1   rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200    104857600   2023-03-10.13:52    &   100_MB_1
  exampleuser   0   rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodsd001    104857600   2023-03-10.13:57    &   100_MB_2
  exampleuser   1   rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodsd100    104857600   2023-03-10.13:54    &   100_MB_2

While it might be that this is related to our iRODS hierarchy, I think it is curious to observe that icommands and gocommands are so different in this regard. It would be great to hear if it is possible to fix this. Thanks!

iychoi commented 1 year ago

gocommands follows same file upload mechanism as python-irodsclient rather than icommands. There's big difference in here. gocommands and python-irodsclient use file upload over streaming I/O API over port 1247 on iCAT server. However, icommands uses bulk I/O API that transfers file over different ports on resource servers. This may cause the difference. I'll need to take a look little more into this.

I don't have the test setup that can replicate this issue. Can you provide me a test account to debug this bug?

jnimoth commented 1 year ago

I will see if I can get a test access to our iRODS environment for you, but I am not sure if it is easily possible. I will contact you as soon as I have a solution/answer for that.

Otherwise, I can say that I do not see this issue when using the python-irodsclient (PRC) in our iRODS setup, neither using the iBridges client that runs on the PRC nor when running a single put command with PRC by hand.

iychoi commented 1 year ago

Can you provide your python script that makes difference? Then I can compare how it works with gocommands.

jnimoth commented 1 year ago

Hi, here the example for a small python script that I used to test the upload of a file. It basically uses to session.data_objects.put transfer the file to its destination on the iRODS side:

import getpass
import ssl
import irods 
from irods.session import iRODSSession
import irods.keywords as kw

options = {kw.VERIFY_CHKSUM_KW: '', kw.REG_CHKSUM_KW: "regChksum"}

pwd = getpass.getpass(prompt= "Enter Password:")

ssl_context = ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None)

ssl_settings = {'client_server_negotiation': 'request_server_negotiation',
                'client_server_policy': 'CS_NEG_REQUIRE',
                'encryption_algorithm': 'AES-256-CBC',
                'encryption_key_size': 32,
                'encryption_num_hash_rounds': 16,
                'encryption_salt_size': 8,
                'ssl_context': ssl_context,
                'ssl_verify_server': 'cert'
}

session = iRODSSession(host='<SERVER_NAME>', authentication_scheme='PAM',
  port=1247, user='<USER_NAME>', password=pwd, zone='<ZONE_NAME>', **{"irods_default_hash_scheme": "MD5"},
   **ssl_settings) 

coll = session.collections.get('<COLLECTION_NAME>')

session.data_objects.put('example_file', coll.path+'/example_file, **options)

Moreover, I also tested today the behavior when I use your go-irodsclient directly instead of gocommands.

I used the thre example upload configurations that you provide on the github instance (upload.go, upload_parallel.go and upload_parallel_async.go) with the following account.yml file:

  hostname: "<SERVER>"
  port: 1247
user:
  username: "j.p.nimoth@rug.nl"
  password: "<PASSWORD>"
  zone: "devrugZone"
auth_scheme: "pam"
cs_negotiation: true
cs_negotiation_policy: "CS_NEG_REQUIRE"
ssl:
  ca_cert_file: "/etc/ssl/certs/ca-certificates.crt"
  key_size: 32
  algorithm: "AES-256-CBC"
  salt_size: 8
  hash_rounds: 16

These were my observations:

Using upload.go file, the file is successfully transferred and both replica appear with a 'good' status:

$ go run upload.go /home/jelte/Downloads/20221216134657.pdf  /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload.pdf
Successfully uploaded a file /home/jelte/Downloads/20221216134657.pdf to /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload.pdf, size = 132375
$ ils -l /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests
/devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests:
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodsd001       132375 2023-03-14.14:46 & upload.pdf
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodsd100       132375 2023-03-14.14:46 & upload.pdf

Using upload_parallel.go, does result in the behavior that was also observed with gocommands (one replica marked as 'good', the other marked as 'stale):

$ go run upload_parallel.go /home/jelte/Downloads/20221216134657.pdf  /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload_parallel.pdf
Successfully uploaded a file /home/jelte/Downloads/20221216134657.pdf to /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload_parallel.pdf, size = 132375
$ ils -l /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests
/devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests:
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodsd000       132375 2023-03-14.14:48 X upload_parallel.pdf
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201       132375 2023-03-14.14:48 & upload_parallel.pdf

Using upload_parallel_async.go successfully transfer the files and both replica are marked as good:

$ go run upload_parallel_async.go /home/jelte/Downloads/20221216134657.pdf  /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload_parallel_a.pdf
INFO[0001] done writing data at offset 0                 function=main package=main
Successfully uploaded a file /home/jelte/Downloads/20221216134657.pdf to /devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests/upload_parallel_a.pdf, size = 132375
$ ils -l
/devrugZone/home/j.p.nimoth@rug.nl/go-irods_tests:
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101       132375 2023-03-14.14:51 & upload_parallel_a.pdf
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200       132375 2023-03-14.14:51 & upload_parallel_a.pdf
iychoi commented 1 year ago

Thanks for the detailed tests. I'll take a look.

iychoi commented 1 year ago

Can you remind me which version of iRODS you are running?

Actually, the upload_parallel.go and upload_parallel_async.go use the same API for writing content. So they should work exactly in the same way. Can you repeat the two examples few more times to see if they show different result? It might be related to timing.

iychoi commented 1 year ago

I have another question. When you run the python script with python-irodsclient, did you get any errors, like SYS_NOT_ALLOWED? The python script runs replicate API internally. So, if it didn't throw any errors, it means the gocommands should also work without --no_replication flag. But gocommands returned the error as you saw, there might be a bug in the replication call.

jnimoth commented 1 year ago

Hi, here the answers to your question:

iRODS version: 4.2.11

Comparison upload_parallel.go and upload_parallel_async.go: I just test with multiple test files (50 files in total). In every case the files that were transferred with upload_parallel.go showed one good (&) and one stale replica (X). On the other hand, both replica are marked as good (&) if upload_parallel_async.go, is used.

SYS_NOT_ALLOWED with PRC: I do not get any error messages shown when trying to transfer test files with the PRC as shown above.

iychoi commented 1 year ago

Thanks for the info. I think I found the cause!

iychoi commented 1 year ago

Can you try the new release v0.6.4?

https://github.com/cyverse/gocommands/releases/tag/v0.6.4

jnimoth commented 1 year ago

I just tried with the put command and the new version v0.6.4. Sadly, I still see the issue. I tried using with/without the additional --no_replication flag.

See: Upload with --no_replication flag:

$v0.6.4/gocmd --config config_files/config_test.yaml put --progress --no_replication ./test_files/50_MB_1  gocommands_v064/50_MB_1
/home/jelte/git_projects/gocommands_binaries/test~ ... done! [52.43MB in 6.392s]

Upload without --no_replication flag:

$ v0.6.4/gocmd --config config_files/config_test.yaml put --progress  ./test_files/50_MB_2  gocommands_v064/50_MB_2
/home/jelte/git_projects/gocommands_binaries/test~ ... done! [52.43MB in 6.043s]
ERRO[2023-03-29 11:52:52.805] failed to upload /home/jelte/git_projects/gocommands_binaries/test_files/50_MB_2 to /testrug/home/j.p.nimoth@rug.nl/gocommands_v064/50_MB_2: failed to replicate data object: SYS_NOT_ALLOWED  function=Start package=commons struct=ParallelJobManager
Error: failed to perform parallel jobs:
    github.com/cyverse/gocommands/cmd/subcmd.processPutCommand
        /github/workspace/cmd/subcmd/put.go:197
  - failed to upload /home/jelte/git_projects/gocommands_binaries/test_files/50_MB_2 to /testrug/home/j.p.nimoth@rug.nl/gocommands_v064/50_MB_2:
    github.com/cyverse/gocommands/cmd/subcmd.putOne.func1
        /github/workspace/cmd/subcmd/put.go:241
  - failed to replicate data object:
    github.com/cyverse/go-irodsclient/irods/fs.ReplicateDataObject
        /go/pkg/mod/github.com/cyverse/go-irodsclient@v0.11.3/irods/fs/data_object.go:1268
  - SYS_NOT_ALLOWED
ERRO[2023-03-29 11:52:52.806] failed to perform parallel jobs:
    github.com/cyverse/gocommands/cmd/subcmd.processPutCommand
        /github/workspace/cmd/subcmd/put.go:197
  - failed to upload /home/jelte/git_projects/gocommands_binaries/test_files/50_MB_2 to /testrug/home/j.p.nimoth@rug.nl/gocommands_v064/50_MB_2:
    github.com/cyverse/gocommands/cmd/subcmd.putOne.func1
        /github/workspace/cmd/subcmd/put.go:241
  - failed to replicate data object:
    github.com/cyverse/go-irodsclient/irods/fs.ReplicateDataObject
        /go/pkg/mod/github.com/cyverse/go-irodsclient@v0.11.3/irods/fs/data_object.go:1268
  - SYS_NOT_ALLOWED  function=main package=main

Check replica status with ls -l still shows one replica marked as 'good' and the other as 'stale' (icommands display the same replica status as seen here):

$ v0.6.4/gocmd --config config_files/config_test.yaml ls -l gocommands_v064
  j.p.nimoth@rug.nl 0   rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001    52428800    2023-03-29.11:52    X   50_MB_1
  j.p.nimoth@rug.nl 1   rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100    52428800    2023-03-29.11:52    &   50_MB_1
  j.p.nimoth@rug.nl 0   rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001    52428800    2023-03-29.11:52    X   50_MB_2
  j.p.nimoth@rug.nl 1   rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100    52428800    2023-03-29.11:52    &   50_MB_2

I am using the current version, see:

$ v0.6.4/gocmd --config config_files/config_test.yaml --version
{
  "clientVersion": "v0.6.4",
  "gitCommit": "b9ec27b8656cdb1315c39628341fd66b7efb90cc",
  "buildDate": "2023-03-28T17:27:07Z",
  "goVersion": "go1.18.6",
  "compiler": "gc",
  "platform": "linux/amd64"
}
iychoi commented 1 year ago

That is weird. Can you redo the test with upload_parallel.go and upload_parallel_async.go? That will help me to understand what's going on.

jnimoth commented 1 year ago

Hey Illyoung, I did the tests again today with the current version of go-irodsclient (v0.11.3).

This time, upload_parallel.go and upload_parallel_async.go gave identical results. One replica is always marked as 'stale' (X), the other as 'good' (&).

I do not know why this was last time not the case when I reported.


The following tests were done one iRODS 4.2.11

The resoruces hierarchy of the iRODS instance looks like this (rootResc):

Cohesityresc00:unixfilesystem
F200resc00:unixfilesystem
F200resc10:unixfilesystem
F200resc20:unixfilesystem
rootResc:passthru
└── rootRandy:random
    ├── ptA:passthru
    │   └── replA:replication
    │       ├── randy01:random
    │       │   ├── pt001:passthru
    │       │   │   └── mnt_irodst001:unixfilesystem
    │       │   └── pt011:passthru
    │       └── randy10:random
    │           ├── pt100:passthru
    │           │   └── mnt_irodst100:unixfilesystem
    │           └── pt110:passthru
    ├── ptB:passthru
    │   └── replB:replication
    │       ├── randy00:random
    │       │   ├── pt000:passthru
    │       │   │   └── mnt_irodst000:unixfilesystem
    │       │   └── pt010:passthru
    │       └── randy21:random
    │           ├── pt201:passthru
    │           │   └── mnt_irodst201:unixfilesystem
    │           └── pt211:passthru
    └── ptC:passthru
        └── replC:replication
            ├── randy11:random
            │   ├── pt101:passthru
            │   │   └── mnt_irodst101:unixfilesystem
            │   └── pt111:passthru
            └── randy20:random
                ├── pt200:passthru
                │   └── mnt_irodst200:unixfilesystem
                └── pt210:passthru
tapeResc:passthru
└── compoundResc:compound
    ├── tapeCache:unixfilesystem
    └── tapeStorage:univmss
tapeRescDeferred:deferred
├── ptLargeFiles:passthru
│   └── compoundRescLocal:compound
│       ├── tapeCacheLocal:unixfilesystem
│       └── tapeStorageLocal:univmss
└── ptSmallFiles:passthru
    └── simpleResc:unixfilesystem
testResc:passthru
└── testVolume:unixfilesystem
testTape:univmss
testTapeResc:passthru
└── testCompoundResc:compound
    ├── testTapeCache:unixfilesystem
    └── testTapeStorage:univmss

And this is the config file that I used:

host:
  hostname: "[SERVER]"
  port: 1247
user:
  username: "j.p.nimoth@rug.nl"
  password: "[PASSWORD]"
  zone: "devrugZone"
auth_scheme: "pam"
cs_negotiation: true
cs_negotiation_policy: "CS_NEG_REQUIRE"
ssl:
  ca_cert_file: "/etc/ssl/certs/ca-certificates.crt"
  key_size: 32
  algorithm: "AES-256-CBC"
  salt_size: 8
  hash_rounds: 16

Using upload_parallel.go

$ ls | grep Testfile | xargs -I {} go run upload_parallel.go {} /testrug/home/j.p.nimoth@rug.nl/upload_parallel/{}       
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_1 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_1, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_10 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_10, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_2 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_2, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_3 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_3, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_4 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_4, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_5 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_5, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_6 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_6, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_7 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_7, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_8 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_8, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel/10MB_Testfile_9 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel/10MB_Testfile_9, size = 10000000

$ ils -l upload_parallel                                                                                                 
/testrug/home/j.p.nimoth@rug.nl/upload_parallel:
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:47 X 10MB_Testfile_1
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:47 & 10MB_Testfile_1
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:47 X 10MB_Testfile_10
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:47 & 10MB_Testfile_10
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:47 X 10MB_Testfile_2
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:47 & 10MB_Testfile_2
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:47 X 10MB_Testfile_3
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:47 & 10MB_Testfile_3
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:47 X 10MB_Testfile_4
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:47 & 10MB_Testfile_4
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:47 X 10MB_Testfile_5
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:47 & 10MB_Testfile_5
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01pt001;mnt_irodst001     10000000 2023-03-30.09:48 X 10MB_Testfile_6
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:48 & 10MB_Testfile_6
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:48 X 10MB_Testfile_7
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:48 & 10MB_Testfile_7
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:48 X 10MB_Testfile_8
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:48 & 10MB_Testfile_8
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:48 X 10MB_Testfile_9
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:48 & 10MB_Testfile_9

Using upload_parallel_async.go

$ ls | grep Testfile | xargs -I {} go run upload_parallel_async.go {} /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/{}
INFO[0006] done writing data at offset 0                 function=main package=main
INFO[0006] done writing data at offset 1048576           function=main package=main
INFO[0006] done writing data at offset 2097152           function=main package=main
INFO[0006] done writing data at offset 3145728           function=main package=main
INFO[0006] done writing data at offset 4194304           function=main package=main
INFO[0006] done writing data at offset 5242880           function=main package=main
INFO[0006] done writing data at offset 6291456           function=main package=main
INFO[0006] done writing data at offset 7340032           function=main package=main
INFO[0006] done writing data at offset 8388608           function=main package=main
INFO[0006] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_1 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_1, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0003] done writing data at offset 8388608           function=main package=main
INFO[0003] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_10 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_10, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0003] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_2 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_2, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0003] done writing data at offset 838860           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_3 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_3, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_4 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_4, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0004] done writing data at offset 1048576           function=main package=main
INFO[0004] done writing data at offset 2097152           function=main package=main
INFO[0004] done writing data at offset 3145728           function=main package=main
INFO[0004] done writing data at offset 4194304           function=main package=main
INFO[0004] done writing data at offset 5242880           function=main package=main
INFO[0004] done writing data at offset 6291456           function=main package=main
INFO[0004] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_5 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_5, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0004] done writing data at offset 6291456           function=main package=main
INFO[0004] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_6 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_6, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_7 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_7, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0003] done writing data at offset 5242880           function=main package=main
INFO[0003] done writing data at offset 6291456           function=main package=main
INFO[0003] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_8 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_8, size = 10000000
INFO[0003] done writing data at offset 0                 function=main package=main
INFO[0003] done writing data at offset 1048576           function=main package=main
INFO[0003] done writing data at offset 2097152           function=main package=main
INFO[0003] done writing data at offset 3145728           function=main package=main
INFO[0003] done writing data at offset 4194304           function=main package=main
INFO[0004] done writing data at offset 5242880           function=main package=main
INFO[0004] done writing data at offset 6291456           function=main package=main
INFO[0004] done writing data at offset 7340032           function=main package=main
INFO[0004] done writing data at offset 8388608           function=main package=main
INFO[0004] done writing data at offset 9437184           function=main package=main
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload_parallel_async/10MB_Testfile_9 to /testrug/home/j.p.nimoth@rug.nl/upload_parallel_async/10MB_Testfile_9, size = 10000000

$ ils -l upload_parallel_async                                                                                                
/testrug/home/j.p.nimoth@rug.nl/upload_parallel_async:
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:50 X 10MB_Testfile_1
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:50 & 10MB_Testfile_1
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:50 X 10MB_Testfile_10
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:50 & 10MB_Testfile_10
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:50 X 10MB_Testfile_2
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:50 & 10MB_Testfile_2
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:50 X 10MB_Testfile_3
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:50 & 10MB_Testfile_3
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:50 X 10MB_Testfile_4
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:50 & 10MB_Testfile_4
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:50 X 10MB_Testfile_5
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:50 & 10MB_Testfile_5
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:51 X 10MB_Testfile_6
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:51 & 10MB_Testfile_6
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:51 X 10MB_Testfile_7
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:51 & 10MB_Testfile_7
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:51 X 10MB_Testfile_8
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:51 & 10MB_Testfile_8
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:51 X 10MB_Testfile_9
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:51 & 10MB_Testfile_9

Using upload.go for comparison

$ ls | grep Testfile | xargs -I {} go run upload.go {} /testrug/home/j.p.nimoth@rug.nl/upload/{}       
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_1 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_1, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_10 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_10, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_2 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_2, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_3 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_3, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_4 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_4, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_5 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_5, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_6 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_6, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_7 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_7, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_8 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_8, size = 10000000
Successfully uploaded a file /home/jeltenimoth/git_programs/github/go-irodsclient/examples/upload/10MB_Testfile_9 to /testrug/home/j.p.nimoth@rug.nl/upload/10MB_Testfile_9, size = 10000000

$ ils -l upload                                                                                                            
/testrug/home/j.p.nimoth@rug.nl/upload:
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:52 & 10MB_Testfile_1
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:52 & 10MB_Testfile_1
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:52 & 10MB_Testfile_10
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:52 & 10MB_Testfile_10
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_2
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_2
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_3
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_3
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_4
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_4
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_5
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_5
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodst001     10000000 2023-03-30.09:53 & 10MB_Testfile_6
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodst100     10000000 2023-03-30.09:53 & 10MB_Tesfile_6
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_7
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_7
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodst000     10000000 2023-03-30.09:53 & 10MB_Testfile_8
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodst201     10000000 2023-03-30.09:53 & 10MB_Testfile_8
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodst200     10000000 2023-03-30.09:53 & 10MB_Testfile_9
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodst101     10000000 2023-03-30.09:53 & 10MB_Testfile_9
iychoi commented 1 year ago

To be honest, I don't know why this fails for now.

You said that the old version of go-irodsclient (v0.11.2)'s upload_parallel_async worked but its new version (v0.11.3) doesn't. Actually, there is no change in the code for the upload_parallel_async.

The UploadDataObjectParallelInBlockAsync function is the one upload_parallel_async uses. https://github.com/cyverse/go-irodsclient/commit/fdd94d44ae21a1f78dfc13a5783a2767fcbc91e1

As you see, there's no code change, but it shows different result. With this result, there are two possibilities.

  1. The result may change per run. This mean the issue may be in iRODS or iRODS Rules/Configuration.
  2. The experimental may be performed incorrectly. Maybe used wrong binary/version.

I cannot judge with this confusing clues. What I can do for now is adding an optional flag that disables parallel file upload, as sequential file upload still works well.

iychoi commented 1 year ago

I added a new commit that adds a new flag --single_threaded. The flag disables the use of parallel file upload API introduced in iRODS v4.2.11.

jnimoth commented 1 year ago

Yes, I also do not understand why I once saw that the upload_parallel_async behaves differently, but I also assume that it was likely an error on my side.

I tried now also put with the --single_threaded flag that you introduced in the last commit. In my trials, the issue is gone when the flag is used. Both replica appear with a 'good' status on the iRODS side.

While I find the issue still somewhat strange, I am already quite happy with that.

Thanks!

iychoi commented 1 year ago

Thanks for testing. I'll keep this issue opened until I find a concrete solution. I'll revisit this issue when our data store gets upgraded to v4.2.11 which supports parallel file upload.

jnimoth commented 1 year ago

Thanks!

One last question/remark: As far as I see, the new --single_threaded flag is currently not available in the last pre-built binary (v0.6.4).

It would be great if this could be included in the next version!

iychoi commented 1 year ago

Just released v0.6.5. Please check it out.

iychoi commented 1 year ago

Fixed in the commit 1c6320e https://github.com/cyverse/gocommands/commit/1c6320e0fd093bec883f560a371365ac9bc24807