cyverse / gocommands

iRODS Command-line Tools written in Go

Strange behavior after modifying a file locally and using the sync command #8

Closed jnimoth closed 1 year ago

jnimoth commented 1 year ago

Hey! I ran into some more odd behavior during my latest trials with gocommands:

After modifying a local file and syncing its parent directory to iRODS using the sync command, I can no longer re-download the directory or the file. Below is a step-by-step overview of what I did and what I observed.

These are the exact steps:

I have the following local source directory:

$ ls sync_source1
ca-bundle.crt  Document1.txt  Document2.txt  GroningenCity_Montage.jpg  Test.md

with a specific text file with some random content:

$ cat sync_source1/Document1.txt 
Test
Test
Blablablabla

More text
gagjlksdhgkjsdgkjsdgh

I first transferred this to the iRODS side using the put command. There were no errors, and all files are on the iRODS side, as the following ls command shows:

$ ./gocmd --config config_gocmd.yaml ls sync_source1
  Document1.txt
  Document2.txt
  GroningenCity_Montage.jpg
  Test.md
  ca-bundle.crt

For completeness, I can also re-download the directory from iRODS at this point using the get command without any issue:

$ ./gocmd --config config_gocmd.yaml get --progress -d sync_source1 sync_source1_downloaded
DEBU[2023-02-01 14:21:35.892] use sessionID - 14483                         function=ProcessCommonFlags package=commons
DEBU[2023-02-01 14:21:35.893] reading config file/dir - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
DEBU[2023-02-01 14:21:35.893] reading gocommands YAML config file - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
Test.md                                            ... done! [20B in 734ms]
ca-bundle.crt                                      ... done! [199.36KB in 694ms]
Document2.txt                                      ... done! [0B in 0s]
GroningenCity_Montage.jpg                          ... done! [553.87KB in 678ms]
Document1.txt                                      ... done! [24B in 710ms]

I can also display the content of a text file on the iRODS side via cat to confirm that it is the same as on the source:

$ ./gocmd --config config_gocmd.yaml cat -d sync_source1/Document1.txt
DEBU[2023-02-01 14:22:35.612] use sessionID - 14483                         function=ProcessCommonFlags package=commons
DEBU[2023-02-01 14:22:35.612] reading config file/dir - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
DEBU[2023-02-01 14:22:35.612] reading gocommands YAML config file - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
Test
Test
Blablablabla

More text
gagjlksdhgkjsdgkjsdgh

Now, modify the text file on the source with a text editor. I just added some extra lines.

$ vim sync_source1/Document1.txt
[add a bit of text, then save again]

Sync the local source directory with the iRODS destination again using the sync command:

$ ./gocmd --config config_gocmd.yaml sync --progress --no_replication -d sync_source1 i:sync_source1
DEBU[2023-02-01 14:23:58.374] use sessionID - 14483                         function=ProcessCommonFlags package=commons
DEBU[2023-02-01 14:23:58.375] reading config file/dir - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
DEBU[2023-02-01 14:23:58.375] reading gocommands YAML config file - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
Document2.txt                                      ... done! [0B in 0s]
ca-bundle.crt                                      ... done! [199.36KB in 8.965s]
Test.md                                            ... done! [20B in 13.328s]
Document1.txt                                      ... done! [64B in 17.841s]
GroningenCity_Montage.jpg                          ... done! [553.87KB in 22.065s]

Trying to display the content on the iRODS side again via cat no longer works:

$ ./gocmd --config config_gocmd.yaml cat -d sync_source1/Document1.txt
DEBU[2023-02-01 14:24:50.731] use sessionID - 14483                         function=ProcessCommonFlags package=commons
DEBU[2023-02-01 14:24:50.731] reading config file/dir - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
DEBU[2023-02-01 14:24:50.731] reading gocommands YAML config file - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
ERRO[2023-02-01 14:24:54.909] could not find a data object or a directory   function=processCatCommand package=main
could not find a data object or a directory

Checking via ls shows that the directory is still there

$ ./gocmd --config config_gocmd.yaml ls sync_source1
  Document1.txt
  Document2.txt
  GroningenCity_Montage.jpg
  Test.md
  ca-bundle.crt

Downloading via get no longer works either. For the collection in question, it now results in:

$ ./gocmd --config config_gocmd.yaml get --progress -d sync_source1 downloaded_again
DEBU[2023-02-01 14:39:30.314] use sessionID - 14483                         function=ProcessCommonFlags package=commons
DEBU[2023-02-01 14:39:30.314] reading config file/dir - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons
DEBU[2023-02-01 14:39:30.314] reading gocommands YAML config file - /home/username/config_gocmd.yaml  function=loadConfigFile package=commons

$ ls downloaded_again/sync_source1/

--> Shows empty directory, not the content that was on the source.

I can still download the modified file or its parent directory via icommands. No errors are displayed.

Also, I do see that the re-downloaded text file using icommands shows the modifications that I made to it as explained above.

$ iget -r ./sync_source1/ 
$ cat sync_source1/Document1.txt 
Test
Test
Blablablabla

More text
gagjlksdhgkjsdgkjsdgh

Even more text
blablablablablabla

I saw this behavior on both Linux and Windows, each time with the current version, 0.4.5.

iychoi commented 1 year ago

Thanks for the report. I'll dig into this soon.

iychoi commented 1 year ago

Are you connecting to the CyVerse Data Store or to another iRODS instance? If it's not the CyVerse Data Store, can you tell me which version of iRODS you are connecting to?

jnimoth commented 1 year ago

Hi Illyoung! I was busy the last few days, but could finally run some more trials. During these additional trials I realized that the issue does not only apply when something was changed locally and then synced; it seems to be more general:

Syncing a local directory which does not yet exist on iRODS:

$ ./gocmd --config .irods/config_gocmd_dev.yaml sync --progress --no_replication  sync_source i:sync_source
samples_lins.txt                                   ... done! [236B in 3.913s]
SSNMR_files_with_X.txt                             ... done! [45.32KB in 3.853s]

Showing its content with the cat command is possible:

$ ./gocmd --config .irods/config_gocmd_dev.yaml cat  sync_source/samples_lins.txt

More Text!!!
!

Using ils, I see that the files have two replicas, one marked with 'X' (stale).


$ ils -L ./sync_source/
/devrugZone/home/j.p.nimoth@rug.nl/sync_source:
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodsd000          236 2023-02-09.09:32 X samples_lins.txt
    sha2:wKjVPqmTykwUt/meUg6o10KBy7M9HeL+RXN9jVB6JYA=    generic    /mnt/irodsd000/home/j.p.nimoth@rug.nl/sync_source/samples_lins.txt
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201          236 2023-02-09.09:32 & samples_lins.txt
    sha2:wKjVPqmTykwUt/meUg6o10KBy7M9HeL+RXN9jVB6JYA=    generic    /mnt/irodsd201/home/j.p.nimoth@rug.nl/sync_source/samples_lins.txt
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodsd000        45323 2023-02-09.09:32 X SSNMR_files_with_X.txt
    sha2:GAvRAkq3rA9OPZ0aVO7T+G26jmb1eOJfXPv8CY2BqrM=    generic    /mnt/irodsd000/home/j.p.nimoth@rug.nl/sync_source/SSNMR_files_with_X.txt
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201        45323 2023-02-09.09:32 & SSNMR_files_with_X.txt
    sha2:GAvRAkq3rA9OPZ0aVO7T+G26jmb1eOJfXPv8CY2BqrM=    generic    /mnt/irodsd201/home/j.p.nimoth@rug.nl/sync_source/SSNMR_files_with_X.txt
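Picking the stale replicas out of a longer `ils -L` listing by eye gets tedious, so a small filter can help. This is only a sketch: the function name `stale_replicas` is hypothetical, and it assumes that every replica line has exactly seven whitespace-separated fields (owner, replica number, resource hierarchy, size, timestamp, status, name) and that object names contain no spaces.

```shell
#!/bin/sh
# List replicas that an `ils -L` listing flags as stale ("X").
# Assumption: replica lines have seven whitespace-separated fields;
# the collection header and the checksum/path lines have fewer.
stale_replicas() {
    awk 'NF == 7 && $6 == "X" { print $7 " replica " $2 " on " $3 }'
}

# Demo on three lines copied from the listing above.
stale_replicas <<'EOF'
  j.p.nimoth@r      0 rootResc;rootRandy;ptB;replB;randy00;pt000;mnt_irodsd000          236 2023-02-09.09:32 X samples_lins.txt
    sha2:wKjVPqmTykwUt/meUg6o10KBy7M9HeL+RXN9jVB6JYA=    generic    /mnt/irodsd000/home/j.p.nimoth@rug.nl/sync_source/samples_lins.txt
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201          236 2023-02-09.09:32 & samples_lins.txt
EOF
```

Against a live zone the same function would be fed from a pipe, for example `ils -L ./sync_source/ | stale_replicas`.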

In contrast, there are no files marked with 'X' when using gocommands to display:

$ ./gocmd --config .irods/config_gocmd_dev.yaml ls -l sync_source
  j.p.nimoth@rug.nl 0   rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101    45323   2023-02-09.12:08    &   SSNMR_files_with_X.txt
  j.p.nimoth@rug.nl 1   rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200    45323   2023-02-09.12:08    &   SSNMR_files_with_X.txt
  j.p.nimoth@rug.nl 0   rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101    236 2023-02-09.12:08    &   samples_lins.txt
  j.p.nimoth@rug.nl 1   rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200    236 2023-02-09.12:08    &   samples_lins.txt

I can then run the sync again without changing anything

$ ./gocmd --config .irods/config_gocmd_dev.yaml sync --progress --no_replication  sync_source i:sync_source
SSNMR_files_with_X.txt                             ... done! [45.32KB in 2.076s]
samples_lins.txt                                   ... done! [236B in 4.027s]

At this stage, cat gives an error

$ ./gocmd --config .irods/config_gocmd_dev.yaml cat  sync_source/samples_lins.txt
ERRO[2023-02-09 09:34:15.528] could not find a data object or a directory   function=processCatCommand package=main
could not find a data object or a directory

Using ils also shows just one replica, marked with 'X' (stale):

$ ils -L ./sync_source/
/devrugZone/home/j.p.nimoth@rug.nl/sync_source:
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201          236 2023-02-09.09:34 X samples_lins.txt
        generic    /mnt/irodsd201/home/j.p.nimoth@rug.nl/sync_source/samples_lins.txt
  j.p.nimoth@r      1 rootResc;rootRandy;ptB;replB;randy21;pt201;mnt_irodsd201        45323 2023-02-09.09:34 X SSNMR_files_with_X.txt
        generic    /mnt/irodsd201/home/j.p.nimoth@rug.nl/sync_source/SSNMR_files_with_X.txt

On the other hand, the ls -l command of gocommands indicates no stale replicas, but it also shows just one replica:

$ ils -L ./sync_source/
/devrugZone/home/j.p.nimoth@rug.nl/sync_source:
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200          236 2023-02-09.12:09 X samples_lins.txt
        generic    /mnt/irodsd200/home/j.p.nimoth@rug.nl/sync_source/samples_lins.txt
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200        45323 2023-02-09.12:09 X SSNMR_files_with_X.txt
        generic    /mnt/irodsd200/home/j.p.nimoth@rug.nl/sync_source/SSNMR_files_with_X.txt

I also looked into the put command again and saw that when I transfer files to our iRODS instance with gocommands, there are two replicas, one of which is marked stale. Doing the same with iput from the icommands package does not result in a replica marked as stale.

Using icommands:

$ iput irodsfs put_transfers
$ ils -L put_transfers
/devrugZone/home/j.p.nimoth@rug.nl/put_transfers:
  j.p.nimoth@r      0 rootResc;rootRandy;ptA;replA;randy01;pt001;mnt_irodsd001     20038175 2023-02-09.09:45 & irodsfs
    sha2:BSsAA+vbjItBNZDgAVSJ7S/FkyD4aVOGrKO1bSW1qwg=    generic    /mnt/irodsd001/home/j.p.nimoth@rug.nl/put_transfers/irodsfs
  j.p.nimoth@r      1 rootResc;rootRandy;ptA;replA;randy10;pt100;mnt_irodsd100     20038175 2023-02-09.09:45 & irodsfs
    sha2:BSsAA+vbjItBNZDgAVSJ7S/FkyD4aVOGrKO1bSW1qwg=    generic    /mnt/irodsd100/home/j.p.nimoth@rug.nl/put_transfers/irodsfs

The same using gocommands:

$ ./gocmd --config .irods/config_gocmd_dev.yaml put --progress --no_replication irodsfs put_transfers/ 
irodsfs                                            ... done! [20.04MB in 8.554s]
$ ils -L put_transfers
/devrugZone/home/j.p.nimoth@rug.nl/put_transfers:
  j.p.nimoth@r      0 rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101     20038175 2023-02-09.09:46 X irodsfs
    sha2:BSsAA+vbjItBNZDgAVSJ7S/FkyD4aVOGrKO1bSW1qwg=    generic    /mnt/irodsd101/home/j.p.nimoth@rug.nl/put_transfers/irodsfs
  j.p.nimoth@r      1 rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200     20038175 2023-02-09.09:46 & irodsfs
    sha2:BSsAA+vbjItBNZDgAVSJ7S/FkyD4aVOGrKO1bSW1qwg=    generic    /mnt/irodsd200/home/j.p.nimoth@rug.nl/put_transfers/irodsfs

But when I display with gocommands using ls -l, both replicas show as '&':

$ ./gocmd --config .irods/config_gocmd_dev.yaml ls -l put_transfers
  j.p.nimoth@rug.nl 0   rootResc;rootRandy;ptC;replC;randy11;pt101;mnt_irodsd101    20038175    2023-02-09.09:46    &   irodsfs
  j.p.nimoth@rug.nl 1   rootResc;rootRandy;ptC;replC;randy20;pt200;mnt_irodsd200    20038175    2023-02-09.09:46    &   irodsfs
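The disagreement between the two listings can also be checked mechanically. The sketch below assumes both outputs were saved to files first and that both tools print the same seven whitespace-separated fields per replica, as they do in the listings above. The function name `flag_mismatches` is hypothetical. Replicas are matched on object name and replica number only, because `ils` truncates the owner column.

```shell
#!/bin/sh
# Print replicas whose status flag differs between two saved listings.
# $1: output of `ils -L <collection>`
# $2: output of `gocmd ls -l <collection>`
# Assumption: replica lines have seven whitespace-separated fields and
# object names contain no spaces.
flag_mismatches() {
    awk '
        NF != 7 { next }    # skip headers and checksum/path lines
        FILENAME == ARGV[1] { ref[$7 " " $2] = $6; next }
        ($7 " " $2) in ref && ref[$7 " " $2] != $6 {
            print $7 " replica " $2 ": ils=" ref[$7 " " $2] " gocmd=" $6
        }
    ' "$1" "$2"
}
```

With the two `put_transfers` listings above saved as, say, `ils.txt` and `gocmd.txt`, calling `flag_mismatches ils.txt gocmd.txt` should report replica 0 of `irodsfs`, which `ils` shows as `X` and gocommands shows as `&`.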

In this case, I did the trials on our dev environment which runs iRODS 4.2.11. On this instance we have a hierarchy like:

$ ilsresc
bucket-rdms-dev:s3
cohesity:unixfilesystem
F200Resc:unixfilesystem
mnt_irods000_cache:unixfilesystem
mnt_irods001_cache:unixfilesystem
mnt_irods002_cache:unixfilesystem
mnt_irods100_cache:unixfilesystem
mnt_irods101_cache:unixfilesystem
mnt_irods102_cache:unixfilesystem
mnt_irods200_cache:unixfilesystem
mnt_irods201_cache:unixfilesystem
mnt_irods202_cache:unixfilesystem
rootResc:passthru
└── rootRandy:random
    ├── ptA:passthru
    │   └── replA:replication
    │       ├── randy01:random
    │       │   ├── pt001:passthru
    │       │   │   └── mnt_irodsd001:unixfilesystem
    │       │   └── pt011:passthru
    │       └── randy10:random
    │           ├── pt100:passthru
    │           │   └── mnt_irodsd100:unixfilesystem
    │           └── pt110:passthru
    ├── ptB:passthru
    │   └── replB:replication
    │       ├── randy00:random
    │       │   ├── pt000:passthru
    │       │   │   └── mnt_irodsd000:unixfilesystem
    │       │   └── pt010:passthru
    │       └── randy21:random
    │           ├── pt201:passthru
    │           │   └── mnt_irodsd201:unixfilesystem
    │           └── pt211:passthru
    └── ptC:passthru
        └── replC:replication
            ├── randy11:random
            │   ├── pt101:passthru
            │   │   └── mnt_irodsd101:unixfilesystem
            │   └── pt111:passthru
            └── randy20:random
                ├── pt200:passthru
                │   └── mnt_irodsd200:unixfilesystem
                └── pt210:passthru
tapeResc:passthru
└── compoundResc:compound
    ├── tapeCache:unixfilesystem
    └── tapeStorage:univmss

Moreover, I tried to select the cohesity resource (see resource overview) without iRODS replication. On this resource the following could be observed:

When uploading data using the put command that comes with gocommands like:

$ ./gocmd --config dev_cohesity.yaml put --progress --no_replication gocmd 

The file is uploaded without error messages, but the ils command from the original icommands again shows this file as stale (marked 'X'), while the ls command of gocommands does not.

$ ./gocmd --config dev_cohesity.yaml ls -l
  j.p.nimoth@rug.nl 0   cohesity    11323630    2023-02-14.16:11    &   gocmd

$ ils -L
/devrugZone/home/j.p.nimoth@rug.nl:
  j.p.nimoth@r      0 cohesity     11323630 2023-02-14.16:11 X gocmd
    sha2:peWCWu2jBOjv7h7y7FiuH/dtBxS9BlqQ0BJHAPeq1mI=    generic    /var/lib/irods/mnt/cohesity/irods/home/j.p.nimoth@rug.nl/gocmd
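To check whether a replica flagged as stale still holds the uploaded bytes, the `sha2:` value that `ils -L` prints can be compared with a digest of the local file. iRODS `sha2:` checksums are base64-encoded SHA-256 digests. The sketch below assumes GNU coreutils (`sha256sum`, `base64`); the function name `irods_sha2` is hypothetical.

```shell
#!/bin/sh
# Print "sha2:" followed by the base64-encoded SHA-256 digest of a local
# file, in the same form that `ils -L` shows.
# Assumption: GNU coreutils provide sha256sum and base64.
irods_sha2() {
    hex=$(sha256sum < "$1" | cut -d' ' -f1)
    b64=$(
        # Convert the hex digest to raw bytes, two hex digits at a time.
        while [ -n "$hex" ]; do
            byte=${hex%"${hex#??}"}
            hex=${hex#??}
            printf "\\$(printf '%03o' "0x$byte")"
        done | base64
    )
    printf 'sha2:%s\n' "$b64"
}
```

Running `irods_sha2 gocmd` on the local binary and comparing the result with the `sha2:` line in the listing above would show whether the stored data matches the source, independently of the replica status flag.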

On the cohesity resource, the cat command of gocommands does not seem to work:

$ ./gocmd --config dev_cohesity.yaml put --no_replication sync_again/example_text.txt

$ ./gocmd --config dev_cohesity.yaml ls -l | grep  example_text
  j.p.nimoth@rug.nl 0   cohesity    23  2023-02-14.16:20    &   example_text.txt

$ ./gocmd --config dev_cohesity.yaml cat example_text
ERRO[2023-02-14 16:21:20.375] could not find a data object or a directory   function=processCatCommand package=main
could not find a data object or a directory

iychoi commented 1 year ago

Ah, I just realized that the & mark displays the replica status. I had hard-coded it to always show &, which is what you saw.

The issue is fixed in commit 6f9eed7. The fix will be included in the new release, which is coming soon.
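For illustration only, deriving the flag from the replica status instead of hard-coding it could look like the sketch below. This is not the change made in gocommands. The function name `replica_flag` is hypothetical, and the mapping assumes the iRODS convention that status 0 means stale and 1 means good; every other value is shown as `?` here.

```shell
#!/bin/sh
# Map a numeric replica status to the one-character flag used in long
# listings. Assumption: 0 = stale, 1 = good; anything else prints "?".
replica_flag() {
    case "$1" in
        0) printf 'X\n' ;;
        1) printf '&\n' ;;
        *) printf '?\n' ;;
    esac
}
```

A listing that always prints `&` corresponds to ignoring the argument entirely, which hides stale replicas such as the ones `ils -L` reported above.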