EUDAT-B2SAFE / B2SAFE-core

B2SAFE service core code for EUDAT project

EUDAT/REPLICA accumulates same PID several times #102

chStaiger closed this issue 6 years ago

chStaiger commented 7 years ago

I am testing B2SAFE again and found an interesting feature that increases redundancy.

I created a folder “aB2SAFE” on aliceZone which gets replicated to “bB2SAFE” on bobZone. I successively put data into the folder and replicate it with:

replicate{
     writeLine("stdout","DEBUG replication to
            /bobZone/home/$userNameClient#$rodsZoneClient/b2replication");
    EUDATReplication("/$rodsZoneClient/home/$userNameClient/aB2SAFE",
            "/bobZone/home/$userNameClient#$rodsZoneClient/bB2SAFE",
            "true", "true", *response);
}

input null
output ruleExecOut

After replicating the same data several times, the PID of the folder bB2SAFE accumulates repeatedly in the EUDAT/REPLICA field:

First replication:

[admincentos@alice-centos ~]$ iput -K testA.txt aB2SAFE/testA.txt
[admincentos@alice-centos ~]$ irule -F testRepl.r
[admincentos@alice-centos ~]$ imeta ls -C aB2SAFE
AVUs defined for collection aB2SAFE:
attribute: EUDAT/REPLICA
value: 21.T12996/9b5fd49a-9f99-11e7-8f5c-04060a6400be
units:
----
attribute: PID
value: 21.T12995/9a6d89c4-9f99-11e7-91bf-04060a6400b9
units:
----
attribute: EUDAT/FIXED_CONTENT
value: False
units:

Second replication:

[admincentos@alice-centos ~]$ iput -K testA.txt aB2SAFE/testB.txt
[admincentos@alice-centos ~]$ irule -F testRepl.r
[admincentos@alice-centos ~]$ imeta ls -C aB2SAFE
AVUs defined for collection aB2SAFE:
attribute: EUDAT/REPLICA
value: 21.T12996/9b5fd49a-9f99-11e7-8f5c-04060a6400be,
21.T12996/9B5FD49A-9F99-11E7-8F5C-04060A6400BE
units:
----
attribute: PID
value: 21.T12995/9a6d89c4-9f99-11e7-91bf-04060a6400b9
units:
----
attribute: EUDAT/FIXED_CONTENT
value: False
units:

Third replication:

[admincentos@alice-centos ~]$ iput -K testA.txt aB2SAFE/testC.txt
[admincentos@alice-centos ~]$ irule -F testRepl.r
[admincentos@alice-centos ~]$ imeta ls -C aB2SAFE
AVUs defined for collection aB2SAFE:
attribute: EUDAT/REPLICA
value: 21.T12996/9b5fd49a-9f99-11e7-8f5c-04060a6400be,
21.T12996/9B5FD49A-9F99-11E7-8F5C-04060A6400BE,
21.T12996/9B5FD49A-9F99-11E7-8F5C-04060A6400BE
units:
----
attribute: PID
value: 21.T12995/9a6d89c4-9f99-11e7-91bf-04060a6400b9
units:
----
attribute: EUDAT/FIXED_CONTENT
value: False
units:

and so forth ...

cookie33 commented 6 years ago

Tested on CentOS 6.8:

$ iput testA.r aB2SAFE/testA.txt

Replicate and register:

$ irule -F testA.r 
DEBUG replication to
            /alice/home/rods#bob/b2replication

Check the metadata:

$ imeta ls -c  /bob/home/rods/aB2SAFE
AVUs defined for collection /bob/home/rods/aB2SAFE:
attribute: PID
value: 21.T12996/3c832aac-c489-11e7-a7b0-5254000df0ed
units: 
----
attribute: EUDAT/REPLICA
value: 842/3eb47ee8-c489-11e7-a2ca-525400cdee34
units: 
----
attribute: EUDAT/FIXED_CONTENT
value: False
units: 

Replicate again:

$ irule -F testA.r 
DEBUG replication to
            /alice/home/rods#bob/b2replication

Check the original again:

$ imeta ls -c  /bob/home/rods/aB2SAFE
AVUs defined for collection /bob/home/rods/aB2SAFE:
attribute: PID
value: 21.T12996/3c832aac-c489-11e7-a7b0-5254000df0ed
units: 
----
attribute: EUDAT/REPLICA
value: 842/3eb47ee8-c489-11e7-a2ca-525400cdee34,842/3eb47ee8-c489-11e7-a2ca-525400cdee34
units: 
----
attribute: EUDAT/FIXED_CONTENT
value: False
units: 

The collection on the original side indeed lists the same PID twice in EUDAT/REPLICA.

The original file also lists the same PID twice in EUDAT/REPLICA:

$ imeta ls -d  /bob/home/rods/aB2SAFE/testA.txt
AVUs defined for dataObj /bob/home/rods/aB2SAFE/testA.txt:
attribute: eudat_dpm_checksum_date:demoResc
value: 01510147525
units: 
----
attribute: EUDAT/FIXED_CONTENT
value: False
units: 
----
attribute: PID
value: 21.T12996/3f4c47dc-c489-11e7-9b90-5254000df0ed
units: 
----
attribute: EUDAT/REPLICA
value: 842/3fcde058-c489-11e7-85d2-525400cdee34,842/3fcde058-c489-11e7-85d2-525400cdee34
units:

So the problem has been reproduced.

The handle record of the original object also contains the same value twice in EUDAT/REPLICA:

$ curl -s "http://hdl.handle.net/api/handles/21.T12996/3f4c47dc-c489-11e7-9b90-5254000df0ed?pretty" | grep -A3 EUDAT/REPLICA
      "type": "EUDAT/REPLICA",
      "data": {
        "format": "string",
        "value": "842/3fcde058-c489-11e7-85d2-525400cdee34,842/3fcde058-c489-11e7-85d2-525400cdee34"
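
The duplication can also be spotted programmatically from the handle record. The sketch below parses a record shaped like the Handle.net REST API output excerpted above (a "values" list whose entries carry "type" and "data"; only the fields visible in the curl output are reproduced, the rest is an assumption) and reports EUDAT/REPLICA entries that occur more than once. Comparing case-insensitively is an assumption, motivated by the first report, where the same PID appears in both lower and upper case. The helper names are hypothetical.

```python
import json
from collections import Counter

# Assumed record shape, reduced to the fields visible in the curl output above.
record_json = """
{
  "handle": "21.T12996/3f4c47dc-c489-11e7-9b90-5254000df0ed",
  "values": [
    {
      "type": "EUDAT/REPLICA",
      "data": {
        "format": "string",
        "value": "842/3fcde058-c489-11e7-85d2-525400cdee34,842/3fcde058-c489-11e7-85d2-525400cdee34"
      }
    }
  ]
}
"""


def replica_entries(record: dict) -> list:
    """Hypothetical helper: split the EUDAT/REPLICA value into its entries."""
    for value in record.get("values", []):
        if value.get("type") == "EUDAT/REPLICA":
            raw = value["data"]["value"]
            return [pid.strip() for pid in raw.split(",") if pid.strip()]
    return []


def duplicate_entries(entries: list) -> list:
    """Hypothetical helper: lower-cased entries that occur more than once."""
    counts = Counter(entry.lower() for entry in entries)
    return [pid for pid, n in counts.items() if n > 1]


record = json.loads(record_json)
entries = replica_entries(record)
print(duplicate_entries(entries))
```

For the record above this prints a one-element list containing 842/3fcde058-c489-11e7-85d2-525400cdee34.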

The problem is that when a directory or file is replicated twice to the same destination, the rule appends the child PID to EUDAT/REPLICA again without checking whether that PID is already listed. If it is, it should not be added again.
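
As a minimal illustration of the append-only behaviour described above, the sketch below mimics the parent-update logic in Python (the function name is hypothetical; the real rule is EUDATUpdatePIDWithNewChild in the iRODS rule language) and applies it three times with the same child PID, as happens when the same collection is replicated repeatedly.

```python
def update_parent_naive(replica: str, child_pid: str) -> str:
    """Hypothetical Python mimic of the original rule: always append."""
    if replica in ("", "None"):
        return child_pid
    return replica + "," + child_pid


value = ""
child = "842/3eb47ee8-c489-11e7-a2ca-525400cdee34"
for _ in range(3):  # replicate the same collection three times
    value = update_parent_naive(value, child)
print(value)  # the same PID, three times, comma-separated
```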

cookie33 commented 6 years ago

We have to update this function:

EUDATUpdatePIDWithNewChild(*parentPID, *childPID) {
    *replicaNew = "None"
    logInfo("[EUDATUpdatePIDWithNewChild] update parent pid (*parentPID) with new child (*childPID)");
    getEpicApiParameters(*credStoreType, *credStorePath, *epicApi, *serverID, *epicDebug);
    *replica = EUDATGeteValPid(*parentPID, "EUDAT/REPLICA");
    if ((*replica == "") || (*replica == "None")) {
        *replicaNew = *childPID;
    }
    else {
        *replicaNew = *replica ++ "," ++ *childPID;
    }
...
    if (*response != "True") { *replicaNew = "None" }
    *replicaNew
}

If the retrieved *replica already contains *childPID, we can return *replica unchanged as *replicaNew.

Then the same EUDAT/PARENT value as before is written to the iCAT, and the problem should be fixed.

The following should be amended:

    else {
        *replicaNew = *replica ++ "," ++ *childPID;
    }

Here an extra check is needed: if *replica already contains *childPID, return *replicaNew = *replica; otherwise set *replicaNew = *replica ++ "," ++ *childPID;
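
The proposed check can be sketched in Python as follows. This is a hypothetical mimic, not the B2SAFE code: it splits the comma-separated list and compares whole entries, and it compares case-insensitively, which is an assumption based on the same PID showing up in two letter cases in the first report.

```python
def update_parent_dedup(replica: str, child_pid: str) -> str:
    """Hypothetical mimic: append child_pid only if it is not yet listed."""
    if replica in ("", "None"):
        return child_pid
    # Compare whole entries, ignoring case (assumption, see lead-in).
    known = {entry.strip().lower() for entry in replica.split(",")}
    if child_pid.strip().lower() in known:
        return replica  # already recorded: keep the value unchanged
    return replica + "," + child_pid


value = ""
child = "842/3eb47ee8-c489-11e7-a2ca-525400cdee34"
for _ in range(3):  # repeated replication of the same collection
    value = update_parent_dedup(value, child)
print(value)  # the PID appears only once
```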

cookie33 commented 6 years ago

Tested how the like operator behaves:

likie{
    *long_string = "this is a long string";
    *short_string = "string";
    writeLine("stdout", *long_string like "*"++*short_string);
    # Output: true
    writeLine("stdout", *long_string like *short_string++"*");
    # Output: false
    writeLine("stdout", *long_string like "i*"++*short_string++"*");
    # Output: false
    writeLine("stdout", *long_string like "*"++*short_string++"i*");
    # Output: false
    writeLine("stdout", *long_string like "*"++*short_string++"*");
    # Output: true

    if (*long_string like "*"++*short_string++"*") {
        writeLine("stdout", "OK. Done");
    }
}

input null
output ruleExecOut

Output:

$ irule -F testB.r
true
false
false
false
true
OK. Done
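
The same five patterns can be checked in Python with fnmatch.fnmatchcase, which also matches the whole string against a pattern where * stands for any run of characters. Treating it as equivalent to the iRODS like operator is an assumption that only holds for patterns using * alone (fnmatch additionally interprets ? and [...]); on these inputs it reproduces the irule output above.

```python
from fnmatch import fnmatchcase

long_string = "this is a long string"
short_string = "string"

# Same patterns as in the rule above, in the same order.
patterns = [
    "*" + short_string,
    short_string + "*",
    "i*" + short_string + "*",
    "*" + short_string + "i*",
    "*" + short_string + "*",
]
results = [fnmatchcase(long_string, p) for p in patterns]
for result in results:
    print(str(result).lower())  # true, false, false, false, true
```
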

cookie33 commented 6 years ago

It is fixed as follows:

EUDATUpdatePIDWithNewChild(*parentPID, *childPID) {
    *replicaNew = "None"
    logInfo("[EUDATUpdatePIDWithNewChild] update parent pid (*parentPID) with new child (*childPID)");
    getEpicApiParameters(*credStoreType, *credStorePath, *epicApi, *serverID, *epicDebug);
    *replica = EUDATGeteValPid(*parentPID, "EUDAT/REPLICA");
    if ((*replica == "") || (*replica == "None")) {
        *replicaNew = *childPID;
    }
    else {
        if (*replica like "*"++*childPID++"*") {
           *replicaNew = *replica;
        }
        else {
           *replicaNew = *replica ++ "," ++ *childPID;
        }
    }
...
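
The merged check, *replica like "*"++*childPID++"*", is a substring test on the whole comma-separated value rather than a comparison against individual entries. The Python sketch below approximates it with fnmatch.fnmatchcase (an assumption about how closely that mirrors the iRODS like operator) and contrasts it with an exact per-entry check. The PIDs and helper names are hypothetical and chosen so that one PID is a prefix of an entry already stored; with equal-length UUID-based PIDs, as in the logs above, such a collision is unlikely.

```python
from fnmatch import fnmatchcase


def contains_like(replica: str, child_pid: str) -> bool:
    """Hypothetical approximation of: *replica like "*"++*childPID++"*"."""
    return fnmatchcase(replica, "*" + child_pid + "*")


def contains_exact(replica: str, child_pid: str) -> bool:
    """Hypothetical exact test against the individual comma-separated entries."""
    return child_pid in [entry.strip() for entry in replica.split(",")]


# Hypothetical PIDs: the child is a prefix of an entry that is already stored.
replica = "842/replica-10"
child = "842/replica-1"
print(contains_like(replica, child))   # True: substring match
print(contains_exact(replica, child))  # False: not an exact entry
```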
cookie33 commented 6 years ago

See #106

cookie33 commented 6 years ago

Merged into the devel branch.