cyverse / gocommands

iRODS Command-line Tools written in Go

gocmd put and HIERARCHY_ERROR #22

Closed · opened by jnimoth · closed 9 months ago

jnimoth commented 10 months ago

Hi Illyoung, one of the labs at our institute runs gocommands periodically on a Windows machine.

The command that we use is of the format:

gocmd put --diff <source> <dest> 

The idea is that this should add new files directly to the iRODS destination.

The problem is that on this machine not all files are transferred, and the number of files at the iRODS location does not change over consecutive runs of gocmd put.

Today we ran the command by hand in PowerShell again and got the error message shown in this screenshot:

[Screenshot: gocmd_put_error]

It reports a HIERARCHY_ERROR and then appears to abort the remaining transfer.

In my experience, these hierarchy errors usually occur when a data object is stuck in a locked state, but I cannot find any locked data objects in the destination iRODS collection.

For example:

$ iquest "SELECT COUNT(DATA_NAME) WHERE COLL_NAME LIKE '/rug/home/TeamDrive_GPortale/Vantec2000%' AND DATA_REPL_STATUS > '1'"
DATA_NAME = 0
------------------------------------------------------------
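For anyone reading along: the query above filters on the numeric replica status. Below is a minimal Go sketch of those values, assuming the numbering used by recent iRODS releases (0 stale, 1 good, 2 intermediate, 3 read-locked, 4 write-locked). The type and function names are hypothetical and only illustrate which states `DATA_REPL_STATUS > '1'` matches: intermediate and locked replicas, but not stale ones.

```go
package main

import "fmt"

// ReplStatus mirrors the integer stored in DATA_REPL_STATUS
// (assumed numbering of recent iRODS releases; names are hypothetical).
type ReplStatus int

const (
	Stale        ReplStatus = 0
	Good         ReplStatus = 1
	Intermediate ReplStatus = 2
	ReadLocked   ReplStatus = 3
	WriteLocked  ReplStatus = 4
)

// String gives a readable label for each status value.
func (s ReplStatus) String() string {
	switch s {
	case Stale:
		return "stale"
	case Good:
		return "good"
	case Intermediate:
		return "intermediate"
	case ReadLocked:
		return "read-locked"
	case WriteLocked:
		return "write-locked"
	default:
		return fmt.Sprintf("unknown(%d)", int(s))
	}
}

// InFlightOrLocked reports whether a replica falls under the
// "DATA_REPL_STATUS > '1'" condition: stale and good replicas do not.
func InFlightOrLocked(s ReplStatus) bool {
	return s > Good
}

func main() {
	for _, s := range []ReplStatus{Stale, Good, Intermediate, ReadLocked, WriteLocked} {
		fmt.Printf("%d %-13s matched=%v\n", int(s), s, InFlightOrLocked(s))
	}
}
```

A count of 0 therefore says nothing about stale replicas, which would need a separate `DATA_REPL_STATUS = '0'` query.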

Also, the data object named in the error in the screenshot above does exist, and it appears to have arrived normally:

$ ils -L /rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm
  saxslab@rug.      0 rootResc;rootRandy;ptB;replB;randy00;pt004;mnt_irods004      4201984 2023-09-24.03:16 & 08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm
        generic    /mnt/irods004/home/TeamDrive_GPortale/Vantec2000/frames/2021/08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm
  saxslab@rug.      1 rootResc;rootRandy;ptC;replC;randy20;pt203;mnt_irods203      4201984 2023-09-24.03:16 & 08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm
        generic    /mnt/irods203/home/TeamDrive_GPortale/Vantec2000/frames/2021/08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm

Do you have any idea or suggestion as to what is going wrong here?

I am not even sure whether the issue lies with gocommands or somewhere else.

Thanks!

iychoi commented 10 months ago

Hi,

The error is server-side and is usually caused by the resource server configuration and rules. I currently have no idea why you are running into it. Can you grep the iRODS server logs for errors from the time the error occurred?

Thanks, Illyoung

jnimoth commented 10 months ago

I asked our system admin to check the logs, but so far he has found nothing except this:

Jan 26 12:13:12 pid:1435 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/06_DetVac_ColimAir_Ph1OPEN_35kV_20mA_4AL_10s.gfrm], hierarchy=[]
Jan 26 12:13:12 pid:1367 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/04_DetVac_ColimAir_noPh_35kV_20mA_4AL_100s.gfrm], hierarchy=[]
Authenticated
Jan 26 12:13:13 pid:1367 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/10_DetVac_Air_Ph1v10p12h12p90_35kV_20mA_4AL_10s.gfrm], hierarchy=[]
Jan 26 12:13:13 pid:1512 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/07_Sep28_DetVac_ColimAir_Ph1OPEN_35kV_20mA_4AL_10s.gfrm], hierarchy=[]
Jan 26 12:13:13 pid:1435 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/09_DetVac_ColimAir_Ph1234OPEN_35kV_20mA_4AL_10s.gfrm], hierarchy=[]
Jan 26 12:13:13 pid:1599 remote addresses: [IP 1], [IP 2], [IP 3] ERROR: [rsDataObjOpen_impl:907] - [HIERARCHY_ERROR: no valid resource found for data object] [error_code=[-1803000], path=[/rug/home/TeamDrive_GPortale/Vantec2000/frames/2021/08_DetVac_ColimAir_Ph123OPEN_35kV_20mA_4AL_10s.gfrm], hierarchy=[]

I'm not sure whether this helps. One more note: for the sync where I first saw the issue, there are also some stale replicas at the destination. I don't think that should be a problem, but it may be worth mentioning.

Since the error message in the initial post mentioned a failure to perform parallel jobs: do you think running the put command with the additional --single_threaded flag could make a difference? We have not tried adjusting that parameter for this case yet.

iychoi commented 10 months ago

The error log says that the server cannot find a valid resource in the hierarchy for the data object. This is related to the server-side resource hierarchy configuration. Can you try accessing the same file with iCommands and check the error log? If the error occurs there too, it's a configuration issue.

The sync command uses bput internally. bput creates tarballs locally, transfers them, and then extracts them on the server side. The issue may be related to the rules configured on your server.
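The local bundling step described above can be sketched in Go with the standard `archive/tar` package. This is a minimal illustration under stated assumptions, not gocommands' bput implementation: the `localFile` type, the helpers, and the file names are hypothetical, and reading the archive back only stands in for the transfer and the server-side extraction.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// localFile is a hypothetical stand-in for a file picked up from the source directory.
type localFile struct {
	Name string
	Data []byte
}

// bundle writes all files into one in-memory tarball, so many small files
// can be sent as a single transfer.
func bundle(files []localFile) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range files {
		hdr := &tar.Header{
			Name:     f.Name,
			Typeflag: tar.TypeReg,
			Mode:     0o644,
			Size:     int64(len(f.Data)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(f.Data); err != nil {
			return nil, err
		}
	}
	// Close writes the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// entries lists the members of a tarball; here it only stands in for
// the server-side extraction step.
func entries(tarball []byte) ([]string, error) {
	tr := tar.NewReader(bytes.NewReader(tarball))
	var names []string
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	files := []localFile{
		{Name: "frames/a.gfrm", Data: []byte("first frame")},
		{Name: "frames/b.gfrm", Data: []byte("second frame")},
	}
	tb, err := bundle(files)
	if err != nil {
		panic(err)
	}
	names, err := entries(tb)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tb), names)
}
```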

iychoi commented 9 months ago

Closing this issue due to a long period of inactivity. Please reopen it if the problem persists.