Open xiamaz opened 1 year ago
@ericblanc20 @holtgrewe Input would be much appreciated
I am not sure I understand what you propose to do. I may be mistaken, but I understand that:
- `cubi-tk irods check` checks the internal integrity of iRODS, i.e. consistency between md5 checksums across replicas. It looks like a health check of the iRODS system.
- `cubi-tk sodar/snappy/seasnap check` compares the md5 stored in iRODS with the local md5. Its purpose is to verify agreement between the local data and what has been stored in SODAR.

In functional analysis projects, it is often valuable to be able to verify that the local analysis files (on the cluster) are identical to those stored on SODAR, especially when the analysis report has been re-run.
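To make the first kind of check concrete, here is a minimal sketch of a replica consistency test. It assumes the per-replica checksums have already been fetched into a plain dict; the function name `inconsistent_replicas` and the input shape are hypothetical and not part of cubi-tk or the iRODS client.

```python
from collections import defaultdict


def inconsistent_replicas(replica_checksums):
    """Group replica numbers by checksum; return {} when all replicas agree.

    `replica_checksums` is a hypothetical input: a dict mapping a replica
    number to its stored checksum string, or None if no checksum is registered.
    """
    by_checksum = defaultdict(list)
    for number, checksum in sorted(replica_checksums.items()):
        by_checksum[checksum].append(number)
    # Exactly one distinct checksum, and none missing, means the replicas agree.
    if len(by_checksum) == 1 and None not in by_checksum:
        return {}
    return dict(by_checksum)
```

A non-empty result lists which replicas share which checksum, which is the information a health-check style report would need.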
Thanks. The issue is that currently the checksum for any individual file is stored both in individual `md5` files with the same name and in the irods metadata itself.

Given your use cases, at no point should the `md5` file in irods be necessary, as it should always be better to let irods compute and store the checksum for us. E.g. `irods check` should just perform https://github.com/irods/python-irodsclient#computing-and-retrieving-checksums and pipeline-specific checks should compare the checksum obtained from the irods metadata against a locally computed checksum.
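A rough sketch of what such a pipeline-specific check could look like, assuming python-irodsclient is used and `session` is an already opened `iRODSSession`. The helper names are made up for illustration. The comparison handles the two checksum formats iRODS stores: a plain hex string for MD5 and a `sha2:` prefix followed by a base64 digest for SHA-256.

```python
import base64
import hashlib


def local_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a local file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest


def matches_irods_checksum(path, irods_checksum):
    """Compare a local file against a checksum string as stored by iRODS."""
    if irods_checksum.startswith("sha2:"):
        # SHA-256 checksums are stored as 'sha2:' + base64(digest).
        expected = irods_checksum[len("sha2:"):]
        actual = base64.b64encode(local_digest(path, "sha256").digest()).decode("ascii")
    else:
        # MD5 checksums are stored as a plain hex string.
        expected = irods_checksum.lower()
        actual = local_digest(path, "md5").hexdigest()
    return actual == expected


def stored_checksum(session, irods_path):
    """Ask iRODS for the checksum of a data object (hypothetical helper).

    Assumes `session` is an open irods.session.iRODSSession; chksum() is the
    call described in the python-irodsclient README section linked above.
    """
    return session.data_objects.get(irods_path).chksum()
```

With these, a check would reduce to `matches_irods_checksum(local_path, stored_checksum(session, irods_path))`, with no `md5` file involved.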
This is an interesting point and maybe @mikkonie can chime in on this once he's back from vacation. Why do we actually move the .md5 files into the main iRODS storage? They are only needed for landing zone validation and could be discarded afterwards as the hashsums are also stored in the iRODS metadata.
Edit: I guess there is some use in having them readily available for another check after downloading data from SODAR (especially when not using iRODS tools, i.e. going through Davrods), but this then raises the question of why they're not shown in the "List files" web view.
Currently all `check` commands for irods work against the separately stored `md5` file. This is similar to what is being done by the sodar server commands. After moving a landing zone, there should be no additional need to manually validate these files.

These commands duplicate logic already contained in irods, as validation of replica checksums against the stored data is already part of irods itself.
Unless there are `sodar`-independent workflows which require manual validation of uploaded `md5` files, I would propose replacing the checks with native `irods` checksum checks in cubi-tk.

This affects `irods/check`, `sea-snap/check_irods` and `snappy`.
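
For reference, the kind of `md5`-file-based validation discussed here can be sketched as follows. This is not the cubi-tk implementation; it assumes the sidecar file sits next to the data file as `<name>.md5` and uses the usual `md5sum` layout (`<hex digest>  <filename>`). Both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def read_md5_sidecar(md5_path):
    """Return the hex digest from the first line of an md5sum-style file."""
    first_line = Path(md5_path).read_text().splitlines()[0]
    return first_line.split()[0].lower()


def sidecar_matches(data_path):
    """Check a data file against the '<name>.md5' file next to it (assumed layout)."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".md5")
    # read_bytes() loads the whole file; fine for a sketch, but large files
    # would be hashed in chunks instead.
    actual = hashlib.md5(data_path.read_bytes()).hexdigest()
    return actual == read_md5_sidecar(sidecar)
```

The proposal above amounts to dropping the sidecar lookup and taking the expected digest from the irods metadata instead.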