Closed by mih 2 years ago
To clarify: we will still use datalad addurls, since it does all we need. There is no point in wrapping around git annex registerurl.
Sadly, not all XNAT instances provide a digest (i.e., an md5sum).
This type of querying is implemented now.
Ultimately, all files to be downloaded are associated with an "experiment", which is an acquisition for a subject within a project. If I browse the XNAT UI, I need to click through project -> subject -> experiment to see its accession number.
The experiment accession number can also be queried via the subject accession ID, because an experiment is unique to the scope of a subject (as far as I can tell).
Similarly, all experiments (acquisitions for subjects in a project) can be discovered via the project accession number.
These queries together cover the two main use cases. In contrast to the current implementation, all accession numbers can be determined in a single query rather than in successive queries.
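As a rough sketch of the two lookups described above, the following builds a query URL and reads accession numbers out of a decoded response. The endpoint path `/data/experiments`, the `project` and `subject_ID` filter parameters, and the `ResultSet`/`Result` JSON layout are assumptions about the XNAT REST API, not something confirmed in this thread. The helper names and the host are made up for illustration.

```python
from urllib.parse import urlencode


def experiments_query_url(base_url, project=None, subject=None):
    """Build a URL that lists experiments, filtered by project or subject.

    ASSUMPTION: the endpoint and parameter names are illustrative only.
    """
    params = {"format": "json"}
    if project is not None:
        params["project"] = project
    if subject is not None:
        params["subject_ID"] = subject
    return f"{base_url.rstrip('/')}/data/experiments?{urlencode(params)}"


def experiment_ids(payload):
    """Pull accession numbers out of a decoded JSON response.

    ASSUMPTION: rows live under payload["ResultSet"]["Result"] and each
    row carries the accession number in its "ID" field.
    """
    return [row["ID"] for row in payload["ResultSet"]["Result"]]
```

Either filter yields the full list of experiment accession numbers from one request, which is the point made above about avoiding successive queries.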
Given a single experiment accession number, I can now get ALL files associated with it.
This is a comprehensive list that includes multiple "resources" (named "collection" here).
Importantly, this list contains the direct URLs and a "digest" plus file "Size" in bytes. This is sufficient information for calling git annex registerurl with a key of the form:
MD5E-s<size-in-bytes>--<md5sum>.<file-extension>
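A minimal sketch of how such a key could be assembled from one entry of the file listing. The "Size" and "digest" fields are the ones mentioned above; the "Name" and "URI" field names, the server-relative form of "URI", and both helper names are assumptions made for illustration. Only the last file extension is used here, which is a simplification of how git-annex itself picks extensions.

```python
import os


def annex_key(name, size, md5sum):
    """Format a key as MD5E-s<size-in-bytes>--<md5sum>.<file-extension>.

    Simplification: only the final extension of the file name is kept.
    """
    if not md5sum:
        # Some XNAT instances report no digest, so no key can be built.
        raise ValueError(f"no digest available for {name!r}")
    ext = os.path.splitext(name)[1]  # includes the leading dot, or "" if none
    return f"MD5E-s{int(size)}--{md5sum.lower()}{ext}"


def registerurl_args(record, base_url):
    """Argument list for `git annex registerurl <key> <url>`.

    ASSUMPTION: the record has "Name", "Size", "digest", and a
    server-relative "URI" that is appended to base_url.
    """
    key = annex_key(record["Name"], record["Size"], record["digest"])
    url = base_url.rstrip("/") + record["URI"]
    return ["git", "annex", "registerurl", key, url]
```

Entries without a digest raise an error instead of producing a key, since content verification is not possible for them.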
In summary, using one query per experiment, we can generate a complete dataset for single file access (capable of content verification) without performing any file downloads.
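To illustrate the "one query per experiment" idea, here is a small sketch that turns the file listing of one experiment into a CSV table, without downloading anything. The column names and the "URI" and "Name" field names are assumptions; "collection", "digest", and "Size" are the fields mentioned above. The function name is hypothetical.

```python
import csv
import io


def listing_to_csv(records):
    """Render file-listing entries as CSV text, one row per file.

    ASSUMPTION: each entry has "URI", "Name", "Size", "digest", and
    "collection"; files are placed under a directory per collection.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["url", "path", "size", "md5sum"],
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "url": rec["URI"],
            "path": f"{rec['collection']}/{rec['Name']}",
            "size": rec["Size"],
            "md5sum": rec["digest"],
        })
    return out.getvalue()
```

A table like this is the kind of input datalad addurls works from. How its columns map onto the URL, file name, and key format options is not covered here.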
This should be fast ;-)