ANTsX / ANTs

Advanced Normalization Tools (ANTs)
Apache License 2.0

Download testing data manually #878

Open TheChymera opened 4 years ago

TheChymera commented 4 years ago

I would like to download the testing data explicitly, ahead of running the build, so that the build itself can be sandboxed and security risks can be better mitigated.

The ANTs build system seems to download its testing data file by file from slicer.kitware.com, and there are quite a few files:

chymera@darkhost ~/src/ANTs $ for i in TestData/Data/*.md5; do cat ${i}; echo ""; done
d50be7f1c451bf832c534206842b2d37
08ee0426072d1444706889ba66988cf1
20f1223c0010b9e4c346cfee9190acb4
f37cb48710987c33a4aee3ed8abaad57
d0c7e80f3a88639c6af4a2e749f4cda4
11ea9f4d17e24ab90d8bc0b4b087d235
84cf0e06646142e0aa4c3d3324a62701
f41df9ed883e63bdd31b440fcb82be6a
c753ce2a74481486c37d101ba224b2fb
67a066125ba4132655c1ad91c34e06a4
f225299c0f092d275ce5cb41f30eed46
0cd4a0165822312ef368c8a006a5e28d
5335f839229b48e82db8861f8ea552e8
11701bbc90621d482532064dac4eb068
a396d2381eb357181ab0038f0559d9db
5eb74684e6b59c03f956368a81972e79
f5c08ab5a885182cae48ace65f1bd91f
941066d9b635e8c272e647abd6a00f9b
99f91c8c0b957cd5a678c7385f258e74
16d7b1385c3fcad2c0fe45c53123e8a3
c9449fdf994ee8b813386fe49bcd0059
46fbac6191b33e297871c364c709711c
b90fd8bff771a67013717fbc549326a0
8c4ea9d008a044c0b145125a455fa42a
68c7bda70524918d509c9edf6a0ec203
a1c5447d6937909e7d853e113a6073fc
dc52ba5f395cd1b1612148da844d78ed
ae8ed778e46b39d350108a86cba5301d
0094881420d3f4fa2be08e5dad4e98ef
035e70ac7a4ffc010d319c7ea8879567
c8e0f1fc42799b1394770036cccb562c
6c9016956c8296b32a4dbc3eb6ff8c1a
de88309aac182bfe0abaf869eb59ed81
179dc1c16d96548564f35f1132efce22
1261fc7c8ba9fae6046cf18d8f9a2f49
acdbe32b199591bca333add30cf7b547
03dd793ed073bbe0c8b19cc5f4033d2a
3cfda4cafaa89cf924411c1911e66d36
37aaa33029410941bf4affff0479fa18
8228d2a1fa139a1eb8804ceaddcdcff3
090a1086e91222800f1e9aa4105b2c0f
fa1c95f06e2ce7642f6967eb4fa0b80d
14461dbe8b8cff1266186ee7bc9ba438
8a629ee7ea32013c76af5b05f880b5c6
602bdf5fc198a86712157dee7fa17027
1e3d598f7af2d226512a5004c664915e
65caa9274f60b238365d0a675f9204cf
4d4650e16da2a1b5be7f78b1f9e32dfb
b92477360d95deb585590bc65c44d80f
8e5337c9077993f2a248bcc41a58c202
13d6e66ea47619ca43235086c343fc4a
8556f22c41488ae6da0e7a2de0f9d828

Is there any way to download them as one archive, so that I don't have to list them all explicitly before unpacking the source?

Looking at the CMake files I also see this snippet:

list(APPEND ExternalData_URL_TEMPLATES
  # Local data store populated by the ITK pre-commit hook
  "file:///${${PROJECT_NAME}_SOURCE_DIR}/.ExternalData/%(algo)/%(hash)"
  # ...
  )

Is this the same testing data as ITK's? I have looked inside the https://github.com/InsightSoftwareConsortium/ITK/releases/download/v5.1b01/InsightData-5.1b01.zip archive, and that does not seem to be the case.
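As a minimal sketch of how the `file://` template above might be used for an offline build: pre-downloaded files that are named by their hash could be copied into `.ExternalData/<ALGO>/` inside the source tree. This assumes that `%(algo)` expands to the upper-case algorithm name (e.g. `MD5`) and that the build tries this local template before the network ones; the helper name `populate_external_data` is hypothetical and this has not been verified against the ANTs build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy hash-named files into the local object store
# that the file:// template points at:
#   <source dir>/.ExternalData/<ALGO>/<hash>
# Assumption: %(algo) expands to the upper-case algorithm name, e.g. MD5.
populate_external_data() {
    local data_dir=$1   # directory of files, each named by its hash
    local src_dir=$2    # ANTs source checkout (${PROJECT_NAME}_SOURCE_DIR)
    local algo=$3       # e.g. MD5 or SHA512
    local store="${src_dir}/.ExternalData/${algo}"
    local f
    mkdir -p "${store}" || return 1
    for f in "${data_dir}"/*; do
        [ -f "${f}" ] || continue   # skip an unmatched glob and subdirectories
        cp -- "${f}" "${store}/$(basename "${f}")"
    done
}
```

For example, `populate_external_data ~/ants_testdata /git/repo/for/ANTs MD5` would place the files from the archive where the template expects them.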

TheChymera commented 4 years ago

OK, so I have downloaded all of the relevant test files (some are unavailable, see the rm step below!), and the resulting archive, such as it is, seems to be sufficient for the tests to pass both with and without VTK support enabled.

Please see the code below for how I generated the archive. It is available via my own webspace ( http://chymera.eu/distfiles/ants_testdata-2.3.1_p20191013.tar.xz ), though for reliability I would be very grateful if it could be hosted under some ~official ANTs domain instead.

cd /git/repo/for/ANTs
git checkout f78b2d4a382d3090230641b5ade5da28962dad04
mkdir ~/ants_testdata/
pushd ~/ants_testdata/
for i in /git/repo/for/ANTs/TestData/Data/*.md5; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
for i in /git/repo/for/ANTs/TestData/*.md5; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
# Delete downloads that came back as XML (presumably error responses for unavailable files)
grep -R xml -lZ | xargs -0 rm
tar cJf ants_testdata-2.3.1_p20191013.tar.xz *
popd
git checkout master

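The `grep -R xml` step deletes bad downloads by looking at their content. Because every file is saved under the checksum it was requested by (`-O "${A}"`), a stricter alternative is to recompute each file's hash and delete anything that does not match its name. A sketch assuming GNU coreutils (`md5sum`, `sha512sum`); `remove_mismatched` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every file in DIR whose hash differs from its
# file name. HASH_CMD defaults to md5sum; pass sha512sum for SHA-512 names.
remove_mismatched() {
    local dir=$1
    local hash_cmd=${2:-md5sum}
    local f actual
    for f in "${dir}"/*; do
        [ -f "${f}" ] || continue
        # Hash via stdin so the output is just "<hash>  -"
        actual=$("${hash_cmd}" < "${f}" | cut -d ' ' -f 1)
        if [ "${actual}" != "$(basename "${f}")" ]; then
            echo "removing ${f}: checksum mismatch" >&2
            rm -- "${f}"
        fi
    done
}
```

Running `remove_mismatched ~/ants_testdata` before the `tar` step would drop both error pages and truncated downloads, not only files that happen to contain the string `xml`.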
gdevenyi commented 4 years ago

some ~official ANTs domain instead.

Kitware is the official company that develops CMake and ITK. I think that source is more "official" than anything ANTs can offer.

Also, fixed, can close.

TheChymera commented 4 years ago

@gdevenyi When you say fixed, do you mean anything has changed on your end (e.g. have you made the missing but apparently unneeded files available), or just that it obviously works for me?

Would you be interested in packaging the test data as a single archive for upcoming releases? Releases seem to be scheduled only infrequently, but if you can't accommodate the extra work, I would gladly do it for you. I think I understand why you prefer to download files individually (you don't have to bump the whole archive when only one file is updated), but if downloading is handled separately from building, keeping track of that many files is much more difficult than keeping track of a single archive.

TheChymera commented 4 years ago

Hi @gdevenyi, I have again generated a one-stop downloadable archive for the ANTs 2.3.4 test data. Would you be interested in auto-generating and releasing something like this in the future? Wouldn't you agree that for versioned releases it makes sense to have a versioned test data archive? This helps a lot, particularly with offline (or package-manager checksum-controlled) builds.

cd /git/repo/for/ANTs
git checkout 1195345
mkdir ~/ants_testdata/
pushd ~/ants_testdata/
    for i in /git/repo/for/ANTs/TestData/Data/*.md5; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
    for i in /git/repo/for/ANTs/TestData/*.md5; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
    # Delete downloads that came back as XML (presumably error responses for unavailable files)
    grep -R xml -lZ | xargs -0 rm
    tar cJf ants_testdata-2.3.4.tar.xz *
popd
git checkout master

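For checksum-controlled builds it helps if regenerating the archive from the same files yields the same checksum. A sketch assuming GNU tar (1.28 or newer, for `--sort`) and coreutils; `make_pinned_archive` is a hypothetical name, and byte-identical output also depends on the compressor version and its settings:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pack SRC_DIR into ARCHIVE (compression picked from the
# suffix via -a, e.g. .tar.xz) with fixed member order, timestamps and owners,
# then record the archive's SHA-512 next to it.
# ARCHIVE must not live inside SRC_DIR.
make_pinned_archive() {
    local src_dir=$1 archive=$2
    tar --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner \
        -caf "${archive}" -C "${src_dir}" . || return 1
    sha512sum "${archive}" > "${archive}.sha512"
}
```

A package manager (or a downstream packager) could then pin the `.sha512` value, and anyone can check a mirror copy with `sha512sum -c ants_testdata-2.3.4.tar.xz.sha512`.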
gdevenyi commented 4 years ago

I'm not aware of the test data archive ever changing, but it's probably worth at least providing one static downloadable version.

TheChymera commented 4 years ago

@gdevenyi if you want to mirror the archive I have already created, the link we use for distribution is https://chymera.eu/distfiles/ants_testdata-2.3.4.tar.xz . My server has decent uptime, but it might be more reputable to host the archive on the project's GitHub, or directly with the parent company as you mentioned earlier.

TheChymera commented 1 year ago

@gdevenyi I'm trying to regenerate the test data archive, but the above script no longer works. The resources seem to have disappeared. Any idea what could be up? Did the test data actually change, or is it just hosted elsewhere now?

cookpa commented 1 year ago

MD5 is no longer supported; we had to migrate everything to SHA-512 (#1236, #1237).

TheChymera commented 1 year ago

@cookpa it still doesn't work, e.g.:

[deco]~/src/ANTs ❱ cat TestData/Data/*.sha512
58877e3bd2bab703e539a1f22b8964f5982765bb9dd609fc0e1ec6f3503b94d4aa338cf9970c0d1bde1f3b87ea14f98e625ed0e0919b157584370c872665a8c3
fdb18bdc8f097c6d7b649a4cc09a0344b3e9e34cb2ab0fecfa85ad6623493b345a6a05ebe4307034fdbaa13d2a54895ff422ea39624e5211e58dc9fb571f5405
d5bcdd4c68e840c7d7bcac32e68ae7a20d2896ecb78f7d3dc2512707d88092b154a81aeb62392460fbd487f60248680a5c4777e7efe430a8b6fdcdd823f222af
20942c444831f9dd71d41809acf0eac6fb351bda9a6e2d6826a645121b09d781a7feacdcd5edf4b7e9708f9e5de380bdfdfcb9f7c4b25fa1a03d506d817c3e1d
6a5395aa5619d69802304998dde7d9b44d4e768725cb0bfd3cca20666a22d51c9879558d1767564f66de82c87c3b80df5e04338e84ee1299ff2876254fd1904f
9b8a059e0cb75fb2318cd29141eea8b27cd9f0cd676bb38d12dc3385ae63c82aa89a11e8199bca41d2b7753c85e60eea8274185aabd047d71d106755b7934a60
9b3caa5afdc591763df439c37edfaaf0cb30d98ba82d8ddbc6e434f7379b0b21ca4afbc31d25e5b36fef9551abb36af3d2252fecf0ad984263f1ffda25e955ca
2943de03b0bcce9b0c5e9c8003cb0959b9bb849cc0598d27a2079f30e6d0f204a09583884233529d066c41cbfb6ec5340e0fb27eb646e828d9bcf737d8138587
ff97abe94ded1df66d6de40713470f58730d939ef8b1112a86e69aedbbaf4fbb1ccf9a129e3b70d25093c2c241676e11714a9e2cfb9a071708dc5cd838058ba0
a764b6b2e4d1b07c6ad0794973a45ed4967c86ce6dd84471000712be0bd9a5a0a405740e4af28b9909060b00ebe66a343e34d38affa719b54ac7c0180e9babd3
0f6c76fb3222490b283500bb19d9fddb63a5670af660c25a6ecb181d2bde7036c5781898ad64b93f4918004adbdf62063e4bac79462cdb708acf34331c205e03
4ca0ce32b28329ab42c0f7ef1816dc8e54978ac60573b14b75a53a93a6af42995aadd4451ddfd19fa0ab707feff1388a86fa675c558ace1906a1ff867b15281f
[deco]/tmp/lala ❱ wget http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=4ca0ce32b28329ab42c0f7ef1816dc8e54978ac60573b14b75a53a93a6af42995aadd4451ddfd19fa0ab707feff1388a86fa675c558ace1906a1ff867b15281f
[1] 359

Redirecting output to ‘wget-log’.
[deco]/tmp/lala ❱ cat wget-log
--2023-03-30 00:31:11--  http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download
Resolving slicer.kitware.com... 66.162.65.215
Connecting to slicer.kitware.com|66.162.65.215|:80... connected.
HTTP request sent, awaiting response... 404 NOT FOUND
2023-03-30 00:31:11 ERROR 404: NOT FOUND.

[1]+  Exit 8                  wget http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download

Any idea what I'm doing wrong?

I also have this somewhat improved script for creating a data archive; maybe some of you will find it useful too:

#!/usr/bin/env bash

if [ "$#" -ne 2 ]; then
        echo "This needs two arguments, the version (e.g. '2.4.3') and the path to the ANTs repo (e.g. '~/src/ANTs/')." >&2
        exit 1
fi
version=$1
# Absolute path, since the loops below run from inside /tmp
ants_dir=$(cd "$2" && pwd) || exit 1

pushd "${ants_dir}"
        git checkout "v${version}"
        mkdir -p "/tmp/ants_testdata-${version}/"
        pushd "/tmp/ants_testdata-${version}/"
            for i in "${ants_dir}/TestData/Data/"*.sha512; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
            for i in "${ants_dir}/TestData/"*.sha512; do A=$(cat "${i}"); wget "http://slicer.kitware.com/midas3/api/rest?method=midas.bitstream.download&checksum=${A}" -O "${A}"; done
            # Delete downloads that came back as XML (presumably error responses for unavailable files)
            grep -R xml -lZ | xargs -0 rm
            tar cJf "ants_testdata-${version}.tar.xz" *
        popd
        git checkout master
popd

TheChymera commented 1 year ago

OK, I think I figured this out: the server moved. Here's the latest version of the script, which seems to work:

#!/usr/bin/env bash

if [ "$#" -ne 2 ]; then
        echo "This needs two arguments, the version (e.g. '2.4.3') and the path to the ANTs repo (e.g. '~/src/ANTs/')." >&2
        exit 1
fi
version=$1
# Absolute path, since the loop below runs from inside /tmp
ants_dir=$(cd "$2" && pwd) || exit 1

pushd "${ants_dir}"
        git checkout "v${version}"
        mkdir -p "/tmp/ants_testdata-${version}/"
        pushd "/tmp/ants_testdata-${version}/"
                # server might move :(
                # https://github.com/ANTsX/ANTs/blame/09599352304559fd74aae894e54db2f1b41e88ce/CMake/ANTSExternalData.cmake#LL45C5-L45C30
                for i in "${ants_dir}/TestData/Data/"*.sha512 "${ants_dir}/TestData/"*.sha512; do
                        A=$(cat "${i}")
                        wget "https://data.kitware.com:443/api/v1/file/hashsum/sha512/${A}/download" -O "${A}"
                done
                # Needed with the old Midas server, presumably to delete XML error
                # responses for unavailable files
                #grep -R xml -lZ | xargs -0 rm
                tar cJf "ants_testdata-${version}.tar.xz" *
        popd
        git checkout master
popd
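In the failing transcript earlier in the thread, the Midas URL was passed to `wget` unquoted, so the shell treated `&` as a background operator (hence the `[1] 359` job number and the request ending at `...midas.bitstream.download`). As a sketch of a stricter variant of the script above, assuming wget and GNU coreutils, each file can also be checked against its SHA-512 content link before it is kept. `fetch_ants_testdata` is a hypothetical name; the URL pattern is taken from the script above, with the default HTTPS port left implicit.

```shell
#!/usr/bin/env bash
# Sketch: download every SHA-512 content link of an ANTs checkout into OUT_DIR,
# keeping a file only if its SHA-512 matches the hash it was requested by.
fetch_ants_testdata() {
    local ants_dir=$1 out_dir=$2
    local link hash part
    mkdir -p "${out_dir}" || return 1
    for link in "${ants_dir}/TestData/Data/"*.sha512 "${ants_dir}/TestData/"*.sha512; do
        [ -f "${link}" ] || continue                # unmatched glob
        hash=$(tr -d '[:space:]' < "${link}")        # content link = bare hash
        [ -f "${out_dir}/${hash}" ] && continue      # already fetched
        part="${out_dir}/${hash}.part"
        # Always quote URLs: unquoted, characters such as & are parsed by the shell
        if ! wget --quiet -O "${part}" \
            "https://data.kitware.com/api/v1/file/hashsum/sha512/${hash}/download"; then
            echo "download failed: ${link}" >&2
            rm -f "${part}"
            continue
        fi
        if [ "$(sha512sum < "${part}" | cut -d ' ' -f 1)" = "${hash}" ]; then
            mv "${part}" "${out_dir}/${hash}"
        else
            echo "checksum mismatch: ${link}" >&2
            rm -f "${part}"
        fi
    done
}
```

Because already-verified files are skipped, re-running `fetch_ants_testdata ~/src/ANTs /tmp/ants_testdata-2.4.3` only retries the missing or failed ones before the `tar` step.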

But is there perhaps a nicer way to sort this out, so I don't run into this entire “where is the test data” adventure whenever I update the package? The core issue is that the test data should be known to the package manager explicitly. Auto-download by ANTs itself during the build will not work due to security constraints such as network sandboxing.

Would you be interested in having this script run at release time to provide a consolidated test data archive?
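To make the test data known to the package manager explicitly without any archive, the content links could instead be turned into a per-file list of hashes and download URLs that the package manager fetches and verifies itself. A minimal sketch; the helper name and the three-column output format are assumptions, and the URL pattern is the one from the script above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<sha512> <file name> <url>" for every content
# link in an ANTs checkout, so each file can be declared to a package manager
# up front instead of being fetched during the build.
print_testdata_manifest() {
    local ants_dir=$1
    local link hash
    for link in "${ants_dir}/TestData/Data/"*.sha512 "${ants_dir}/TestData/"*.sha512; do
        [ -f "${link}" ] || continue
        hash=$(tr -d '[:space:]' < "${link}")
        printf '%s %s %s\n' "${hash}" "$(basename "${link}" .sha512)" \
            "https://data.kitware.com/api/v1/file/hashsum/sha512/${hash}/download"
    done
}
```

Running it on a tagged checkout would produce a list that a packaging recipe can consume directly; the trade-off is one download entry per file rather than a single archive to track.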

gdevenyi commented 1 year ago

But is there perhaps a nicer way to sort this out so I don't run into this entire “where is the test data” adventure whenever I update the package?

This is a fundamental design feature of the kitware/ITK ecosystem and unlikely to change. Best you can hope for is an officially maintained version of the download script.

cookpa commented 1 year ago

I'm not really keen to add further steps to the release workflows at this time. Hopefully the revised script will continue to work.

I would like to replace the whole test system, but we don't have the resources right now.

gdevenyi commented 1 year ago

@TheChymera I suggest you try to upstream the "test data download script" into ITK as a support tool for packaging, in the hope it'll be maintained/updated for future changes.

TheChymera commented 1 year ago

@gdevenyi you mean into ANTs, right? This is the ANTs test data; the ITK test data is different, no?

gdevenyi commented 1 year ago

The ANTs test data uses the ITK infrastructure.

TheChymera commented 1 year ago

@gdevenyi right, but it's not needed to package ITK, it's needed for ANTs. So it would make more sense for it to be in this repo. Any idea which directory it should go in?