OpenGATE / Gate

Official public repository of Gate
http://www.opengatecollaboration.org
GNU Lesser General Public License v3.0
Benchmark data not available at midas3.kitware.com #200

Closed: chackoms closed this issue 5 years ago

chackoms commented 6 years ago

Seen here

Is there any way to update where cmake pulls the external data from so it doesn't fail?

djboersma commented 6 years ago

Thanks for the reminder! We intend to restructure the benchmarks and testing of GATE, but yes, in the meantime it should not fail in this way. The problem also shows up on the CDash test robot: https://my.cdash.org/viewBuildError.php?buildid=1549190 Those 50 errors are mostly due to the failure to download the benchmark data.

djboersma commented 6 years ago

So what we need to do is copy the data that used to be on midas3.kitware.com to data.kitware.com and then update the build/benchmark scripts accordingly. It looks like @SimonRit has been active in this area: his name pops up in the CDash error logs and I see that he already has an account on data.kitware.com. :-)

The midas3 site is not accessible at all anymore, so it is hard to see who uploaded the original data there; it could also have been @albertine or @sj202988 . I thought about creating an "opengate" account on data.kitware.com, but it looks like all accounts on data.kitware.com are associated with people rather than with organizations.

djboersma commented 6 years ago

Actually, the benchmark test data are only ~4MB. We used the kitware repository because in earlier releases the source code included the examples with much larger files, which were retrieved with the same "external data" hook. In 2016 or 2017 the examples were moved to a separate project named GateContrib, and the big data files are now stored (and retrieved) with the LFS extension of git.

A simple solution for the short term is just to include the benchmark data directly in the git source; it seems small enough that we can do that even without using "git lfs". Note that most tests will fail even when they get their data, and for some tests the outcome (success or failure) is platform dependent. That is one of the reasons to restructure the tests. However, apart from "restructuring" the tests we also need many more of them. And if/when we do that, then we will probably also add (much) more test data. So with that in mind I still prefer to store the test data on kitware. Or with LFS.

djboersma commented 6 years ago

Since no one replied, I went ahead and moved the benchmark data to data.kitware.com and changed the Gate scripts accordingly. Please do "git pull" (or a new clone) and try it out. It worked on my own system; I will check on a few others. As already noted, the tests will fail, but this time not because of missing reference data. (Except if you run the "optical" tests: those data still need to be found and uploaded.)
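For readers unfamiliar with how such an "external data" hook resolves a download location, here is a minimal Python sketch of the idea. It mimics how CMake's ExternalData module fills the `%(algo)` and `%(hash)` placeholders in a URL template. The template string, the helper name `expand_url_template`, and the digest are assumptions for illustration only; the actual values are whatever the Gate CMake scripts define.

```python
# Hypothetical helper: expand an ExternalData-style URL template.
# Nothing here is taken from the Gate scripts; it only illustrates the idea.
def expand_url_template(template: str, algo: str, digest: str) -> str:
    """Replace the %(algo) and %(hash) placeholders with concrete values."""
    return template.replace("%(algo)", algo).replace("%(hash)", digest)


# Assumed template pointing at data.kitware.com (not confirmed by this thread).
TEMPLATE = "https://data.kitware.com/api/v1/file/hashsum/%(algo)/%(hash)/download"

# Placeholder digest, not a real benchmark file.
url = expand_url_template(TEMPLATE, "md5", "0123456789abcdef0123456789abcdef")
print(url)
```

Under this scheme, switching the data host only means changing the template; the per-file hash references in the source tree stay the same.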

dsarrut commented 6 years ago

Thanks, David! Can the issue be closed?

SimonRit commented 6 years ago

Thanks @djboersma for taking care of this. Kitware had anticipated this change for SimonRit/RTK; I thought the Gate collaboration had given up on Midas, so I did not pass the information on, sorry. I don't think it's good to start having binary data (even if it's small) as regular commits, because if you start to actually do real regression tests of Gate, it will rapidly grow. We spent quite some time cleaning up the repo, so let's not pollute it again. I'm not sure what the right solution is, but you can also use github pages, which avoids the additional installation of git-lfs. My initial experience with girder is not very positive and a full github solution (with git-lfs or github pages) seems (intuitively) preferable.

djboersma commented 6 years ago

Hi @SimonRit ! Thanks for your feedback. I agree that we should avoid storing binary data directly in our source code tree. The current solution with girder (data.kitware.com) actually seems to work; what kind of non-positive experiences did you have?

I had a quick look at pages.github.com and it looks like a more general service for making stuff available on the web (not just data collections), but also easy to use.

We use git-lfs for GateContrib; using it for Gate as well could be considered "more consistent". If we go this way we'll have to clean up and revise quite a bit of the cmake and benchmark script code, but we were going to do that anyway. Another thing I like about the git-lfs solution is that it reduces the number of web sites that we'll need to keep track of to just two (github and the collaboration web site).

I am tempted to just stick with kitware/girder (because it works now, and I am lazy), but if there is a good chance that we'll have to change again later on (because of girder badness) then maybe it's better to have a good look at the options and make a long term choice right now.

SimonRit commented 6 years ago

About Girder: I think Kitware has just demonstrated with Midas that we should not trust them to keep the service available in the long term. I also don't like the lack of versioning which they had with Midas: one file cannot have several versions (see an example here with a new version of spectrum.mha). That being said, it's too late for those comments since you have already made the conversion effort. Since it works, I agree, let's keep it.

djboersma commented 5 years ago

I'm closing this issue, because the immediate problem was solved. We still want to change the benchmark tests, but that is outside the scope of this issue.