afni / afni

Official AFNI source and documentation
https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/
158 stars 105 forks source link

Test data needs updating #266

Closed YurBoiRene closed 2 years ago

YurBoiRene commented 3 years ago

Hello, I'm trying to compile and test AFNI. I have compiled it (with the normal make tools and also using CMAKE) and run tests but test_3dClusterize and test_3dClusterize_new always fail. Upon further investigation, it looks like tools.OutputDiffer compares the output of the expected and actual outputs. It looks like there is just a missing bracket (why?) but also the command echos paths and such which would change so I assume those will also fail the test. Is this intended? Is there a way I can fix this? Here is a log of test_3dClusterize_new: test-output.txt

mrneont commented 3 years ago

Hi-

I am curious where you see the missing bracket?

My understanding of the testing complaint/failure is that we actually made a change to the default behavior of 3dClusterize recently: the program will no longer default to outputting the absolute value of (ROI) Mean in the reported table (there is a new option to have that occur, though); also, the program will use any scaling that is present in the header. It is the first of these changes that is likely causing the outputs of the new code to differ from what the testing framework thinks is right: the new default output table has positive and negative values in the Mean column, as opposed to only having positive values previously. We will have to update this information in the testing framework, so it learns that this behavior is the new normal. (This means we have to ping the person who set up the CircleCI stuff...)

That being said, it would be good to know what you view as a problem in the output text there.

YurBoiRene commented 3 years ago

Thanks for the response. I'll keep that in mind. The diff that I saw that shows the bracket is line 396-410 in that file I linked.

396 | E           AssertionError: assert ('--- \n'\n '\n'\n '+++ \n'\n '\n'\n '@@ -13,7 +13,7 @@\n'\n '\n'\n ' #[ Voxel datum type    = float ]\n'\n ' #[ Voxel dimensions    = 2.000 mm X 2.000 mm X 2.000 mm ]\n'\n ' #[ Coordinates Order   = RAI ]\n'\n '-# Mean and SEM based on absolute value of voxel intensities ]\n'\n '+#[ Mean and SEM based on signed voxel intensities ]\n'\n ' #\n'\n ' #Volume  CM RL  CM AP  CM IS  minRL  maxRL  minAP  maxAP  minIS  maxIS    '\n 'Mean     SEM    Max Int  MI RL  MI AP  MI IS\n'\n ' #------  -----  -----  -----  -----  -----  -----  -----  -----  -----  '\n '-------  -------  -------  -----  -----  -----') == ''
397 | E             + --- 
398 | E             + 
399 | E             + +++ 
400 | E             + 
401 | E             + @@ -13,7 +13,7 @@
402 | E             + 
403 | E             +  #[ Voxel datum type    = float ]
404 | E             +  #[ Voxel dimensions    = 2.000 mm X 2.000 mm X 2.000 mm ]
405 | E             +  #[ Coordinates Order   = RAI ]
406 | E             + -# Mean and SEM based on absolute value of voxel intensities ]
407 | E             + +#[ Mean and SEM based on signed voxel intensities ]
408 | E             +  #
409 | E             +  #Volume  CM RL  CM AP  CM IS  minRL  maxRL  minAP  maxAP  minIS  maxIS    Mean     SEM    Max Int  MI RL  MI AP  MI IS
410 | E             +  #------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -------  -------  -------  -----  -----  -----

It looks like on lines 406 and 407 one of the outputs has the left bracket before the "mean" text while the other does not. Also the text is changed a bit (as you mentioned about the signed/absolute change). I'm guessing a bracket was missing in the old version, but this new one has the bracket fixed with new text.

Also, as a dirty fix on the local side, you can add skip_output_diff=True in the OutputDiffer parameters to disable stdout checks. That makes the test_3dClusterize_new and test_3dClusterize tests pass.

mrneont commented 3 years ago

Howdy-

Ah, I see. Thanks for noting the missing '[' in the case shown by line 406 there. That has now been fixed in this commit: https://github.com/afni/afni/commit/4d8bd9d274d3e29889da80168741ed679a7a6c07

That being said, that missing bracket is not "functional": it is just in text at the top of an output text table. It shouldn't break anything. I think the reason that line gets highlighted by CircleCI is that it is now different than it used to be: the old 3dClusterize did calculate values for "Mean and SEM based on absolute value of voxel intensities"; the recent switch means that default behavior is now to get "Mean and SEM based on signed voxel intensities". And because of that change, indeed, the line of text there changes, as do the signs of the values in the table in the "Mean" column---those combined output changes should be what are causing the unhappiness in the testing framework.

Thanks for noting the quick fix. However, this is something that we should update in CircleCI, because this is The New Normal for this code now.

leej3 commented 3 years ago

Nice work @YurBoiRene. If you are eager to learn more about this there is some documentation here.

You're interpretation of the location of the diff that is trigger this failure is correct but as @mrneont it is the whole line is different not just the bracket.

Also, your inference that setting skip_output_diff=True is correct. That may be the most appropriate path forward. A slightly more targeted fix would be to add an ignore pattern of "Mean and SEM based" which will filter that line out; there are examples in other tests of how to add ignore patterns. That change would remove the need to regenerate the "correct" data that the output is compared against each time. If there are additional differences in binary output etc. it will just get past this failure only to trigger another one...

Which brings me to the more robust/permanent/useful fix that requires more work... changing the "correct" data that is used for the comparison... it's a little tricky. As @mrneont says, the "correct" data has now been updated due to an improvement in the output. The process of updating the data in this manner is described here but VPN access to the NIH network is required, and SSH access to the afni server where all of the data is stored in a datalad repository. A valuable contribution would be to switch the datalad repository used to https://gin.g-node.org/leej3/afni_ci_test_data (or another repo hosted on that service that stores the data not just the symlinks as github does). This would remove all the issues of access to the data... i.e. people could be given push access to upload data trivially as opposed to the impossible feat of giving them push access to the afni server and make this sort of data update easier to do in the future.

It is possible that all that is required for this is updating the link here, updating the submodule in the main repo tests/afni_ci_test_data, and tidying up some documentation that makes references to the old repo. If you are interested in having a hack at that I will happily answer question as you encounter issues.

Either way it looks like you could submit a pull request that would fix the current CI failure.

YurBoiRene commented 3 years ago

@leej3, since the only files that need to be changed are the output files (log files from stdout/stderr), I believe I can just create a pull request on the afni/afni_ci_test_data repo (which I did).

I definitely agree that the process for changing test data is wonky, but I am not familiar enough with git-annex or datalad to create a solution. Correct me if I'm wrong but I think the repo you linked (https://gin.g-node.org/leej3/afni_ci_test_data) has the same problem as GitHub. It uses the same symlink system that GitHub does. If I fork the repo, the symlinked files aren't accessible.

The benefit of GitHub over just like a public drive folder in this situation is that GitHub has pull requests to review code before it's merged. Maybe we could make a system where changes to the afni/afni_ci_test_data are submitted via pull request and any new/changed test data is attached somehow (with Google Drive, dropbox, plain attachments). Then, if the pull request is approved, someone at AFNI can put the data in the private server and merge the request.

Let me know what you think.

leej3 commented 3 years ago

I believe I can just create a pull request on the afni/afni_ci_test_data repo

That will technically work but is not a desirable solution. Some of my above suggestions would be better stop gaps in the absence of someone putting a bit of time to migrate the repo to a publicly available server (or since I've already done that, simply learn how to use it and tweak the code to avail of it).

I definitely agree that the process for changing test data is wonky

I never said it was wonky :) I just incorrectly anticipated someone on the AFNI team working through the above instructions for modifying the data tree at some point. I realized my error too late. Migrating to gin would resolve that issue. While the system for managing test data is complicated, I believe it solves a difficult problem in the best way possible. There are many other constraints to the solution that I have not mentioned here but i will give a brief summary below.

The problem: the tests require a lot of data that changes over time. This data can range from publicly available datasets that we wish to use part of (and only download that part), and "expected" data output that we wish to modify over time, for example in the current situation.

The solutions, in increasing order of utility:

I appreciate it can be a steep learning curve for this stuff but any other possible solution to this problem seemed inadequate.

Correct me if I'm wrong but I think the repo you linked (https://gin.g-node.org/leej3/afni_ci_test_data) has the same problem as GitHub.

You're wrong :) Datalad/git-annex do indeed allow you to store data in the working directory as symlinks and they will show up as such on github or gin; however, while github offers no way to store the associated binary blobs gin allows you to store them on their servers. The great thing about this is that security/authentication follows the same model as github i.e. ssh keys/login credentials are used and are external to institute specifc VPNs! I think this service didn't exist when I first designed the data mangement and bespoke aws resources that I set up would have been too costly in the long-term... if gin do start to charge for their service down the line (as in drop their generous free offerings) we will benefit from the fact that data storage is cheaper by then and the system will have had time to demonstrate value both of which will help inform any decision...

If I fork the repo, the symlinked files aren't accessible.

Yes, one repo should be used for this. I've added you to it if you want to try testing things. A couple of tutorials will help extend your mental model for this stuff - you've already got an impressive handle on it from wading through the code yourself (the best way to learn!).

Maybe we could make a system where changes to the afni/afni_ci_test_data are submitted via pull request and any new/changed test data is attached somehow

I think gin can handle this workflow but i am not certain of the details.

someone at AFNI can put the data in the private server and merge the request

There is no need for a private server. That was established to provide a cheap and easy way of serving these files publicly. It turned out to be not so easy. I think switching to a public that can have pull requests submitted to it by users would be best for all. As mentioned above though the ultimate decision is not mine. I set up the system but am no longer part of the team. Their preferences will ultimately dictate what solution is possible.

YurBoiRene commented 3 years ago

Thank you for all the good information! Yes so the GIN repository hosts the git-annex'd data on the server itself whereas the GitHub repository has the git-annex'd files hosted on the afni private servers. This means anyone can contribute to changing the files on the GIN repo (git-annex'd or not). Then I have a couple questions regarding GIN.

If people cannot fork the repository, is there no way to propose a contribution than being added as a contributor?

I see you added me as contributor on the GIN repo. I have the correct data files for successful tests on my local machine. What is the best way to add this to the upstream repo? Can I create a new branch, commit to that, and create a pull request just like a normal Git workflow? Should I directly commit the changes and push to master? Is there any special command I have to use in order to stage and commit these files (related to git-annex and datalad)? Let me know what works for GIN and what you think works best workflow wise.

And yes, if I understand GIN correctly now, I agree there is no need for a private server. GIN sounds pretty cool.

Once we can get the updated data to the GIN repo I can create a pull request on this repo to change the test data repository to the new link at GIN. Super simple I think.

Again, I appreciate the help!

leej3 commented 3 years ago

If people cannot fork the repository, is there no way to propose a contribution than being added as a contributor?

I haven't looked into the details too much.

I have the correct data files for successful tests on my local machine. What is the best way to add this to the upstream repo?

I'm not sure what you mean by this. The best way to generate the data is within a docker container to minimize any differences from the context present on the CI. You would commit it to your local clone and then push it to the gin repo.

Can I create a new branch, commit to that, and create a pull request just like a normal Git workflow?

Also not a detail I have looked into. I didn't look into GIN much. I just discovered it as a solution to my problems a little before I left the team. Happy to take a look if you are struggling to figure it out.

Is there any special command I have to use in order to stage and commit these files (related to git-annex and datalad)?

I'd check the datalad docs for this. Roughly speaking datalad uses "save" for this sort of thing (stages and commits in a single command I believe). In afni_ci_tests_data the setup script and readme may also provide some insight too.

Let me know what works for GIN and what you think works best workflow wise.

My best guess is directly pushing to a single repo will be fine to initially prototype this as a solution to issues with updating test data. If it all works nicely an alternative strategy can be considered. As long as you make sure your commits are adding the files as symlinks to the annex and not directly to the git repo then a messy history with failed attempts is completely acceptable.

Once we can get the updated data to the GIN repo I can create a pull request on this repo to change the test data repository to the new link at GIN. Super simple I think.

Awesome. Thanks. And you would need to update the submodule too as mentioned above. that's also an easy enough tweak. Overall the readme for afni_ci_test_data should give you an idea of the various things to bear in mind.

YurBoiRene commented 3 years ago

Excellent. @leej3 I updated the test data in the GIN repo successfully (after a few tries) and changed the references in the afni source. The pull request is open here. Let me know what you think.

leej3 commented 3 years ago

Nicely done. Thank you very much. This will make test data updates far more maintainable. I pushed a fix for an unrelated issue to your branch.

@mrneont can you take a look and let us know if there are any objections from the team for this being merged. This will allow #218 to be merged (or at least the data update required for it to be merged is now trivial) and I believe #228 made us think of some tests that might be useful to add but didn't get around to at the time.

@YurBoiRene, if you can document your workflow or at least the problems/gotchas you encountered that would be helpful to others please do.

YurBoiRene commented 3 years ago

@YurBoiRene, if you can document your workflow or at least the problems/gotchas you encountered that would be helpful to others please do.

Here's a quick overview:

Issue #266 - rene's notes

Needed to:
1. Update 3dClusterize test data
2. Change afni to use GIN repo instead of GitHub repo

For 3dClusterize data
- Copy GIN repo to separate dir using ONLY DATALAD COMMANDS (e.g. datalad install)
- Get the proper files using datalad get
- Run tests
- Replace the files in cloned repo with the ran tests and save changes using datalad save
- Should work as long as no git clone or git-annex commands

For changing to GIN repo
- .git/config doesn't exist so for git-annex stuff (datalad install), omit the ".git"
- Otherwise simple
- Replace in .gitmodules and data_management.py
- Delete afni_ci_test_data and rerun tests to make sure it works

I can make it pretty and complete if you need, but I figure a dirty rundown is sufficient.

mrneont commented 3 years ago

Hi, All-

Thanks for forging ahead on this. It sounds quite useful, though I am unfamiliar with GIN at present. Is this basically like "git", but aimed at datasets instead of text files (at least in the present application; I think it is broader in general)?

It will take me a bit to absorb all of this, and might require further discussion. One initial question: what exactly are the dependencies being added here? My understanding is that GIN depends on git and git-annex. Is there anything else? (I see that datalad is mentioned here, but that is already a dependency of the testing, along with git and git-annex).

YurBoiRene commented 3 years ago

Hey @mrneont, I'll take a stab at your questions, but I hope @leej3 fact checks me.

First, from my understanding, GIN is less like git and more like GitHub (or other git hosting services like BitBucket or GitLab). GIN hosts git repositories (just like GitHub), but the key difference is that GIN also hosts data for git-annex. On GitHub, we could host all the text files, but the git-annex data had to be saved on another server (afni/NIH servers I believe is what was used). GIN allows for any contributor to not only submit changes like normal, but also upload/store git-annex data. I think it's correct to say that GIN is more broad than GitHub, but note that GIN is very focused on storing and publishing scientific data (they even have a button to obtain a DOI :eyes:).

As with the previous configuration with GitHub, this configuration also requires datalad (and thus git and git-annex). This configuration removes the need for the afni/afni_ci_test_data repo and adds the requirement for the leej3/afni_ci_test_data repo at gin.g-node.org. Hypothetically, this also removes the need for the afni/NIH server hosting of test data but much of the data is still hosted there currently.

leej3 commented 3 years ago

The docs for GIN itself might be useful to get a sense of it but @YurBoiRene's description is correct/succinct and adequate for our purposes. To reiterate: the GIN platform is a way for one to host git-annex repos with all the data included. There are a number of ways of talking to the GIN server and Datalad is one of them... there is some nice documentation on the process.

Hypothetically, this also removes the need for the afni/NIH server hosting of test data but much of the data is still hosted there currently.

I think this would be best to make test data management as simple as possible. Keeping track of where the data is across different remotes will add unnecessary challenges to the process. If there was large demand on the server we might consider it but I think circleci likely represents the most consistent demand on this data server and it has caching implemented so the absolute amount of data being downloaded is quite low.

mrneont commented 2 years ago

I will close this for now, because the test data framework has been updated+fixed---many thanks again @yarikoptic---so we are able to update things in the current framework.

The idea of using GIN has appealing features, but I don't think we have the bandwidth to fully explore, test, verify, adopt and set this up at the moment. This may be something to revisit in the future.