cholla-hydro / cholla

A GPU-based hydro code
https://github.com/cholla-hydro/cholla/wiki
MIT License
60 stars 32 forks source link

Python script to load distributed snapshots and Cosmology Dark Matter Only 64^3 test #383

Open bvillasen opened 3 months ago

bvillasen commented 3 months ago

Added tools to load a distributed snapshot without merging the snapshot files into a single snapshot. This is especially useful when dealing with large snapshots and duplicating the memory footprint becomes very inconvenient. The script also has the option to load only a sub-volume of the grid, where only the set of files covering the sub-volume is loaded; this is useful, for example, when plotting a slice of a very large grid that wouldn't fit in memory if the full volume is loaded. An example and some description here: python_scripts/load_cholla_snapshot_distributed.py

Added a test for a cosmological dark matter only run (only particles) for 64^3 particles on a single GPU. To run, go to tests/cosmology/dark_matter_only/64_N1 and run: sh run_test.sh

The script will: 1) compile Cholla with the correct configuration for the dark matter only run 2) download the initial conditions at redshift 100 3) run the simulation to redshift 0 4) download the reference result (redshift = 0) 5) run a Python script to compare dark matter density from the output of the simulation and the reference snapshot 6) the script returns 0 if the test passes and 1 if it fails.

@evaneschneider, let me know if this format for testing is OK, and I can add other tests for the Hydrodynamics cosmological simulation (adiabatic and with H+He chemistry).

bvillasen commented 2 months ago

I had a conversation with @bcaddy about closing this PR since there might not be value in it, which I don't agree with, but I wanted to hear other opinions from the team.

Besides the test, I think some tools could be useful for others. I have added scripts to load snapshots without having to write the snapshot into a single file, which is a big problem when dealing with large simulations. the script also has the option to load a sub-volume without loading the entire grid. From what I see, the concatenation scripts only paste all the snapshot parts into a single file. Is that right? @mabruzzo

The test script offers a simple way to add tests that can be more flexible than the current testing system, which seem to have some limitations that, according to Bob, would require a big change to the code to accommodate.

If @evaneschneider and others don't think it's appropriate to merge this PR, then I'll close it, but I wanted to hear more voices before

bcaddy commented 2 months ago

I haven't completed my code review yet but to add some clarification:

I think being able to load sub-volumes in a performant way is great. I know that's something that @mabruzzo and @evaneschneider have been working on/thinking about so looping you in seemed like a good idea. I'm not sure what the best method is but integrating with the current concatenation scripts seems like a more maintainable option to me than adding new scripts to do a similar thing.

The testing scripts in this PR only new feature is to provide the potential ability to run tests with non-default builds. Currently they can only do that with a single, hard coded, build and would require significant reworking to the scripts, testing framework, and CI pipelines to generalize and integrate them. They also rely on unversioned data hosted on dropbox instead of versioned data in our testing data repo. IMO instead, if we want to test a specific build with the current system, we can just add it to the list of defaults with a new build type file for it.

The floating point comparison method in our current system is also more robust, stable, and accurate and has multiple different comparison methods depending on what works best for a specific test. Also, these scripts duplicate much of the code in run_tests.sh and the SystemTestRunner class and make our testing scheme less clear for newcomers by adding additional code that may not be used.

The build files for Lockhart are great, always happy to see native support for more machines.

evaneschneider commented 2 months ago

Thank you both for working on this! I really appreciate the effort on both your parts.

To respond to Bruno's questions - I think there is significant value in this PR (after all, I am the one who asked for a system test of the cosmology build, and this one seems like it will work great!), and I asked @bcaddy to take a look at it to figure out what exactly we need to change to integrate it with our current testing system. I do think this test will be more effective in the long run if we can incorporate it into our automated build / test system, and the work that remains to be done in this PR is sorting out how to do that.

My understanding is that adding the build would be simple (we just have to add it to the matrix), and the main thing that needs to be changed is to put the initial conditions and reference data into our GitHub tests subdirectory, and add the cosmology system test to the list of automated tests. I have asked Bob to help with that, since he set up the testing system and as far as I know this would be the first test that requires reading in the initial conditions, so there is probably not a bread-and-butter example to follow. If I am off base about how difficult it will be to incorporate this kind of test into our existing system, then we need to take a look at accommodating testing some other way, perhaps through a script like the one included in this PR, because we do need that kind of flexibility.

I am also fine with the additional python scripts, since adding those is consistent with our past practice. That said, I think there is significantly more likelihood that others will use them if the wiki is updated to document them.

If it would be helpful to set up a Zoom call to discuss strategy once Bob has finished his code review, I am happy to do that. Again, I very much appreciate the effort on everyone's part.

bvillasen commented 2 months ago

I'm always happy to join a call to talk about Cholla ;) I'm normally free at 8 am Pacific time (except for Thursdays), some weekdays in the afternoon would also work in case 8am PT doesn't. Happy to talk about this further.

bcaddy commented 2 months ago

The test itself is a separate topic that will be in an upcoming PR. Bruno is working on one that works with the default cosmology build; I think that's a better choice since it will presumably also test the coupling of the particles to the fluid rather than just the particles.

bvillasen commented 2 months ago

Closing this for now. I guess I'll resubmit at some point with some documentation... we'll see how that goes.

evaneschneider commented 2 months ago

Apologies that I was not able to get to this immediately after Bob's code review, but I think there is still a lot of value in this PR, even if not everything is documented yet. In particular, right now there is no simple way in the public repo to load distributed data and compare it, something that the python scripts in this PR would allow. They look easy to use, and it was my intention to keep the bar for adding scripts to this directory low, so that people can share tools quickly even if they are not as polished as code entering the main source directory. I think if we summarily dismiss PRs like this, it will only lead to people sharing scripts as files over email/slack instead, which I do not think is preferable.