SOAP catalogue - Githubissues

bwvdnbro commented 1 year ago

This PR adds support for SOAP catalogues. SOAP (https://github.com/SWIFTSIM/SOAP) is a new tool to calculate (sub)halo properties. It uses the subhalo membership information from VELOCIraptor and its centre of potential to calculate a large set of additional properties, like subhalo masses, star formation rates, spherical overdensity radii... These are output as a VR-like catalogue file with a much clearer structure and unit metadata, similar to the SWIFT snapshot metadata.

While the new SOAP output structure would make a new SOAP reader module desirable, this would make it hard to convert from VR to SOAP in the pipeline, since that depends a lot on functionality in this module. As a (temporary?) solution, this PR instead makes it possible to replace the VR catalogue with a SOAP catalogue with minimal changes to the API used in the pipeline. The pipeline can then slowly be adapted to use the new SOAP variable names, without the need for an immediate major update.

In practice, this PR:

defines a more general Catalogue interface of which the existing VelociraptorCatalogue is one child class. A new SOAPCatalogue child class is defined to handle SOAP catalogues. load() uses the appropriate child class using a try..except block.
defines a general interface for other parts of the module and external code to interact with the catalogue, the most important part of which is catalogue.get_quantity(), which should be used to get a dataset where you would use a catalogue attribute before. This is the only change to the external API, and will/does affect code in the pipeline registration functions.
defines a new translator function that can convert old VR dataset names to new SOAP dataset names, taking into account the fact that SOAP, unlike VR, supports multidimensional datasets. Because of this translator function, it is possible to use both the VR name and the SOAP name for the same dataset.

Apart from these changes, the PR also contains some new debugging and testing code.

The PR is still missing functionality to obtain subhalo membership information from SOAP catalogues, as is possible for VR catalogues. This can be added later but is not a priority right now.

@JBorrow I imagine that you might have some thoughts on the new design. I originally tried to stick to the original attribute-based access to the catalogue datasets, but in the end having the catalogue.get_quantity() function was pretty much inevitable to deal with having multiple names for the same dataset. The SOAP output file structure maps a lot better to multi-level attributes (e.g. "SO/200_crit/TotalMass" --> so.200_crit.totalmass), but having a mapping that works for both SOAP and VR at the same time is pretty much impossible. We could go back to attribute-based access once the entire pipeline uses the new SOAP names, but that will take a while.

@EvgeniiChaikin I hope we can get this merged soon, but in case that does not happen, you can already use this branch to test the SOAP-based pipeline. I will create an appropriate PR in the pipeline-configs repository later this week.

JBorrow commented 1 year ago

This is a pretty huge PR. I don't like that we've taken away the attribute-style access, but I understand your reasoning for doing it. I will aim to review this within the next couple of weeks or so. If I don't, feel free to ping me.

bwvdnbro commented 1 year ago

Thanks! It would be good if this could be processed before I leave Leiden (in exactly 3 weeks). I would like to keep @EvgeniiChaikin in the loop, just in case there is more work to do past that point.

EvgeniiChaikin commented 1 year ago

@bwvdnbro I have already started looking into this PR and testing it. It will probably take me a couple of more days before I come up with the first comments.

But I want to say that the PR already looks very nice and well-documented so the review should not take long.

bwvdnbro commented 1 year ago

I don't think we want to have ExclusiveSphere in the x axis's label. I know that you have thought of it and this is a mechanism to handle the labels (I guess by using get_particle_property_name_conversion() in translator.py?). What is not clear to me is why in some cases the names that we specify in registration.py do not propagate to figures' labels.

Which name shows up in the label depends on the order in which the various curves are added. unyt.matplotlib_support adds the first label it gets. If you first plot observational data, it will take the label from those. If you first plot the simulation, it will use whatever label was added to the name.

I have thought about this and I would really like to avoid using a translator function that provides a custom label for every dataset, because that is very hard to maintain. But it has also turned out to be pretty hard to come up with a generic way to convert the dataset name into a label (which is what is happening now). I actually like having the full dataset path, since then it is immediately clear what SOAP quantity we use: in this case the exclusive 3D aperture with a 50 kpc radius. If you omit the "ExclusiveSphere", then it is not clear whether you use a projected or a 3D aperture, and whether or not unbound particles are excluded or not. Omitting the 50 kpc is even more confusing. You would not put this in a paper, but for a pipeline plot that seems pretty useful to me.

bwvdnbro commented 1 year ago

I think I have addressed most of the comments. As soon as the cosmology fix for swiftsimio is confirmed, I will port that over. The other discussions can keep going, but I can't promise I will have any time left to address them.

MatthieuSchaller commented 1 year ago

Regarding the cosmology thing above, the latest swiftsimio can now correctly handle things.

JBorrow commented 1 year ago

@EvgeniiChaikin do you have time to include these changes in SOAP?

EvgeniiChaikin commented 1 year ago

@JBorrow yes, I will be able to have a look in the coming days.

JBorrow commented 1 year ago

Are we ready to go on this now?

MatthieuSchaller commented 1 year ago

Can we maybe test it first? @EvgeniiChaikin do you maybe have a soap-ified VR catalog of a COLIBRE run to test all this?

EvgeniiChaikin commented 1 year ago

Yes, I tested this branch before.

But since there have been some new changes, I presume it is better to wait until cosma is back and then test this branch again on one of the most recent runs.

JBorrow commented 1 year ago

Nice bug fix in the 1D velocity dispersion. Let me know when this is ready for merge.

EvgeniiChaikin commented 1 year ago

I have done several tests. Here is one of them:

https://swift.dur.ac.uk/COLIBREPlots/chaikin/comparisons/YEAR2022/L50N376/z2.0_VR_vs_SOAP_[U102VR,U102SOAP]/index.html

where I compare VR and SOAP output made for the same simulation.

There are about 4 plots where SOAP and VR still differ, but they probably all have to do with dust depletion. @james-trayford I have seen that you recently made some changes in the swift code, which may resolve the discrepancies we see here. Do you know if we need to also update anything in this PR that is related to dust depletion?

MatthieuSchaller commented 1 year ago

I am happy to merge this too.

Just to confirm, if we really stop using VR altogether then we will not have to expand the big list in the translator object?

MatthieuSchaller commented 1 year ago

Do you know if we need to also update anything in this PR that is related to dust depletion?

Let's do this separately.

EvgeniiChaikin commented 1 year ago

@MatthieuSchaller I have tested SOAP again, now including the recent updates on the swift/colibre side from @james-trayford.

The gas metallicities plots now all look good to me. (You can see them yourselves here https://swift.dur.ac.uk/COLIBREPlots/chaikin/individual_runs/YEAR2022/z0.0_SOAP_TEST_MEATLS/#820371183054460454 where it is clear that metal depletion is accounted for)

However, while testing, I identified a new problem: the field with the linear average of Iron from SNIa over H, which was added relatively recently here https://github.com/SWIFTSIM/pipeline-configs/commit/06c0d1bf36349a18b89aaa9ba198017883f47d97, is not registered in SOAP

The pipeline errs with

velociraptor-python/velociraptor/catalogue/translator.py", line 1402, in VR_to_SOAP
NotImplementedError: No SOAP analogue for property lin_element_ratios_times_masses.lin_FeSNIa_over_H_times_star_mass_50_kpc!

I overlooked this before because the SOAP branch of the pipeline configs wasn't up to date with the most recent master branch, so it did not include the update https://github.com/SWIFTSIM/pipeline-configs/commit/06c0d1bf36349a18b89aaa9ba198017883f47d97

To fix this, we need to add this one missing field to SOAP.

If I comment out this field, everything works perfectly!

EvgeniiChaikin commented 1 year ago

@MatthieuSchaller the commit I have just added fixes the error that I described above.

I also created a pull request in the SOAP repo: https://github.com/SWIFTSIM/SOAP/pull/12, which creates the field that I have just put in the translator file.

With these two changes, the SOAP pipeline finally works!

MatthieuSchaller commented 1 year ago

Just to confirm: You are happy with this to be merged?

EvgeniiChaikin commented 1 year ago

Yes, I am happy if this is merged.

One thing to remember: https://github.com/SWIFTSIM/pipeline-configs/pull/233 and https://github.com/SWIFTSIM/SOAP/pull/12 need to be merged at the same time in order to not break anything.

JBorrow commented 1 year ago

Can we merge this branch before merging those other two?

EvgeniiChaikin commented 1 year ago

Yes, I agree that this should be merged first.

Note that the two commits that I added today are to make the velociraptor-comparison-data repository (https://github.com/SWIFTSIM/velociraptor-comparison-data) work with this version of the velociraptor-python library for relatively new versions of python, which would otherwise complain about escape sequence characters such as \d.

I haven't seen this problem before because I didn't run the convert.py script in the velociraptor-comparison-data repository with this version of the velociraptor-python library.

JBorrow commented 1 year ago

I pushed a new release https://github.com/SWIFTSIM/velociraptor-python/releases/tag/v0.16.0

SWIFTSIM / velociraptor-python

SOAP catalogue #95