Closed bwvdnbro closed 1 year ago
This is a pretty huge PR. I don't like that we've taken away the attribute-style access, but I understand your reasoning for doing it. I will aim to review this within the next couple of weeks or so. If I don't, feel free to ping me.
Thanks! It would be good if this could be processed before I leave Leiden (in exactly 3 weeks). I would like to keep @EvgeniiChaikin in the loop, just in case there is more work to do past that point.
@bwvdnbro I have already started looking into this PR and testing it. It will probably take me a couple of more days before I come up with the first comments.
But I want to say that the PR already looks very nice and well-documented so the review should not take long.
I don't think we want to have
ExclusiveSphere
in the x axis's label. I know that you have thought of it and this is a mechanism to handle the labels (I guess by usingget_particle_property_name_conversion()
intranslator.py
?). What is not clear to me is why in some cases the names that we specify inregistration.py
do not propagate to figures' labels.
Which name shows up in the label depends on the order in which the various curves are added. unyt.matplotlib_support
adds the first label it gets. If you first plot observational data, it will take the label from those. If you first plot the simulation, it will use whatever label was added to the name
.
I have thought about this and I would really like to avoid using a translator function that provides a custom label for every dataset, because that is very hard to maintain. But it has also turned out to be pretty hard to come up with a generic way to convert the dataset name into a label (which is what is happening now). I actually like having the full dataset path, since then it is immediately clear what SOAP quantity we use: in this case the exclusive 3D aperture with a 50 kpc radius. If you omit the "ExclusiveSphere", then it is not clear whether you use a projected or a 3D aperture, and whether or not unbound particles are excluded or not. Omitting the 50 kpc is even more confusing. You would not put this in a paper, but for a pipeline plot that seems pretty useful to me.
I think I have addressed most of the comments. As soon as the cosmology fix for swiftsimio is confirmed, I will port that over. The other discussions can keep going, but I can't promise I will have any time left to address them.
Regarding the cosmology thing above, the latest swiftsimio
can now correctly handle things.
@EvgeniiChaikin do you have time to include these changes in SOAP?
@JBorrow yes, I will be able to have a look in the coming days.
Are we ready to go on this now?
Can we maybe test it first? @EvgeniiChaikin do you maybe have a soap-ified VR catalog of a COLIBRE run to test all this?
Yes, I tested this branch before.
But since there have been some new changes, I presume it is better to wait until cosma is back and then test this branch again on one of the most recent runs.
Nice bug fix in the 1D velocity dispersion. Let me know when this is ready for merge.
I have done several tests. Here is one of them:
where I compare VR and SOAP output made for the same simulation.
There are about 4 plots where SOAP and VR still differ, but they probably all have to do with dust depletion. @james-trayford I have seen that you recently made some changes in the swift code, which may resolve the discrepancies we see here. Do you know if we need to also update anything in this PR that is related to dust depletion?
I am happy to merge this too.
Just to confirm, if we really stop using VR altogether then we will not have to expand the big list in the translator object?
Do you know if we need to also update anything in this PR that is related to dust depletion?
Let's do this separately.
@MatthieuSchaller I have tested SOAP again, now including the recent updates on the swift/colibre side from @james-trayford.
The gas metallicities plots now all look good to me. (You can see them yourselves here https://swift.dur.ac.uk/COLIBREPlots/chaikin/individual_runs/YEAR2022/z0.0_SOAP_TEST_MEATLS/#820371183054460454 where it is clear that metal depletion is accounted for)
However, while testing, I identified a new problem: the field with the linear average of Iron from SNIa over H, which was added relatively recently here https://github.com/SWIFTSIM/pipeline-configs/commit/06c0d1bf36349a18b89aaa9ba198017883f47d97, is not registered in SOAP
The pipeline errs with
velociraptor-python/velociraptor/catalogue/translator.py", line 1402, in VR_to_SOAP
NotImplementedError: No SOAP analogue for property lin_element_ratios_times_masses.lin_FeSNIa_over_H_times_star_mass_50_kpc!
I overlooked this before because the SOAP branch of the pipeline configs wasn't up to date with the most recent master branch, so it did not include the update https://github.com/SWIFTSIM/pipeline-configs/commit/06c0d1bf36349a18b89aaa9ba198017883f47d97
To fix this, we need to add this one missing field to SOAP.
If I comment out this field, everything works perfectly!
@MatthieuSchaller the commit I have just added fixes the error that I described above.
I also created a pull request in the SOAP repo: https://github.com/SWIFTSIM/SOAP/pull/12, which creates the field that I have just put in the translator file.
With these two changes, the SOAP pipeline finally works!
Just to confirm: You are happy with this to be merged?
Yes, I am happy if this is merged.
One thing to remember: https://github.com/SWIFTSIM/pipeline-configs/pull/233 and https://github.com/SWIFTSIM/SOAP/pull/12 need to be merged at the same time in order to not break anything.
Can we merge this branch before merging those other two?
Yes, I agree that this should be merged first.
Note that the two commits that I added today are to make the velociraptor-comparison-data
repository (https://github.com/SWIFTSIM/velociraptor-comparison-data) work with this version of the velociraptor-python
library for relatively new versions of python, which would otherwise complain about escape sequence characters such as \d
.
I haven't seen this problem before because I didn't run the convert.py
script in the velociraptor-comparison-data
repository with this version of the velociraptor-python
library.
I pushed a new release https://github.com/SWIFTSIM/velociraptor-python/releases/tag/v0.16.0
This PR adds support for SOAP catalogues. SOAP (https://github.com/SWIFTSIM/SOAP) is a new tool to calculate (sub)halo properties. It uses the subhalo membership information from VELOCIraptor and its centre of potential to calculate a large set of additional properties, like subhalo masses, star formation rates, spherical overdensity radii... These are output as a VR-like catalogue file with a much clearer structure and unit metadata, similar to the SWIFT snapshot metadata.
While the new SOAP output structure would make a new SOAP reader module desirable, this would make it hard to convert from VR to SOAP in the pipeline, since that depends a lot on functionality in this module. As a (temporary?) solution, this PR instead makes it possible to replace the VR catalogue with a SOAP catalogue with minimal changes to the API used in the pipeline. The pipeline can then slowly be adapted to use the new SOAP variable names, without the need for an immediate major update.
In practice, this PR:
Catalogue
interface of which the existingVelociraptorCatalogue
is one child class. A newSOAPCatalogue
child class is defined to handle SOAP catalogues.load()
uses the appropriate child class using atry..except
block.catalogue.get_quantity()
, which should be used to get a dataset where you would use a catalogue attribute before. This is the only change to the external API, and will/does affect code in the pipeline registration functions.Apart from these changes, the PR also contains some new debugging and testing code.
The PR is still missing functionality to obtain subhalo membership information from SOAP catalogues, as is possible for VR catalogues. This can be added later but is not a priority right now.
@JBorrow I imagine that you might have some thoughts on the new design. I originally tried to stick to the original attribute-based access to the catalogue datasets, but in the end having the
catalogue.get_quantity()
function was pretty much inevitable to deal with having multiple names for the same dataset. The SOAP output file structure maps a lot better to multi-level attributes (e.g. "SO/200_crit/TotalMass" -->so.200_crit.totalmass
), but having a mapping that works for both SOAP and VR at the same time is pretty much impossible. We could go back to attribute-based access once the entire pipeline uses the new SOAP names, but that will take a while.@EvgeniiChaikin I hope we can get this merged soon, but in case that does not happen, you can already use this branch to test the SOAP-based pipeline. I will create an appropriate PR in the pipeline-configs repository later this week.