galacticusorg / galacticus

The Galacticus galaxy formation model
GNU General Public License v3.0

Rename luminosity datasets? #35

Open abensonca opened 6 years ago

abensonca commented 6 years ago

Original report by Andrew Benson (Bitbucket: abensonca, GitHub: abensonca).


Currently, luminosity datasets in the output have names structured like this:

diskLuminositiesStellar:SDSS_r:rest:z0.0000

This can be confusing for lightcone outputs, as the lightcone redshift does not correspond to the redshift in the property name.

Two possibilities:

  1. Remove the "z0.0000" from the property name. This could cause problems if output options require luminosities at multiple epochs to be output - in this case we'd need to add the redshift back into the property name.

  2. We could instead leave this as it is, but have post-processing code be intelligent such that if a user requests:

diskLuminositiesStellar:SDSS_r:rest

it knows to re-write this to:

diskLuminositiesStellar:SDSS_r:rest:z0.0000

where the redshift can be determined from the currently selected output number.
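A minimal sketch of this second option in Python, assuming the suffix always has the form `:z` followed by the redshift to four decimal places (inferred from the example name above) and that luminosity datasets can be recognized by the substring `Luminosities`. The function name `resolve_luminosity_name` and its arguments are hypothetical and not part of the existing analysis scripts:

```python
import re

# Matches a trailing redshift label such as ":z0.0000" (assumed format).
_REDSHIFT_SUFFIX = re.compile(r":z\d+\.\d+$")


def resolve_luminosity_name(name, output_redshift):
    """Return `name` with a redshift suffix, adding one if it is missing.

    `output_redshift` is the redshift of the currently selected output;
    how it is looked up from the file is left to the caller.
    """
    if "Luminosities" not in name:
        # Assumption: only luminosity datasets carry a redshift label.
        return name
    if _REDSHIFT_SUFFIX.search(name):
        # Already fully qualified; leave it untouched.
        return name
    return "{}:z{:.4f}".format(name, output_redshift)


print(resolve_luminosity_name("diskLuminositiesStellar:SDSS_r:rest", 0.0))
```

A name that already carries a redshift label is returned unchanged, so a user can still request a specific epoch explicitly if multiple epochs are present in the output.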

abensonca commented 6 years ago

Original comment by Alex Merson (Bitbucket: aimerson, GitHub: aimerson).


I think the second option shouldn't be that difficult to implement using regex. If we remove the redshift from the dataset name then in any python/perl post-processing the user will additionally need to specify the redshift of the snapshot to read/write the data from/to.

Although, come to think of it, for a lightcone output we would actually still have the same problem, since the data is stored according to snapshot. One option would be to modify the scripts to read/write lightcone files so that all of the data is stored in a single output directory in the HDF5 file. Or at least to edit the read routines so that, for a lightcone, data is read from all of the "snapshot" outputs simultaneously.

Revising the format of the HDF5 file for a lightcone output will probably be significant work (especially if we were to modify the Galacticus source code), so perhaps this is something that could be simply added into a lightcone module of the analysis scripts -- a class to read in the standard format HDF5 file and create a copy with galaxies stored in a single output, maybe even grouped by sky position instead of simulation subvolume (this is a function that I often use HEALPix for).
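As a rough illustration of the "copy into a single output" idea, the sketch below merges per-snapshot datasets held in plain Python dictionaries. Reading and writing the actual HDF5 file is omitted, and the function name `merge_snapshot_outputs`, the input layout, and the redshift-suffix pattern are all assumptions made for this example:

```python
import re

# Matches a trailing redshift label such as ":z0.0000" (assumed format).
_REDSHIFT_SUFFIX = re.compile(r":z\d+\.\d+$")


def merge_snapshot_outputs(snapshots):
    """Concatenate per-snapshot datasets into a single dictionary.

    `snapshots` is a list of dicts, one per snapshot output, each mapping
    a dataset name to a list of per-galaxy values. Redshift suffixes are
    stripped so that the same luminosity from different snapshots ends up
    under one name.
    """
    merged = {}
    for datasets in snapshots:
        for name, values in datasets.items():
            key = _REDSHIFT_SUFFIX.sub("", name)
            merged.setdefault(key, []).extend(values)
    return merged


# Two hypothetical snapshot outputs of a lightcone run.
snapshots = [
    {"diskMassStellar": [1.0e9, 2.0e9],
     "diskLuminositiesStellar:SDSS_r:rest:z0.0000": [0.5, 0.7]},
    {"diskMassStellar": [3.0e9],
     "diskLuminositiesStellar:SDSS_r:rest:z0.5000": [0.9]},
]
merged = merge_snapshot_outputs(snapshots)
print(sorted(merged))
```

This only works if each snapshot holds a single epoch per band; with multiple epochs in one snapshot, stripping the suffix would mix them together.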

abensonca commented 6 years ago

Original comment by Andrew Benson (Bitbucket: abensonca, GitHub: abensonca).


> I think the second option shouldn't be that difficult to implement using regex. If we remove the redshift from the dataset name then in any python/perl post-processing the user will additionally need to specify the redshift of the snapshot to read/write the data from/to.

Presumably they have to do that for other properties anyway though? E.g. for diskMassStellar you'd have to specify which output you want?

There's definitely a logic to writing out the data with a different structure for lightcone runs - i.e. don't have it grouped into snapshot outputs. I see a few difficulties with this:

  1. It would require some significant modification to the code - but probably nothing that's really major, and in general I'd like to move to having the output functions have an additional layer of abstraction so that we can write out in a variety of different ways depending on what's being done.

  2. The post-processing code would have to be able to handle situations where the output is all in a single group (for lightcones) and in separate snapshot groups (for non-lightcones). This probably isn't a big problem though.

abensonca commented 6 years ago

Original comment by Alex Merson (Bitbucket: aimerson, GitHub: aimerson).


> Presumably they have to do that for other properties anyway though? E.g. for diskMassStellar you'd have to specify which output you want?

Yes, that is correct -- other properties will still require a redshift. I think that this could be one of the main sources of confusion for a user working with a lightcone output.

I agree that any format change will require major modification to the main code. The Python scripts could be made flexible enough to cope with different formats and load the appropriate I/O module. For now I have included a function in the Python processing scripts that returns a list of the snapshot outputs, which the user can simply loop over.

abensonca commented 6 years ago

Original comment by Andrew Benson (Bitbucket: abensonca, GitHub: abensonca).


> Yes, that is correct -- other properties will still require a redshift. I think that this could be one of the main sources of confusion for a user working with a lightcone output.

Agreed - to me this is why having a property

diskLuminositiesStellar:SDSS_r:rest

which the post-processing scripts understand and just add a redshift seems like a good solution - that way at least all properties behave similarly as far as the user sees - they give the name of the property and the redshift at which they want it.

I like the idea of being able to output lightcone data to groups split by HEALPix pixels. There are various lightcone enhancements to be made - but I guess we should move them to a different issue thread.