cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
63 stars 267 forks source link

Metadata parameters to include in Prod6 #1853

Closed orelgueta closed 1 year ago

orelgueta commented 2 years ago

@kosack suggested that we use the existing sim_telarray metadata parameters to improve the telescope identifiers in Prod6 so we don't have to guess telescopes anymore. The suggestion is to use the parameters OPTICS_CONFIG_NAME, OPTICS_CONFIG_VARIANT, CAMERA_CONFIG_NAME and CAMERA_CONFIG_VARIANT when producing the configuration files for Prod6 in order to include enough information for ctapipe.

Based on this suggestion, we suggested the definitions in the table below

image

These should cover all telescope names and variants for now.

Now the question is what additional information would be useful for ctapipe (or in general the analysis chain) that we can add. We can modify the table above to include additional information and/or add additional meta-parameters to the sim_telarray configuration files. Those can already be read by pyeventio (thanks to @maxnoe).

@kosack suggested to look at the InstrumentDescription to get ideas for what else to include. He provided the two tables below.

image

image

Some of the entries in those table will be redundant (name/type), some are already provided as normal parameters. Others can perhaps be added as meta-parameters. Does anyone have suggestions on how we can improve these tables and naming schemes by making use of the metadata parameters in sim_telarray?

maxnoe commented 2 years ago

@orelgueta Something we right now rely on via guessing is the the wether the telescope is single or dual mirror.

orelgueta commented 2 years ago

OK, but if the name of the telescope is given in the metadata, do you still need the number of mirrors?

maxnoe commented 2 years ago

Yes! If it is not there, we would still need some lookup table from name to properties. I want to avoid that entirely if possible:

https://github.com/cta-observatory/ctapipe/blob/70f29b1c6b9be1366721609454b03d788419ce01/ctapipe/instrument/guess.py#L18-L34

orelgueta commented 2 years ago

OK, no problem including it then. If we add to the OPTICS_CONFIG_VARIANT "single mirror" or "dual mirror" would it make it too long or not easily machine readable? Otherwise we will simply introduce an additional meta parameter, e.g., n_mirrors.

kosack commented 2 years ago

Might be better to use the actual optics design name if there is one, e.g. Swartzschild-Couder vs Davies-Cotton, vs whatever LST uses (Parabolic Davies Cotton?) .

The main use case is that for things like Muon parameter determination, the actual layout of the telescope mirrors can be important (there even having the mirror facet positions could be useful, since the hole in the center has an effect), and there are likely also be algorithm differences in astrometric corrections that differ between single and dual-mirror telescopes. So in the end you want to know what kind of optics system the telescope has, and how many mirrors it has.

kosack commented 2 years ago

Also, Please don't invent new names for cameras: there is no "SSTCam", the cameras used on SSTs are "CHEC" or perhaps in some cases "ASTRICam". The variant names should also be somthing the camera teams provide: for example CHEC has so far 2 variants "S"(with SiPMs) and "M" (with MA-PMTs) , but I guess "M" is not longer used. So perhaps using the variant for a revision number or similar wouuld be useful? We know the cameras will have updates...

orelgueta commented 2 years ago

Might be better to use the actual optics design name if there is one, e.g. Swartzschild-Couder vs Davies-Cotton, vs whatever LST uses (Parabolic Davies Cotton?) .

OK, no problem. I suggest to use Swartzschild-Couder, Davies-Cotton or Parabolic (that's how the LST is defined in the simulation). Do you think we can add it to the variant or do you wish to have a separate meta-parameter?

So in the end you want to know what kind of optics system the telescope has, and how many mirrors it has.

The actual file with the mirror list can be retrieved from the DB if needed of course, but I am not sure if you want to do that. The number of mirrors can be easily added as a meta-parameter if you wish to have it.

orelgueta commented 2 years ago

Also, Please don't invent new names for cameras: there is no "SSTCam", the cameras used on SSTs are "CHEC" or perhaps in some cases "ASTRICam". The variant names should also be somthing the camera teams provide: for example CHEC has so far 2 variants "S"(with SiPMs) and "M" (with MA-PMTs) , but I guess "M" is not longer used. So perhaps using the variant for a revision number or similar wouuld be useful? We know the cameras will have updates...

I don't really follow this comment. First, after the SST harmonisation, we decided not to refer to the names ASTRI and CHEC anymore but SST-Structure and SST-Camera. Not sure why you want to continue with the name CHEC. The ASTRICam and CHEC-M we will not simulate anymore. The CHEC-S is the one currently simulated. It will change slightly in the next production, but the geometry and pixel size will most likely stay the same. Only some SiPM characteristics are expected to change. Therefore, I am not sure we should include the "S" variant in the metadata, since it is expected to be the nominal one from now on. We will also have an optics_config_version and a camera_config_version so any changes which are not a "variant" will go there I assume.

We can consult with the camera team as well of course.

kosack commented 2 years ago

Ok, if that is officially decided then it's fine. It's just that if all cameras are just called "[SST,MST,LST]Cam", then there is less need to name them (at least for CTA, but in ctapipe we also support HESS, MAGIC, etc) . We still support for example Prod2 files, which have ASTRICams on SST structure, etc.

orelgueta commented 2 years ago

Yes, the Cam are not too useful anymore, with the exception of the MSTs. In the past we had many more options, which means we are progressing =)

maxnoe commented 2 years ago

The cam name was also never really needed, since the camera configuration is rather complete in the simtel array objects.

We mainly needed the number of reflector dishes for the coordinate transformations between camera and telescope frame.

The names then were only for nicer array displays etc.

So what is currently not in the main eventio payload and what is needed is the question if there are 1 or 2 reflections in the optics path.

Know I also remember there is an open issue about missing properties for the muon code: https://github.com/cta-observatory/ctapipe/issues/1252

camera radius central hole radius primary mirror diameter secondary mirror diameter Focusing distance

Nice to have

Actual mirror shapes and positions real focal length's of the individual reflectors

kosack commented 2 years ago

The main reason to have camera names was also to allow for future changes and for things like MSTs with NectarCam and FlashCam in the same file.

The same for historic telescopes: n HESS, the cameras were replaced at least once with new ones, so you really do want to distinguish between the same telescope, but different eras of their camera. For Whipple, it was even worse, as there were like 5 generations of cameras on the same optics!

Right now this does affect the output. as we name datasets by the Telescope_Description string, which is a combination of the optics type "SST", the optics version (e.g. "ASTRI") and the camera name (e.g. "CHEC"). See also #1140 which goes into some more detail about that problem.

maxnoe commented 2 years ago

Right now this does affect the output. as we name datasets by the Telescope_Description string,

@kosack We don't really do that anymore. The option by-type still exists and uses the name, but the default ist to split by tel_id. Previously the instrument was stored using this names, but this was also solved to fix the issues with the MAGIC simulations.

kosack commented 2 years ago

yes, sorry I was looking at old Protopipe data where the by_type was still used. So the only issue is name collisions, but they are not critical, but it should be clear ot a reader when opening a file with both NectarCam and FlashCam MSTs that these are separate (they shouldn't not have to look up tel_ids by eye) . I guess the requirement should be that SubarrayDescription.info() gives you something that is human-readable and useful. For example, for Prod5b, I get:

In [1]: source.subarray.info()
Subarray : MonteCarloArray
Num Tels : 84
Footprint: 0.70 km2

       Type       Count         Tel IDs
----------------- ----- -----------------------
   LST_LST_LSTCam     4 1-4
 MST_MST_FlashCam    40 20-34,48-60,64-66,76-84
MST_MST_NectarCam    40 5-19,35-47,61-63,67-75

That has some redundant info now, so maybe just having the type+camera is enough

orelgueta commented 2 years ago

camera radius

Is this the shadowing radius or the pixel layout radius? The former is included in one of the parameters (camera_body_diameter). The latter can be added, but it can be slightly different based on the definition (see example here), so the analysers need to tell us which one they need.

central hole radius

This is technically a simulation parameter. Only SCT defines it though. I assume if we set it to a real value it would be useful? For telescopes with hexagonal mirrors, I guess flat-to-flat diameter would be good?

primary mirror diameter secondary mirror diameter

Both of these are simulation parameters, but they are used only for dual-mirror telescopes. We can also fill this information for single-mirror telescopes, but again it slightly depends on the definition. Would an "outer-edge" radius as shown below be appropriate?

image

Focusing distance

Not sure what is meant exactly, but it sounds like this would be the focal length minus (or plus) the focus offset. Both of these are simulation parameters, so I assume it can be calculated.

Nice to have

Actual mirror shapes and positions real focal length's of the individual reflectors

Both of these are available in the mirror list file which could be obtained from the DB (sort of, the actual focal lengths are available only for the LSTs). It might overkill to provide the full list with every simulation file though. So unless you think it is worth it, I tend to not include this for now (but again, they are available from the DB).

maxnoe commented 2 years ago

Both of these are available in the mirror list file which could be obtained from the DB (sort of, the actual focal lengths are available only for the LSTs). It might overkill to provide the full list with every simulation file though. So unless you think it is worth it, I tend to not include this for now (but again, they are available from the DB).

That would need some muon ananalysis studies. I guess approximating the mirrors as circles is good enough for now to calculate the muon efficiencies.

Is this the shadowing radius or the pixel layout radius?

Shadowing I guess, since this is all related to the muon line integration to estimate the theoretical light distribution of the muon rings.

primary mirror diameter secondary mirror diameter Both of these are simulation parameters, but they are used only for dual-mirror telescopes. We can also fill this information for single-mirror telescopes, but again it slightly depends on the definition. Would an "outer-edge" radius as shown below be appropriate?

Having this for dual mirror telescope's is most important, since it cannot be guesstimated from the mirror area. For single mirror telescopes we guesstimate by calculating the radius from the mirror_area assuming a circle.

Focusing distance

Not sure what is meant exactly, but it sounds like this would be the focal length minus (or plus) the focus offset. Both of these are simulation parameters, so I assume it can be calculated.

Either of these parameters is ok. Either the focus distance (i.e. 10km) or the camera distance from the focal plane.

orelgueta commented 2 years ago

OK so to summarise, for now it looks like the only things we need to add are

Everything else is already included in the simulation parameters. In particular, the focusing distance can be calculated from the focal length and the focus offset which are given as parameters. For example, for the LST from a focus offset of 6.55 cm, and a focal length of 2800 cm we can get the focusing distance: (2800/6.55)*(2800 + 6.55) = 12 km.

It would be interesting to see if we can find a nice solution for reading the mirror list from the DB. It shouldn't be hard considering it is now open to the internet.

image

maxnoe commented 2 years ago

Everything else is already included in the simulation parameters.

Yes, but please make sure to add these to the METAPARAM TELESCOPE ADD, otherwise it's not really possible to retreive them programatically.

The modified table below which includes the type of mirror

The important info for the coordinate trafos is one or two mirrors. Can be add the n_mirrors attribute?

OPTICS_CONFIG_VARIANT

Is the optics config really different for the 4 LSTs in the North? I would like to keep it very clear what we put into these variables, so that we don't start guessing again.

So it should be either just parabolic for the LSTs or if you want to use _VARIANT for differentiating the different LST configs, please let's use another variable only for the reflector shape, e.g. METAPARAM TELESCOPE SET REFLECTOR_SHAPE=parabolic

orelgueta commented 2 years ago

Everything else is already included in the simulation parameters.

Yes, but please make sure to add these to the METAPARAM TELESCOPE ADD, otherwise it's not really possible to retreive them programatically.

Excuse my ignorance, but I thought you can already read any simulation parameter without the need to declare it as a meta-parameter with METAPARAM TELESCOPE ADD. For example, pyeventio reads a lot of the camera parameters like the focal_length, effective_focal_length etc. The focus_offset is not read there, but is that a limitation of pyeventio or the sim_telarray file?

I don't mind adding all of these parameters as meta-parameters as well, but that means they would "appear" twice which is wasteful and is prone to errors (depending on how Konrad decides to do it).

The important info for the coordinate trafos is one or two mirrors. Can be add the n_mirrors attribute?

You are right, I forgot to mention that. However, pyeventio already has a n_mirrors attribute for the number of mirror facets. Would changing the current n_mirrors to n_mirror_facets break things? Otherwise maybe we add a boolean attribute such as dual_mirror?

Is the optics config really different for the 4 LSTs in the North? I would like to keep it very clear what we put into these variables, so that we don't start guessing again.

Yes, the mirrors are different for all 4 LSTs in the north (and are different to the south one). The individual focal lengths for each mirror facet are provided for each LST.

So it should be either just parabolic for the LSTs or if you want to use _VARIANT for differentiating the different LST configs, please let's use another variable only for the reflector shape, e.g. METAPARAM TELESCOPE SET REFLECTOR_SHAPE=parabolic

Sure, we can add a specific shape meta-parameter for all telescopes. Those will then be

maxnoe commented 2 years ago

Excuse my ignorance, but I thought you can already read any simulation parameter without the need to declare it as a meta-parameter with METAPARAM TELESCOPE ADD. For example, pyeventio reads a lot of the camera parameters like the focal_length, effective_focal_length etc. The focus_offset is not read there, but is that a limitation of pyeventio or the sim_telarray file?

No. We can read focal_length and effective_focal_length only because they are stored in one of the data objects, the CameraSettings object: https://github.com/cta-observatory/pyeventio/blob/3fc3406624873089442d8ab0aff8e9e2a2ef6a48/eventio/simtel/objects.py#L183-L196

To be able to read arbitrary configuration parameters, Konrad introduced the new METAPARAM mechanism. But you need to explicitly include the parameters you want to store in the output METAPARM objects in the configuration using METAPARAM {GLOBAL,TELESCOPE} ADD <list of parameters to store>.

In my opinion, we should just store everything and then add custom attributes using METAPARAM {GLOBAL,TELESCOPE} SET <CUSTOM_KEY>=<CUSTOM_VALUE>

However, pyeventio already has a n_mirrors attribute for the number of mirror facets. Would changing the current n_mirrors to n_mirror_facets break things?

Yes it would, but having n_mirrors in the METAPARAM and n_mirrors in CAMERA_SETTINGS objects does not conflict. n_mirror_facets would be the better name and we should probably do the renaming. But if/when we do a breaking change, I have a long list of names that need to be improved, which would take some work, both in eventio to do the changes and then in ctapipe to adapt to the changes.

So in short, adding N_MIRRORS in the metaparams is not a problem, it doesn't conflict with what is stored under the same name in camera_settings, it's just a bit confusing that it has different meanings.

This is why I proposed to use the names you proposed with a REFLECTOR_SHAPE attribute, that would also identifiy the number of mirrors. So only adding the REFLECTOR_SHAPE would also be fine.

Yes, the mirrors are different for all 4 LSTs in the north (and are different to the south one). The individual focal lengths for each mirror facet are provided for each LST.

Ok, then let's store that in OPTICS_CONFIG_VARIANT and not any other additional information. Keep things separate.

orelgueta commented 2 years ago

OK, all clear now. Thanks for explaining the data objects point. I will discuss with Konrad the option of adding all configuration parameters plus some custom ones. If that doesn't work, I will make a list of all the parameters we discussed here and add those at least.

We will add a REFLECTOR_SHAPE meta-parameter and also an N_MIRRORS one so it's all explicit.

Once we have a final list and decision I will come back to this issue and post it so we can have a final revision prior to launching productions.

kbernloehr commented 2 years ago

While you can set meta-parameters REFLECTOR_SHAPE and N_MIRRORS to any string you want, that is not necessarily related to what was actually configured. For that we have a number of real parameters (which could be taken over as meta-parameters):

Being generous with keeping actual parameters as meta-parameters is probably not a bad idea - but doing it with every single parameter would be excessive. Some parameters only make sense at global scope and others are entirely irrelevant for some telescope types (for example analog-sum trigger parameters for a telescope with majority or digital-sum).

Having the analysis stage depend on meta-parameters where the actual parameters are already in the data stream might also not be a terribly good idea. Or smuggling MC-true parameters as meta-parameters into the analysis. The analysis should still work when dropping any meta-information along with the MC-true data blocks (except for IRF production). I would think that in later CTAO operation, the *CONFIG{NAME,VARIANT,VERSION} needs to include something that identifies the actual configuration, reproducibly, in a database, as kind of the bare minimum of meta-information needed. Keeping any actual parameters beyond that as meta-parameters is possible - and might be convenient here and there - but would be redundant.

So, before coming down with lists of meta-parameters, you need to be consider - and write down - what the purpose of each of them should be. Some of them might be needed to select specific data sets while others might contain data not otherwise preserved at all (e.g. the random number generator seeds). Redundant redundancy is perhaps not the best motivation.

orelgueta commented 2 years ago

Oh right, I forgot about MIRROR_CLASS, that already covers the reflector parameter, so we don't need it. That's very good.

I would think that in later CTAO operation, the *CONFIG{NAME,VARIANT,VERSION} needs to include something that identifies the actual configuration, reproducibly, in a database, as kind of the bare minimum of meta-information needed. Keeping any actual parameters beyond that as meta-parameters is possible - and might be convenient here and there - but would be redundant.

That is true and it is indeed the plan for the future CTAO operation, with a small exception. We might still keep the analysis necessary parameters in the simulation file also in the future to avoid the need of accessing the simulation DB in each analysis job. It could be done differently (post simulation, adding the info in the analysis preparation stage), but that decision will be taken in the future.

Specifically for Prod6, we should include all of the necessary parameters for the analysis. The only list of parameters with a clear motivation is the one @maxnoe mentioned earlier:

camera radius central hole radius primary mirror diameter secondary mirror diameter Focusing distance

Nice to have

Actual mirror shapes and positions real focal length's of the individual reflectors

We can add only those and the ones identifying the telescopes in the original table I posted. Unless the analysis team wants to come up with more parameters that they think might be necessary.

kbernloehr commented 2 years ago

The latest package under the 'Testing' location includes a configuration file 'CTA-PROD6-metaparameters.cfg' (see also in https://www.mpi-hd.mpg.de/hfm/CTA/MC/Prod6/Config/Config-Files/ ) that adds meta-parameters based on two definitions provided by each telescope configuration file, OPTICS_CLASS and TRIGGER_CLASS, which actual parameters should get exported as meta-parameters. Can, of course be extended.

Try running a short prod-6 test production like NSHOW=10 EMIN=0.2 EMAX=1.0 CSCAT=200 NSCAT=5 ./prod6_test_run Paranal gamma South 20 to get a data file including those. (And keep in mind that this is work-in-progress and may change at any point).

Actual mirror shapes and positions are in the file set by the MIRROR_LIST parameter (for the 1M type) or by PRIMARY_SEGMENTATION and SECONDARY_SEGMENTATION (for 2M, might get clipped by inner and outer mirror radii). These are all in the list, where applicable. For the randomized (1M) mirror segment focal lengths, and their random displacements and misalignments, I plan to add another data block type, similar to the recently added block about randomized pixel properties. Probably not for in time for prod-6. I don't think random mirror properties are really useful as meta-parameters - would also need to check if they are even set up at the time when the meta-parameters are wrapped up.

maxnoe commented 2 years ago

I just had a look at a file produced with the release of today (2022-06-30) and it looks pretty complete.

The only thing missing are the approximate mirror sizes / holes / shadowing for the telescopes that are setup via mirror lists. I don't know what's better, leaving it out and then having code in ctapipe handling the actual mirror lists or filling the approximate values for now.

Maybe a question for @AMWMitchell ?

maxnoe commented 2 years ago

@orelgueta Maybe the inconsistent capitalization could be fixed?

ASUM_THRESHold = 260.7 
ASUM_CLIPping = 9999 
kbernloehr commented 2 years ago

There is nothing inconsistent here. The upper-case part is, as documented, the part which is required in any abbreviated form of a configuration parameter. Since parameters can then be used in any of the valid abbreviations, in any upper, lower, or mixed case, like 'aSuM_thREshO' in the config file, you are free to convert them to all lower case or all upper case or capitalized ('Asum_Threshold', for more comfortable reading) for further processing. All you lose is the reminder that 'asum_thr' is too short for a configuration input, even though it is still unique (so far). Since there had been cases in the past where the upper-case part in a parameter definition had to be extended, after introducing a new parameter, with a name starting similarly, you are even encouraged to either convert the names for future reference or simply use them as case-insensitive text. The full names, as case-insensitive, never changed for any parameter.

AMWMitchell commented 2 years ago

Replying to Max - in my opinion it'd be better to handle this via mirror lists. In the general case / long term this is a lot more flexible. If we want to fix the values for a specific production that's also ok, but we need to remember that the values could change again in future. (which is why keeping a list might be better).

maxnoe commented 1 year ago

Discussion concluded for prod6