Closed maxnoe closed 2 years ago
Is there no run number available in the file itself?
No, we should read it from the filename.
Why is this not stored in the fits header?
Why is this not stored in the fits header?
I will comment with my limited knowledge, please someone correct me or bring more info. The DAQ does not have any information on what we are observing, it's just taking data. Writing metadata about the observation itself would be the role of the TCU?
What is inserting the run/subrun number into the filename then?
Is run number == obs_id ?
We'd have to think about that. As we also have subruns, probably something like:
obs_id = 10000 * run + subrun
is needed. But we must have a way to uniquely identify events!
For FACT data we count runs only within the night, so our obs_id is night * 1000 + run_id
where night is an integer like 20200101
Why is this not stored in the fits header?
Are you talking about the ZFITS R1 files written by the ZFW?
In that case the run number is written out indeed, but obviously only to the first of all files of a run. It would be great, if the run header could be repated at each begin of a new file. But that is a question to @Etienne12345 and quite some redesign of ZFW, I am afraid.
I think the OBS_ID is even more general than the current run number, the idea is it will not even change while wobbling.
You are right about metadata in general, Thomas, but as for run number and index perhaps the zfits writers (which must know them) might be able to put them inside the files. I am not really sure, and anyway this will all be revised once the TCU is there.
The run header (or rather observation configuration) should be repeated in the header of each ZFITS written during a run. The only case where that should not be true is if the ZFW is restarted in the middle of a run. If this is not the case then it is a bug !
Today (8:57am) Etienne12345 wrote:
The run header (or rather observation configuration) should be repeated in the header of each ZFITS written during a run. The only case where that should not be true is if the ZFW is restarted in the middle of a run. If this is not the case then it is a bug !
Also among different streams?
Yes. If not it must be because the cameraConfig packet is not sent to individual streams. But afaik all was good and jolly a while ago.
We'd have to think about that. As we also have subruns, probably something like:
obs_id = 1000 * run + subrun
is needed.
There is no limit on file numbers (which you call subruns). No limit; Also 10000 or 100000 would not be enough. File numbers are totally artificial and could (should) change, when data are compressed or reduced further downstream, in order to keep a sufficiently big file size for efficient storage on GRID.
But we must have a way to uniquely identify events!
Event numbers are unique within a given run. An event is uniquely identified by its run and event number.
There is a fixed number of events per file, so you should be able to calculate the file which contains an event from the event number only.
Yes. If not it must be because the cameraConfig packet is not sent to individual streams. But afaik all was good and jolly a while ago.
All right. @MaxNoe, @vuillaut, if you think it is a malfunction upstream in the data writing, please file a bug in redmine, with an example run.file number for reference.
There is a fixed number of events per file,
(for a given run :wink:)
Event numbers are unique within a given run. An event is uniquely identified by its run and event number.
Ok great, so obs_id == run_id
that makes things easier.
The run header (or rather observation configuration) should be repeated in the header of each ZFITS written during a run. The only case where that should not be true is if the ZFW is restarted in the middle of a run. If this is not the case then it is a bug !
Where should this information be?
I am looking into e.g. LST-1.4.Run01829.0000.fits.fz, at cp01:
/fefs/aswg/data/real/R0/20200117/LST-1.4.Run01829.0000.fits.fz
.
I see the primary hdu and two bintables with the "CTA" custom compression: CameraConfiguration
and Events
.
These are also the only things that show up when opening the file with the protozfitsreader:
In [3]: File('./LST-1.1.Run01829.0000.fits.fz')
Out[3]: File({'CameraConfig': Table(1xR1.CameraConfiguration), 'Events': Table(13250xR1.CameraEvent)})
Okay, it seems to be in f.CameraConfiguration.configuration_id=1829
that looks like a raelly strange place for a run id to me.
Hi, for info, the configuration id (that seems to be the run id) was already saved in event.lst.tel[1].svc.configuration_id. I wonder at this point, what should have been saved in the configuration id?
Okay, it seems to be in
f.CameraConfiguration.configuration_id=1829
that looks like a raelly strange place for a run id to me.
Agreed. It stems from CTA's proposal to discard the concept of runs (and replace it with "good-time-interval"). I agree that it sounds like "configuration_id" won't be unique, but really it is.
Sorry @Etienne12345, I am not sure to understand, could you please repeat the relations supposed to be between run_id, obs_id and configuration_id ?
LST1's configuration_id really is a run_id. I have no idea how is the obs_id linked to the configuration_id though.
Ok, thanks. Then I would suggest to keep the run_id inside the lst configuration id (as it is the case now) and not to copy it also in the obs_id, at least at the r0/r1 level. Eventually for an easy book-keeping of the data I could understand to have it (temporary) in the obs_id at DL1 level (at least till the correct obs_id is also provided by TCU)
@FrancaCassol No. We need obs_id to be set to the unique key identifying events together with event_id
. This is right now for LST this run_id.
Not setting obs_id is not an option.
It's actually quite simple. obs_id
and event_id
need to identify an event. What is that combination for LST data? This has to be filled in the event source.
At which data level you need it?
All data levels
It is the event identifier! It is needed everywhere.
In ctapipe 0.8, this information is stored at the toplevel of an event in the EventIndex
container.
Before it was repeated in every of the data levels.
As said above, you have already the run number in the lst container at R0 and R1 level, so you can take it and store it in the EventIndex. I don't understand where is the problem?
The problem is, that the lst event source is not filling this information into the normal event container into the appropriate position.
In don't agree, the LST run number is not the obs_id. This information is now stored in the right place. To put it in the obs_id would be wrong
Why?
Beacause the run_id is not the obs_id
As said, you can decide to use it as obs_is, but at higher level (DL1) and knowing that this is provisory
obs_id is independent of data levels.
The configuration_id
can stay where it is and be copied as obs_id
already at R0 level (by ctapipe_io_lst
) for now, no?
Everything is provisory at this point anyway, but being able to keep track of events individually at every stage is indeed necessary.
Because the run_id is not the obs_id
Hi Franca, I see no objection to use the obs_id field to keep run_id. It is the closest to obs_id we have as of now, and pretty close to the final concept, I think. The final obs_id will not change when wobbling, but we do not have yet wobble observations, and when we have them we may decide not to stop the daq in between, and then run_id won't change. Am I missing something?
@vuillaut @moralejo fully agree!
It is the closest to obs_id we have as of now, and pretty close to the final concept, I think.
That's what I argued as well. But @FrancaCassol seems to disagree on a fundamental level.
As said, you can decide to use it as obs_is, but at higher level (DL1) and knowing that this is provisory
I don't think this is an option. We need to be able to identify events / runs uniquely over all data levels. In ctapipe 0.8, the obs id and event id are not stored in each datalevel anymore because of this. Also, event sources are required to provide the obs id, and setting it to -1 is not an option either, it is cheating the requirement.
So I'd argue that you need to fill in the best possible obs_id you have, and that is, as of now, the configuration_id / run_id.
Hi @moralejo
Because the run_id is not the obs_id
Hi Franca, I see no objection to use the obs_id field to keep run_id. It is the closest to obs_id we have as of now, and pretty close to the final concept, I think. The final obs_id will not change when wobbling, but we do not have yet wobble observations, and when we have them we may decide not to stop the daq in between, and then run_id won't change. Am I missing something?
In fact, I agree to use the run_id as obs_id at DL1 level. But I think at R0/R1 level (which concerns only the telescope, not DPPS) we should continue to call it with its name, and it is a configuraition_id not an obs_id. Said differently, I do not see any reason to cheat our self at R0 (i.e reader) level: there we have only a configuration_id (which is correctly stored in the data) and not an obs_id.
Hi @moralejo
Because the run_id is not the obs_id
Hi Franca, I see no objection to use the obs_id field to keep run_id. It is the closest to obs_id we have as of now, and pretty close to the final concept, I think. The final obs_id will not change when wobbling, but we do not have yet wobble observations, and when we have them we may decide not to stop the daq in between, and then run_id won't change. Am I missing something?
In fact, I agree to use the run_id as obs_id at DL1 level.
ok
But I think at R0/R1 level (which concerns only the telescope, not DPPS) we should continue to call it with its name, and it is a configuraition_id not an obs_id. Said differently, I do not see any reason to cheat our self at R0 (i.e reader) level: there we have only a configuration_id (which is correctly stored in the data) and not an obs_id.
I think there is something I am missing here. Why do you link obs_id to DPPS only? As soon as the data is taken they should already have an obs_id, or am I wrong?
Also, event sources are required to provide the obs id, and setting it to -1 is not an option either, it is cheating the requirement.
The source reads the R0 data in LST, and now the R0 data HAS obs_id = -1 , this is not cheating. If we want to cover some missing data, that must be done after not at the reading level (and actually I do not see any reason to do it before the DL0 level).
As soon as the data is taken they should already have an obs_id, or am I wrong?
Yes, they should have one!
The source reads the R0 data in LST, and now the R0 data HAS obs_id = -1
No, not if we say that right now, for the comissioning, using run_id as obs_id is appropriate. Then you would have it.
If we want to cover some missing data
I'd argue it's not missing. I am arguing that given the circumenstances, run_id is obs_id for LST and thus is available and should be used there.
I think there is something I am missing here. Why do you link obs_id to DPPS only? As soon as the data is taken they should already have an obs_id, or am I wrong?
yes, obviously, in the future each data will have an obs_id.
Then why not use run_id as obs_id for all datalevels while this is not yet available?
Because we do not need it at r0 level, so at that level it is better to give the correct value to the data. It is a question of been correct so long it is possible. In the reader we read the data, so we would like to get what is really written in the files (which is by the way correct). If we change the reader we will not be able to see (without changing the code) the correct obs_id in the data when it will be finally set. If you really think this is necessary (even if I still don't see why, I will have a deeper look to the new ctapipe structure), we can set it at R1 level after the drs4 corrections. This is the level that ACADA will see at the end.
Because we do not need it at r0 level
I disagree on that point.
The problem is, we should only get R1 or DL0 as input, but this is not the case. So right now, R0 is our DL0.
If we want to have some kind of provenance (and we do want it I think) across data levels, we need to identify events (so we need the obs_id
).
When the correct obs_id
is finally decided and set, I am sure we will not forget to update the reader - and we can open an issue to make sure no one forgets.
Is there no run number available in the file itself?