cta-observatory / pyeventio

Python read-only implementation for the EventIO data format used by the CORSIKA 7 IACT extension and sim_telarray
MIT License
10 stars 12 forks source link

LaserCalibration - n_pixel read in negativ #264

Closed lheckmann closed 1 year ago

lheckmann commented 1 year ago

Bug for a custom (not CTA design) simtel array file (SIMTEL_VERSION = 2022-12-15 19:19:16 UTC (konrad@wizard4) SIMTEL_RELEASE = 2022.349.1 from 2022-12-15 (moving on ...) SIMTEL_BASE_RELEASE = 2022.349.1 from 2022-12-15 (moving on ...))

When investigating the output with: with EventIOFile('../simtel/revoll_raw.simtel.gz') as f: for eventio_object in f: if isinstance(eventio_object, LaserCalibration): print(eventio_object.parse())

I get the error:

ValueError                                Traceback (most recent call last)
Input In [5], in <cell line: 2>()
      3     for eventio_object in f:
      4         if isinstance(eventio_object, LaserCalibration):
      5 #             for subobject in eventio_object:
      6 #                 if isinstance(subobject, TelescopeEvent):
      7 #                     for subsubobject in subobject:
----> 8             print(eventio_object.parse())

File /remote/pcmagic18/heckmann/anaconda2/envs/revoll/lib/python3.9/site-packages/eventio/simtel/objects.py:1630, in LaserCalibration.parse(self)
   1623 lascal = {
   1624     'telescope_id': self.telescope_id,
   1625     'lascal_id': lascal_id,
   1626     'calib': calib,
   1627 }
   1629 if version >= 1:
-> 1630     tmp = read_array(
   1631         byte_stream, 'f4', n_gains * 2
   1632     ).reshape(n_gains, 2)
   1633     lascal['max_int_frac'] = tmp[:, 0]
   1634     lascal['max_pixtm_frac'] = tmp[:, 1]

File /remote/pcmagic18/heckmann/anaconda2/envs/revoll/lib/python3.9/site-packages/eventio/tools.py:42, in read_array(f, dtype, count)
     40     return np.array((), dtype=dtype)
     41 #print(f.read(),dtype, count, dt.itemsize)
---> 42 return np.frombuffer(f.read(count * dt.itemsize), count=count, dtype=dt)

ValueError: buffer is smaller than requested size

I investigated it and found that the cause is in eventio/simtel/objects.py:

    eventio_type = 2023

    def parse(self):
        assert_max_version(self, 3)
        self.seek(0)
        byte_stream = BytesIO(self.read())
        version = self.header.version

        n_pixels = read_short(byte_stream)

in the last line it reads in a negative number of n_pixels from my file. Changing it to read_unsigned_short solved the issue for me.

maxnoe commented 1 year ago

I can reproduce the error with the file, however, checking with the source code, the code is actually writing a signed short:

int write_simtel_laser_calib (IO_BUFFER *iobuf, LasCalData *lcd)
{
   IO_ITEM_HEADER item_header;
   int j;

   if ( iobuf == (IO_BUFFER *) NULL || lcd == NULL )
      return -1;

   item_header.type = IO_TYPE_SIMTEL_LASCAL;  /* Data type */
   item_header.version = 0;             /* Version 0 */
//   if ( lcd->max_int_frac[0] > 0. || lcd->max_pixtm_frac[0] > 0. )
//      item_header.version = 1;          /* need version 1 */
   item_header.version = 3;          /* Now using version 3, including flat-fielding correction factors */
   item_header.ident = lcd->tel_id;
   put_item_begin(iobuf,&item_header);

   put_short(lcd->num_pixels,iobuf);
   put_short(lcd->num_gains,iobuf);

So the code in pyeventio is correct.

What might have happened is that you simulated a number of pixels larger than the maximum value of short aka int16 but the number of pixels are internally stored in an int and that is then cast to int16 and written out, which, in the case the number of pixels actually fits into an uint16, would result in the correct number of pixels when read as uint16 back in.

@kbernloehr Any input on how we should proceed? I guess cameras with more than 32767 pixels were not really accounted for and this is only a first symptom?

maxnoe commented 1 year ago

@kbernloehr Checking the source code of other objects writing the number of pixels, it seems that most of them something like this:

#if ( H_MAX_PIX >= 32768 )
   item_header.version = 1;             /* Version 1 overcomes 16 bit limit */
#else
   item_header.version = 0;             /* Version 0 kept for backward compatibility */
#endif

this should probably also be done for the laser calibrations

maxnoe commented 1 year ago

@lheckmann This issue should be addressed (for now with assuming unsigned...) together with your other two issues in #267

Could you give it a try?

lheckmann commented 1 year ago

Works perfectly, thank you!

kbernloehr commented 1 year ago

I agree that the current implementation is limited to <= 32767 pixels and should be addressed for studies of cameras with way more pixels than foreseen in CTA so far (including SCT). Following the strategy used in other places, I prepared a ('git diff' based) patch that should write version 4 laser calibration blocks with the number of pixels as int32, if needed - but trying to attach the file here only got me an 'We don't support that file type' error. And I did not want to throw in the entire io_hess.c here. Would that kind of extension (and support for it in pyeventio) solve your problem? In contrast to a work-around reading in the number of pixels as an unsigned short, that would also work for more than 65535 pixels. I hope I did not miss that kind of dependency in yet another object. I admit this kind of set-up has seen very little testing, so there might be other rough spots.

kbernloehr commented 1 year ago
diff --git a/hess/io_hess.c b/hess/io_hess.c
index 0d139d5..943bb76 100644
--- a/hess/io_hess.c
+++ b/hess/io_hess.c
@@ -9971,14 +9971,21 @@ int write_simtel_laser_calib (IO_BUFFER *iobuf, LasCalData *lcd)
       return -1;

    item_header.type = IO_TYPE_SIMTEL_LASCAL;  /* Data type */
-   item_header.version = 0;             /* Version 0 */
+//   item_header.version = 0;             /* Version 0 */
 //   if ( lcd->max_int_frac[0] > 0. || lcd->max_pixtm_frac[0] > 0. )
 //      item_header.version = 1;          /* need version 1 */
-   item_header.version = 3;          /* Now using version 3, including flat-fielding correction factors */
+#if ( H_MAX_PIX >= 32768 )
+   item_header.version = 4;          /* Now using version 4 in cameras with >= 32768 pixels */
+#else
+   item_header.version = 3;          /* Otherwise using version 3, including flat-fielding correction factors */
+#endif
    item_header.ident = lcd->tel_id;
    put_item_begin(iobuf,&item_header);

-   put_short(lcd->num_pixels,iobuf);
+   if ( item_header.version >= 4 )
+      put_int32(lcd->num_pixels,iobuf);
+   else
+      put_short(lcd->num_pixels,iobuf);
    put_short(lcd->num_gains,iobuf);
    put_int32(lcd->lascal_id,iobuf);

@@ -10023,7 +10030,7 @@ int read_simtel_laser_calib (IO_BUFFER *iobuf, LasCalData *lcd)
    item_header.type = IO_TYPE_SIMTEL_LASCAL;  /* Data type */
    if ( (rc = get_item_begin(iobuf,&item_header)) < 0 )
       return rc;
-   if ( item_header.version > 3 )
+   if ( item_header.version > 4 )
    {
       fprintf(stderr,"Unsupported laser calibration version: %d.\n",
          item_header.version);
@@ -10038,7 +10045,10 @@ int read_simtel_laser_calib (IO_BUFFER *iobuf, LasCalData *lcd)
       return -1;
    }

-   np = get_short(iobuf);
+   if ( item_header.version >= 4 )
+      np = get_int32(iobuf);
+   else
+      np = get_short(iobuf);
    ng = get_short(iobuf);
    if ( (np != lcd->num_pixels && lcd->num_pixels != 0) || 
         (ng != lcd->num_gains && lcd->num_gains!= 0) )
@@ -10135,7 +10145,7 @@ int print_simtel_laser_calib (IO_BUFFER *iobuf)
    item_header.type = IO_TYPE_SIMTEL_LASCAL;  /* Data type */
    if ( (rc = get_item_begin(iobuf,&item_header)) < 0 )
       return rc;
-   if ( item_header.version > 3 )
+   if ( item_header.version > 4 )
    {
       fprintf(stderr,"Unsupported laser calibration version: %d.\n",
          item_header.version);
@@ -10144,7 +10154,10 @@ int print_simtel_laser_calib (IO_BUFFER *iobuf)
    }

    printf("\nLaser calibration for telescope %ld", item_header.ident);   
-   np = get_short(iobuf);
+   if ( item_header.version >= 4 )
+      np = get_int32(iobuf);
+   else
+      np = get_short(iobuf);
    ng = get_short(iobuf);
    printf(" (%d pixels, %d gains)", np, ng);
kbernloehr commented 1 year ago

Sorry, just wanted to close the comment preview window and not the issue ...

maxnoe commented 1 year ago

I prepared a ('git diff' based) patch that should write version 4 laser calibration blocks with the number of pixels as int32, if needed

@kbernloehr Looking at the other objects, most of them use the variable length integer (put_scount, put_vector_of_int_scount) in case the number of pixels doesn't fit short.

of course a 4 byte int would work, but maybe better opt for consistency?

kbernloehr commented 1 year ago

I know it is a mess already but that damage is done and cannot be undone without creating yet another data block version. Storing the number of pixels, either always or only in case of >= 32768 pixels, I found (writesimtel...):

------------ 4 bytes fixed length ----------

camorgan: put_int32 camsettings: put_int32 mc_pixel_moni : put_int32 pixelset: put_int32

mc_pe_sum : put_vector_of_int32 (one put_int32 per telescope)

teladc_samples : put_long teladc_sums : put_long

( laser_calib : put_int32 ?? )

------------ variable length -------------

pixcalib : put_count32 (that is an odd one, storing an int32 like an uint32)

pixel_list : put_scount

pixtime : put_scount32 telimage : put_scount32 tel_monitor : put_scount32


Without the new laser_calib one, that adds up to:

Seven fixed length for an int32, four variable length for an int32, one variable length for an uint32. Majority vote says: put_int32.

The put_vector_of_int_scount calls are not used for storing the number of pixels but for storing something else, once per pixel. That is where space can be saved. One integer per data block is hardly worth the bit shifting.

maxnoe commented 1 year ago

@kbernloehr Ok, for us here it really doesn't matter then. If you make a new release introducing the new laser calibration version, we'll implement it.