cta-observatory / cta-lstchain

LST prototype testbench chain
https://cta-observatory.github.io/cta-lstchain/
BSD 3-Clause "New" or "Revised" License
24 stars 77 forks source link

ValueError during addition of disp parameters during R0 to DL1 conversion #633

Open mexanick opened 3 years ago

mexanick commented 3 years ago

I’m trying to run the MC R0 to DL1 conversion using the most recent lstchain and have encountered a following issue. While some files pass the conversion just ok, some other issue a following error:

  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/numpy/core/records.py", line 746, in fromrecords
    retval = sb.array(recList, dtype=descr)
ValueError: could not assign tuple of length 55 to structure with 54 fields.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/tables/table.py", line 2221, in append
    wbufRA = numpy.rec.array(rows, dtype=self._v_dtype)
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/numpy/core/records.py", line 1069, in array
    return fromrecords(obj, dtype=dtype, shape=shape, **kwds)
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/numpy/core/records.py", line 758, in fromrecords
    _array[k] = tuple(recList[k])
ValueError: could not assign tuple of length 55 to structure with 54 fields.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/bin/convert_r0_to_dl1", line 8, in <module>
    sys.exit(main())
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/sim_runner/scripts/convert_r0_to_dl1.py", line 64, in main
    custom_config=config,
  File "/fefs/aswg/workspace/mykhailo.dalchenko/small_pix_cam/sw/cta-lstchain/lstchain/reco/r0_to_dl1.py", line 572, in r0_to_dl1
    add_disp_to_parameters_table(output_filename, f'dl1/event/telescope/parameters/{tel_name}', focal)
  File "/fefs/aswg/workspace/mykhailo.dalchenko/small_pix_cam/sw/cta-lstchain/lstchain/reco/r0_to_dl1.py", line 623, in add_disp_to_parameters_table
    add_column_table(tab, tables.Float32Col, 'disp_dx', disp_parameters[0].value)
  File "/fefs/aswg/workspace/mykhailo.dalchenko/small_pix_cam/sw/cta-lstchain/lstchain/io/io.py", line 947, in add_column_table
    newtable.append([tuple(list(row[:]) + [value])])
  File "/home/mykhailo.dalchenko/.conda/envs/lst-dev-03032021/lib/python3.7/site-packages/tables/table.py", line 2225, in append
    "The error was: <%s>" % (str(self), exc))
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/_temp_table (Table(0,), fletcher32, shuffle, blosc:zstd(5)) '''. The error was: <could not assign tuple of length 55 to structure with 54 fields.>

This is happening at the very end of the conversion (actually augmenting existing h5 file with the disp parameters. Moreover, even with such error the affected h5 file looks ok and these additional 9 columns appear to be filled properly.

package versions: Name: lstchain Version: 0.6.3.post354+gitb7d140f Name: tables Version: 3.6.1 Name: numpy Version: 1.20.1

Example of affected file:

maxnoe commented 3 years ago

Can you provide the command you ran?

maxnoe commented 3 years ago

Looking at the current code, the results will anyway be wrong for MAGIC, since it hard codes the LST focal length to calculate the disp parameters.

mexanick commented 3 years ago

I'm testing my own tools to run the simulation chain (with flexible job submissions for different clusters), but essential snippet is this:

        r0_to_dl1_fname = r0_to_dl1_filename(i_file.rsplit('/')[-1])
        output_file = output_dir / r0_to_dl1_fname
        print(f'output dir: {output_dir}\nr0_to_dl1_filename: {r0_to_dl1_fname}\noutput file: {output_file}\n')
        r0_to_dl1.r0_to_dl1(i_file,
                            output_filename=output_file,
                            custom_config=config,
                            )

Config I took standard from this repo.

Considering the focal length, I've changed the call to

        for ii, telescope_id in enumerate(event.dl1.tel.keys()):
             focal = subarray.tel[telescope_id].optics.equivalent_focal_length
             add_disp_to_parameters_table(output_filename, f'dl1/event/telescope/parameters/{tel_name}', focal)

to avoid hardcoded telescope name and focal

maxnoe commented 3 years ago

It would be great if you could contribute such fixes to the code.

mexanick commented 3 years ago

Well, I was about to submit a PR (also including your suggestion on warning suppression due to None fields), but stuck with this very issue 😊

rlopezcoto commented 3 years ago

@mexanick thanks for sharing. As you could see, at the moment lstchain is going under some heavy coding and things for some versions may not have worked in some of the unreleased versions. Since this is MC, in principle it should not be so much affected, but as @maxnoe pointed out, it would be great to have a look to the specific command that you run to be more helpful

mexanick commented 3 years ago

I think, I've found the issue. Apparently, all four simulated LSTs have the same name and both MAGICs too. Below is the output of subarray.tels for the telescope.

{1: TelescopeDescription(type=LST, name=LST, optics=LST, camera=LSTCam), 2: TelescopeDescription(type=LST, name=LST, optics=LST, camera=LSTCam), 3: TelescopeDescription(type=LST, name=LST, optics=LST, camera=LSTCam), 4: TelescopeDescription(type=LST, name=LST, optics=LST, camera=LSTCam), 5: TelescopeDescription(type=LST, name=MAGIC, optics=MAGIC, camera=MAGICCam), 6: TelescopeDescription(type=LST, name=MAGIC, optics=MAGIC, camera=MAGICCam)}

This leads to the fact, the telname variable, which is defined as tel_name = str(subarray.tel[telescope_id])[4:] would be identical for each LST or MAGIC telescope. This creates few problems in my opinion:

  1. DL1 events parameters for different telescopes will be recorded in the same table. I think, the telescope with highest telescope_id will only be recorded for a particular event.
  2. This causes the error I initially reported. Since the table is the same, trying to extend it twice with the new column, which has the same name, fails, because in this case new column name won't be added (such column already exist), but the new value will be, and then the list of values will be 1 item longer than the list of columns.

A possible fix would be to use smth like tel_name = f'{str(subarray.tel[telescope_id])[4:]}-{telescope_id}' I tested it for R0 to DL1 conversion, but not sure how it can affect subsequent stages and whether such fix is desired. Please advise how to proceed

maxnoe commented 3 years ago

We shouldn't rely on these names at all. There are several issues and bugs related to this, especially to the fact that the two magics differ but have the same name.

Next version of ctapipe will switch to only using tel ids and indexes and not rely on names / guessing descriptions from parameters.

I think most of the scripts in lstchain, especially the ones used for observed rather than simulated data, only work with pure lst or even only a single telescope.

maxnoe commented 3 years ago

Probably the best solution here would be to just calculate the disp parameters also inside the telescope event loop, not all at once afterwards.

mexanick commented 3 years ago

It is not just scripts, it is in the reco module... And I believe it would only work correctly with one telescope at the moment. Considering calculating the disp parameters inside the telescope event loop - I think this may lead to a sizeable performance drop (as at the end the calculation is vectorized I think)

mexanick commented 3 years ago

Also, given that you're planning significant refactoring, a temporary fix adding the tel_id to its name which doesn't require much changes in the code might be beneficial