cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
62 stars 265 forks source link

astropy 6.x bug stacking empty tabke #2555

Closed mexanick closed 2 months ago

mexanick commented 2 months ago

Apparently, there's a bug in the astropy 6.X, see https://github.com/astropy/astropy/issues/16370

The issue has been bisected to appear in one of the PRs towards astropy 6.0.0 To avoid the manifestation of this bug in the ctapipe code, shall the astropy version be restricted to <6?

maxnoe commented 2 months ago

Given that all our tests are passing, including e.g. Table loader and merge tool, do we really need to?

mexanick commented 2 months ago

I don't know :) I'm affected by this bug in calibpipe, but I can of course fix it locally. However, I decided to give a heads-up here as it is possible that ctapipe functionality can also be affected.

maxnoe commented 2 months ago

I think we ran into a similar issue, but didn't think of it as a bug. We create empty tables, but already with the correct dtypes and columns setup.

What is the reason why you need to include an empty table in your stack of tables?

mexanick commented 2 months ago

I need to merge (vstack) an arbitrary number of tables that I read from files. There's no particular reason to create an empty table, it was just convenient: create an empty table, then vstack tables in the loop. So of course, this can be changed. However, since the algorithm was working before and stopped working after the ctapipe 0.21, that brought the update on the astropy versions, I decided to check. The underlying issue of conversion can be deeper, because the crash only happens when you write out ecsv files, but doesn't happen if you write out fits, however, the dtype is sort of corrupted in that case: while reading back, you will have numpy arrays and not an astropy.core.Time object.

maxnoe commented 2 months ago

It's generally a bad idea to cal vstack in a loop, as it will copy the data many times. Better use a list and stack at the end.

tables = []
for path in paths:
    tables.append(Table.read(path))
table = vstack(tables)
del tables   
mexanick commented 2 months ago

A corresponding PR in astropy was merged, the bug should be eliminated as of astropy-6.1.0. Since for the moment it doesn't manifest otherwise, I close the issue