cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
64 stars 268 forks source link

HDF5TableWriter and data types #669

Closed calispac closed 4 years ago

calispac commented 6 years ago

Hi all! I was testing a bit the implementation of the io.HDF5TableWriter and io.HDF5TableReader.

So I came up with two issues:

  1. The writer does not seem to support str and datetime types
  2. The writer + reader does not have the "with" feature :

with HDF5TableReader('my.h5' ) as f: f.read('root/table', MyContainer)


To my understanding one has to do : 

```python
f = HDF5TableReader('my.h5')
f.read('root/table', MyContainer())
del(f)

The __del__() function in the table writer closes the file. If I make a mistake in the read() for instance:

f = HDF5TableReader('my.h5')
f.read('root/NOT_THE_CORRECT_NAME', MyContainer())
del(f)

Then the reader fails without closing the file. In principle this is not an issue if one calls this piece of code with:

python my_piece_of_code.py

But working with Jupyter Notebook did cause an issue for me. I believe the cell I executed did not close the file. So then latter when I needed the file again I could not open it.

Here is a notebook : https://nbviewer.jupyter.org/github/cta-sst-1m/digicampipe/blob/just_notebooks/notebooks/example_container_writer.ipynb

So regarding the two points:

  1. What are the types supported? And if one type is supported by the Container() why is it not supported by the TableWriter?
  2. Is an implementation of the with feature foreseen ? Did I just missed it ?

Thanks in advance

maxnoe commented 6 years ago

This is very much related to #445 and #501

And yes, your points are correct.

kosack commented 6 years ago

In principle it can support anything that pytables does - I just only implemented the bare minimum to start. Right now the type mapping is done via a type map and in come cases some transform functions (e.g. to add or strip units, etc):

PYTABLES_TYPE_MAP = {
    'float': tables.Float64Col,
    'float64': tables.Float64Col,
    'float32': tables.Float32Col,
    'int': tables.IntCol,
    'int32': tables.Int32Col,
    'int64': tables.Int64Col,
    'bool': tables.BoolCol,
}

But as @MaxNoe mentioned, we plan on moving to a more "ORM-like" model where you specify the type as a subclass of Field (e.g. Int32Field), which allows us to add specializations and mapping functions in a more clear way.

The writer does not seem to support str and datetime types

Str should be as simple as adding the type to the map, so that should be easy to implement. We don't use datetimes anywhere in ctapipe (we only support astropy.Time), but to serialize that we need to think about how to store it... It's not so clear in order to keep precision (e.g. the current CTA recommendation is 2 32-bit ints, for seconds + quarter-ns, since the UNIX epoch as a TAI timebase). Simply writing out a datetime is not sufficient.

calispac commented 6 years ago

But as @MaxNoe mentioned, we plan on moving to a more "ORM-like" model where you specify the type as a subclass of Field (e.g. Int32Field), which allows us to add specializations and mapping functions in a more clear way.

Ah yes I see thanks!

kosack commented 6 years ago

The writer + reader does not have the "with" feature

That is something that I would really like to see implemented - right now I think there is a bug in the logic since the files don't always get closed when the object is destroyed. Having a correct .close() method would allow this to work, and we can either implement a full context-manager interface, or simply use the @contextlib.closing decorator

kosack commented 6 years ago

One other problem is that now we can write multiple Containers to a table, but the reader only reads into one. Not sure if that is a major problem, since so far we don't have a good use case for the TableReader (currently we write the tables event-wise, but read them later column-wise using pandas or pytables)

calispac commented 6 years ago

See #687 and #689

maxnoe commented 4 years ago

Duplicate of #1116