LSSTDESC / rail

Top level "umbrella" package for RAIL
MIT License

Estimation with Parquet files not working #116

Closed: hdante closed this issue 5 months ago

hdante commented 10 months ago

Bug report

Hello, I'm currently unable to start estimations with a Parquet input file. RAIL tries to open the file as HDF5 in the following method:

    @classmethod
    def _size(cls, path, **kwargs):
        return tables_io.io.getInputDataLengthHdf5(path, **kwargs)

Full backtrace follows:

(base) [henrique.almeida@loginapl01 henrique.almeida]$ rail-estimate -a bpz apollo-slurm-preprocess/1/objectTable_tract_2897_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_1_20220317T233937Z-part0.pq abcd.hdf5
Estimator algorithm: bpz
Configuration file: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/share/rail_scripts/estimator_bpz.pkl
Bins: 301
HDF5 group name: ""
Column template for magnitude data: "mag_{band}"
Column template for error data: "magerr_{band}"
Starting setup.
Loading all program modules...
Configuring estimator...
Loading input file...
column_list None
Setup done.
Starting estimate.
Inserting handle into data store.  model: /lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/share/rail_scripts/estimator_bpz.pkl, estimate
Traceback (most recent call last):
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 213, in <module>
    main()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 209, in main
    estimate(cfg, ctx)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 202, in estimate
    ctx.estimator.estimate(ctx.input)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/estimator.py", line 96, in estimate
    self.run()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/estimator.py", line 104, in run
    iterator = self.input_iterator('input')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/core/stage.py", line 338, in input_iterator
    self._input_length = handle.size(groupname=self.config.hdf5_groupname)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/core/data.py", line 146, in size
    return self._size(self.path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/core/data.py", line 223, in _size
    return tables_io.io.getInputDataLengthHdf5(path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/tables_io/ioUtils.py", line 85, in getInputDataLengthHdf5
    hg, infp = readHdf5Group(filepath, groupname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/tables_io/ioUtils.py", line 625, in readHdf5Group
    infp = h5py.File(filepath, "r")
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/h5py/_hl/files.py", line 567, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/h5py/_hl/files.py", line 231, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
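
For context on the final error: h5py reports "file signature not found" when the file does not begin with the 8-byte HDF5 signature, and a Parquet file begins and ends with the 4-byte marker `PAR1` instead. Below is a minimal sketch of checking those magic bytes before choosing a reader. The helper name `sniff_table_format` is hypothetical and is not part of RAIL or tables_io. The sketch only looks for the HDF5 signature at offset 0, so HDF5 files with a user block would be reported as unknown.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature at the start of an HDF5 file
PARQUET_MAGIC = b"PAR1"                # 4-byte marker at both ends of a Parquet file


def sniff_table_format(path):
    """Hypothetical helper: return 'hdf5', 'parquet', or 'unknown' from magic bytes."""
    with Path(path).open("rb") as fp:
        head = fp.read(8)
        fp.seek(0, 2)          # move to end of file to learn its size
        size = fp.tell()
        tail = b""
        if size >= 4:
            fp.seek(-4, 2)     # last four bytes of the file
            tail = fp.read(4)
    if head == HDF5_SIGNATURE:
        return "hdf5"
    # A valid Parquet file needs both markers plus a 4-byte footer length.
    if size >= 12 and head[:4] == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    return "unknown"
```

A caller could use the result to decide whether an HDF5-specific length function is appropriate for the input, rather than relying on the file extension.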

hdante commented 10 months ago

Another problem: the input_iterator() method always sets an hdf5_groupname option in the configuration, even when the input is not an HDF5 file:

        try:
            self.config.hdf5_groupname
        except:
            self.config.hdf5_groupname = None

The group name is then passed on to the Parquet reader as a groupname keyword, which causes the following error:

Traceback (most recent call last):
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 221, in <module>
    main()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 217, in main
    estimate(cfg, ctx)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-estimate", line 210, in estimate
    ctx.estimator.estimate(ctx.input)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/estimator.py", line 96, in estimate
    self.run()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/estimator.py", line 108, in run
    for s, e, test_data in iterator:
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/tables_io/ioUtils.py", line 441, in iterPqToDataFrame
    parquet_file = pq.read_table(filepath, columns=columns, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: read_table() got an unexpected keyword argument 'groupname'
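
One way to avoid this kind of TypeError is to drop HDF5-only options before calling a non-HDF5 reader. The following is a minimal sketch of that idea, not the fix that was applied in RAIL. The names `HDF5_ONLY_OPTIONS` and `options_for_format` are hypothetical, and the set of HDF5-only options is an assumption based on the `groupname` keyword seen in the traceback.

```python
# Assumption: 'groupname' is only meaningful for HDF5 inputs.
HDF5_ONLY_OPTIONS = frozenset({"groupname"})


def options_for_format(fmt, options):
    """Hypothetical helper: return a copy of options that is safe for the given format."""
    if fmt == "hdf5":
        return dict(options)
    # For any other format, strip the keys that only an HDF5 reader understands.
    return {key: value for key, value in options.items()
            if key not in HDF5_ONLY_OPTIONS}


# Usage: the Parquet reader never sees 'groupname'.
parquet_options = options_for_format(
    "parquet", {"groupname": None, "columns": ["mag_i"]}
)
```

Filtering at the call site keeps the reader functions unchanged, at the cost of maintaining the list of format-specific options in one more place.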

joselotl commented 5 months ago

@ztq1996 Thank you for bringing this to my attention. I added the Parquet functions to tables_io but forgot to hook them up inside RAIL. I'm sending a PR now so that Parquet files can be used in the estimators.