LSSTDESC / rail_tpz

RAIL-wrapped version of a "lite" version of Matias Carrasco-Kind's TPZ tree-based photo-z code
MIT License

Inform stage doesn't work with Parquet input files #20

Open hdante opened 2 months ago

hdante commented 2 months ago

Hello, when running the inform procedure with a Parquet input file, I get the following error:

(base) [henrique.almeida@loginapl01 henrique.almeida]$ rail-train -a tpz train3.pq estimator_tpz.new.pkl 
Start: 2024-04-25 16:33:58.228075
Estimator algorithm: tpz
Bins: 301
HDF5 group name: ""
Column template for magnitude data: "mag_{band}"
Column template for error data: "magerr_{band}"
Starting setup.
Loading all program modules...
Configuring trainer...
Loading input file...
column_list None
Setup done.
Starting training.
self._parallel is mpi, number of processors we will use is 1
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:173: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[bandname][detmask] = self.config.mag_limits[bandname]
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  training_data[errname][detmask] = 1.0
/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py:174: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  training_data[errname][detmask] = 1.0
[The same four warnings from tpz_lite.py lines 173 and 174 are repeated five more times.]
using native TPZ decision trees
Traceback (most recent call last):
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 182, in <module>
    if __name__ == '__main__': main()
                               ^^^^^^
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 173, in main
    train(cfg, ctx)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/slurm-home/bin/rail-train", line 162, in train
    ctx.trainer.inform(ctx.input)
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/informer.py", line 65, in inform
    self.run()
  File "/lustre/t1/cl/lsst/tmp/henrique.almeida/miniconda3/lib/python3.11/site-packages/rail/estimation/algos/tpz_lite.py", line 190, in run
    npdata = np.array(list(training_data.values()))
                           ^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'numpy.ndarray' object is not callable
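
A likely reading of this traceback, offered as an assumption rather than a confirmed diagnosis: with Parquet input, `training_data` arrives as a pandas DataFrame, where `.values` is a property holding a NumPy array, whereas line 190 of tpz_lite.py calls `.values()` as if the table were a dict of columns. The sketch below reproduces the error on a small made-up table and shows one possible workaround. The column names, the data, and the helper `as_column_dict` are hypothetical and are not part of rail_tpz.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the table loaded from the Parquet file.
training_data = pd.DataFrame({
    "mag_g": [24.1, 23.7, 25.0],
    "mag_r": [23.5, 23.1, 24.2],
    "redshift": [0.4, 0.9, 1.3],
})

# On a DataFrame, `.values` is a property (a NumPy array), so calling it fails.
try:
    training_data.values()
    call_error = None
except TypeError as exc:
    call_error = str(exc)
print(call_error)


def as_column_dict(table):
    """Return {column name: NumPy array} for a DataFrame or a dict-like table.

    Hypothetical helper, not a rail_tpz function.
    """
    if isinstance(table, pd.DataFrame):
        return {name: table[name].to_numpy() for name in table.columns}
    return {name: np.asarray(col) for name, col in table.items()}


columns = as_column_dict(training_data)
# Same expression as the failing line, now applied to a dict of columns.
npdata = np.array(list(columns.values()))
print(npdata.shape)  # one row per column, one entry per object
```

Converting at the boundary keeps the rest of the code unchanged, at the cost of copying each column once.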

Example Parquet file attached: train3.pq.gz
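
Separately, the repeated pandas warnings in the log point at the chained assignments on lines 173 and 174 of tpz_lite.py. A minimal sketch of the single-step `.loc` form that the warning text itself recommends follows. The data, the 99.0 non-detection sentinel, the magnitude limit, and the way `detmask` is built are all assumptions made for illustration, since the log does not show how the real code computes them.

```python
import pandas as pd

# Hypothetical data; 99.0 is an assumed sentinel for a non-detection.
training_data = pd.DataFrame({
    "mag_i": [22.3, 99.0, 24.8],
    "magerr_i": [0.02, 99.0, 0.15],
})
mag_limits = {"mag_i": 25.3}  # stand-in for self.config.mag_limits
bandname, errname = "mag_i", "magerr_i"

# Boolean mask of rows to overwrite (assumed definition of detmask).
detmask = training_data[bandname] > 90.0

# Chained form from the log, which pandas warns about:
#   training_data[bandname][detmask] = mag_limits[bandname]
# Single-step form: row selector and column label in one .loc call.
training_data.loc[detmask, bandname] = mag_limits[bandname]
training_data.loc[detmask, errname] = 1.0

print(training_data)
```

Because the row mask and the column label go through one indexing call, the assignment writes to the original DataFrame, including under Copy-on-Write.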

sschmidt23 commented 2 months ago

We do not currently have Parquet file support in place for rail_tpz; we are exploring ways to add it in the future.

hdante commented 2 months ago

Hello, OK. Should I close this issue?

sschmidt23 commented 2 months ago

I think we can leave the issue open. We are working on a RAIL v1.0 release, but adding Parquet support is something we hope to do in a near-future data release. Keeping this issue open will remind us that we want this functionality.