Integer casting error - Githubissues

hombit commented 3 years ago

I got a strange error when running a sample script that predicts ParSNIP encodings for the PS1 data:

import lcdata
import parsnip

ps1_model = parsnip.load_model('ps1')
ps1_data = lcdata.read_hdf5('data/ps1.h5')  # downloaded by lcdata_download_ps1 script

predictions = ps1_model.predict_dataset(ps1_data)

IndexError                                Traceback (most recent call last)
/var/folders/1d/bl0ng6jd7lq3p4j344gc98k00000gn/T/ipykernel_69482/1408101408.py in <module>
      5 ps1_data = lcdata.read_hdf5('data/ps1.h5')  # downloaded by lcdata_download_ps1 script
      6 
----> 7 predictions = ps1_model.predict_dataset(ps1_data)

~/.virtualenvs/dr1-parsnip/lib/python3.8/site-packages/parsnip/parsnip.py in predict_dataset(self, dataset, augment)
   1236         for batch_lcs in loader:
   1237             # Run the data through the model.
-> 1238             result = self.forward(batch_lcs, to_numpy=True, sample=False)
   1239 
   1240             # Pull out the reference time and reference scale. Note that if we are

~/.virtualenvs/dr1-parsnip/lib/python3.8/site-packages/parsnip/parsnip.py in forward(self, light_curves, sample, to_numpy)
    962 
    963         # Decode the light curves
--> 964         model_spectra, model_flux = self.decode(
    965             encoding, ref_times, color, compare_data[:, 0], redshifts, band_indices
    966         )

~/.virtualenvs/dr1-parsnip/lib/python3.8/site-packages/parsnip/parsnip.py in decode(self, encoding, ref_times, color, times, redshifts, band_indices, amplitude)
    881 
    882         # Figure out the weights for each band
--> 883         band_weights = self._calculate_band_weights(redshifts)
    884         num_batches = band_indices.shape[0]
    885         num_observations = band_indices.shape[1]

~/.virtualenvs/dr1-parsnip/lib/python3.8/site-packages/parsnip/parsnip.py in _calculate_band_weights(self, redshifts)
    318         remainders = flat_locs - int_locs
    319 
--> 320         start = self.band_interpolate_weights[..., int_locs]
    321         end = self.band_interpolate_weights[..., int_locs + 1]
    322 

IndexError: index -9223372036854775808 is out of bounds for dimension 0 with size 25485

Output of pip freeze:

appnope==0.1.2
argon2-cffi==21.1.0
astro-parsnip==1.0.1
astropy==4.3.1
attrs==21.2.0
backcall==0.2.0
bleach==4.1.0
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
cycler==0.10.0
debugpy==1.4.3
decorator==5.0.9
defusedxml==0.7.1
entrypoints==0.3
extinction==0.4.6
h5py==3.4.0
idna==3.2
ipykernel==6.4.0
ipython==7.27.0
ipython-genutils==0.2.0
ipywidgets==7.6.4
jedi==0.18.0
Jinja2==3.0.1
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==7.0.2
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.1
kiwisolver==1.3.2
lcdata==1.0.0
lightgbm==3.2.1
MarkupSafe==2.0.1
matplotlib==3.4.3
matplotlib-inline==0.1.3
mistune==0.8.4
nbclient==0.5.4
nbconvert==6.1.0
nbformat==5.1.3
nest-asyncio==1.5.1
notebook==6.4.3
numexpr==2.7.3
numpy==1.21.2
packaging==21.0
pandocfilters==1.4.3
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.2
prometheus-client==0.11.0
prompt-toolkit==3.0.20
ptyprocess==0.7.0
pycparser==2.20
pyerfa==2.0.0
Pygments==2.10.0
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
PyYAML==5.4.1
pyzmq==22.2.1
qtconsole==5.1.1
QtPy==1.11.0
requests==2.26.0
scikit-learn==0.24.2
scipy==1.7.1
Send2Trash==1.8.0
six==1.16.0
sncosmo==2.6.0
tables==3.6.1
terminado==0.12.1
testpath==0.5.0
threadpoolctl==2.2.0
torch==1.9.0
tornado==6.1
tqdm==4.62.2
traitlets==5.1.0
typing-extensions==3.10.0.2
urllib3==1.26.6
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1

hombit commented 3 years ago

I found that the PS1 dataset contains NaN redshift values, and a NaN cast to a long integer (here) turns into a meaningless number, which is where the out-of-bounds index in the traceback comes from. Perhaps some kind of input data validation is needed?
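
To illustrate the kind of validation meant here, below is a minimal sketch that flags non-finite redshifts before they reach any integer cast. It is not ParSNIP or lcdata code: the function names `find_invalid_redshifts` and `validate_redshifts` and the example values are hypothetical, and the redshifts are assumed to be available as a plain sequence of floats. For context, a float-to-integer cast of NaN in NumPy or PyTorch does not raise an exception and its result is platform-dependent, so checking up front is more reliable than inspecting the cast result.

```python
import math


def find_invalid_redshifts(redshifts):
    """Return the positions of redshifts that are NaN or infinite."""
    return [i for i, z in enumerate(redshifts) if not math.isfinite(z)]


def validate_redshifts(redshifts):
    """Raise a descriptive error instead of failing later with an IndexError."""
    bad = find_invalid_redshifts(redshifts)
    if bad:
        raise ValueError(
            f"{len(bad)} light curve(s) have a missing or non-finite redshift "
            f"(first at position {bad[0]})"
        )


# Hypothetical redshifts; the NaN stands in for a light curve with no measurement.
example_redshifts = [0.12, float("nan"), 0.45]
print(find_invalid_redshifts(example_redshifts))
```

Calling `validate_redshifts(example_redshifts)` would raise a `ValueError` that names the problem directly, rather than an out-of-bounds index deep inside the model.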

kboone commented 3 years ago

Thanks for reporting this. parsnip currently requires the redshift to be known, and lcdata doesn't do any input data validation. Instead of lcdata.read_hdf5('data/ps1.h5'), use parsnip.load_dataset('data/ps1.h5'). That will reject all of the light curves with NaN redshift values. I realize that this isn't documented anywhere right now. I'll update the documentation and add a more informative error message.

kboone commented 3 years ago

I released a new version of ParSNIP (v1.1.0) that should handle the situation that you ran into much more gracefully. Now if you try to run ParSNIP on a dataset that contains light curves without redshifts, it will prune those light curves from the dataset and output a warning suggesting that you use parsnip.load_dataset instead. This should resolve this issue.
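
The prune-and-warn behavior described above can be sketched roughly as follows. This is only an illustration of the idea, not the code shipped in ParSNIP v1.1.0: the helper names `has_valid_redshift` and `prune_missing_redshifts` are hypothetical, and each light curve is assumed to be a plain dict with an optional `"redshift"` entry rather than a real lcdata object.

```python
import math
import warnings


def has_valid_redshift(light_curve):
    """True if the (hypothetical) light curve dict has a finite redshift."""
    z = light_curve.get("redshift")
    return isinstance(z, (int, float)) and math.isfinite(z)


def prune_missing_redshifts(light_curves):
    """Drop light curves without a usable redshift and warn about it."""
    kept = [lc for lc in light_curves if has_valid_redshift(lc)]
    num_dropped = len(light_curves) - len(kept)
    if num_dropped:
        warnings.warn(
            f"Dropped {num_dropped} light curve(s) without a valid redshift. "
            "Consider loading the dataset with parsnip.load_dataset instead.",
            UserWarning,
            stacklevel=2,
        )
    return kept


# Hypothetical input: one good light curve, one NaN redshift, one missing entry.
example = [{"redshift": 0.2}, {"redshift": float("nan")}, {}]
print(len(prune_missing_redshifts(example)))
```

Warning instead of raising lets a prediction run continue on the usable light curves while still telling the user that part of the dataset was skipped.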

LSSTDESC / parsnip

Integer casting error #2