LinkedEarth / PyleoTutorials

Jupyter-based, science-driven tutorials for using the LinkedEarth data-software Python ecosystem
http://linked.earth/PyleoTutorials
Apache License 2.0
14 stars 7 forks source link

Age ensemble notebook #32

Closed alexkjames closed 1 year ago

alexkjames commented 1 year ago

Create notebook with examples of moving from age ensembles + series to ensemble series:

AndreaLemur commented 1 year ago

Hello,

I would like to report a possible bug in the mapAgeEnsembleToPaleoData function in this notebook. I encountered an issue with the handling of array shapes when using the np.interp function, which results in a ValueError.

To reproduce the error, here is what I did:

  1. Read data from an Excel file using the pandas library (I haven't been able to get pyLiPD to work yet, but that's a separate issue):
    
    import pandas as pd # v2.0.1
    import numpy as np # v1.24.3
    import pyleoclim as pyleo # v0.12.0

ABC1 = pd.read_excel(f'{dir}ABC1.xlsx', sheet_name=None, header=None) ensemble_values = ABC1['ensembleValues'].values paleo_values = ABC1['paleoValues'].values ensemble_depth = ABC1['ensembleDepth'].values paleo_depth = ABC1['paleoDepth'].values

2. Call the `mapAgeEnsembleToPaleoData` function:
```python
ABC1_ae = mapAgeEnsembleToPaleoData(ensemble_values, paleo_values, ensemble_depth, paleo_depth)

That produced the following error:

Traceback (most recent call last):
  Cell In[102], line 1
    ABC1_ae = mapAgeEnsembleToPaleoData(ensemble_values, paleo_values, ensemble_depth, paleo_depth)
  Cell In[100], line 44 in mapAgeEnsembleToPaleoData
    ensembleValuesToPaleo[:,i]=np.interp(paleoDepth,ensembleDepth,ensembleValues[:,i])
  File <__array_function__ internals>:200 in interp
  File ~\miniconda3\envs\spyder-env\Lib\site-packages\numpy\lib\function_base.py:1595 in interp
    return interp_func(x, xp, fp, left, right)
ValueError: object too deep for desired array

The issue lies in the assumption that the inputs 'paleoDepth', 'ensembleDepth', and 'ensembleValues' should be 1-D arrays, as required by np.interp. However, when read in from an Excel file, they are passed as 2-D arrays (e.g., paleoDepth.shape=(650, 1)).

To resolve the issue, I made the following modifications to the code (ellipses are unchanged code):

def mapAgeEnsembleToPaleoData(ensembleValues, paleoValues, ensembleDepth, paleoDepth, value_name = None,value_unit = None,time_name = None,time_unit = None):
[...]
    # Coerce paleoDepth, ensembleDepth, and paleoValues into 1-D by specifying [:, 0] slices
    for i in np.arange(0,np.shape(ensembleValues)[1]):
        ensembleValuesToPaleo[:,i]=np.interp(paleoDepth[:, 0], ensembleDepth[:, 0],ensembleValues[:,i])
    series_list = []
    for s in ensembleValuesToPaleo.T:
        series_tmp = pyleo.Series(time=s, value=paleoValues[:, 0],
[...]
return ensemble

I recommend updating the code to account for this or adding a note in the docstring to clarify that the input arrays ('paleoDepth', 'ensembleDepth', and 'ensembleValues') must be 1-D arrays to work correctly with np.interp.

alexkjames commented 1 year ago

Thanks for pointing this out @AndreaLemur! In https://github.com/LinkedEarth/PyleoTutorials/commit/a438a4fb98b99cc9b68993acfef41a69769c424e I added a couple of lines to the function to attempt to coerce paleoValues, ensembleValues, and paleoDepth into 1D arrays, as well as some additional error handling in case this fails. The docstring has been clarified as well.