AI4S2S / lilio

Calendar generator for machine learning with timeseries data
https://lilio.readthedocs.io/en/latest/
Apache License 2.0

Improved resampling checks #39

Closed BSchilperoort closed 1 year ago

BSchilperoort commented 1 year ago

In this PR:

Additionally, I made some small changes to the ruff configuration: the tests are now checked as well, but docstring style rules are ignored for that folder.

BSchilperoort commented 1 year ago

Hm. It turns out that there's a problem with the traintest notebook. I think we'll have to check the xarray/pandas input data for reserved coordinate/column names, namely:

anchor_year, i_interval, left_bound, right_bound, is_target (rename from "target").

An error should be raised if the data already contains one or more of these names.
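A check along these lines could look like the minimal sketch below. It is not taken from lilio itself: `check_reserved_names` and `RESERVED_NAMES` are hypothetical names, and only pandas input is handled here. For xarray input the same comparison could be made against the object's variable and coordinate names.

```python
from typing import Union

import pandas as pd

# Names listed in the comment above. "is_target" is mentioned there as a
# rename from "target"; this tuple is an illustration, not lilio's own list.
RESERVED_NAMES = ("anchor_year", "i_interval", "left_bound", "right_bound", "is_target")


def check_reserved_names(data: Union[pd.Series, pd.DataFrame]) -> None:
    """Raise a ValueError if the input already uses a reserved name (hypothetical helper)."""
    if isinstance(data, pd.Series):
        names = {data.name}
    else:
        names = set(data.columns)

    # Only names that collide with the reserved ones are reported.
    clashes = sorted(names.intersection(RESERVED_NAMES))
    if clashes:
        raise ValueError(
            f"The input data contains reserved names: {clashes}. "
            "Please rename these before resampling."
        )
```

Raising early, before any resampling is attempted, keeps the error close to its cause instead of surfacing later as a confusing merge or naming conflict.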

geek-yang commented 1 year ago

The frequency checks work in most cases, but not for a monthly calendar. For instance:

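import numpy as np
import pandas as pd

import lilio
from lilio import monthly_calendar

# Input data with a two-month time step
time_index = pd.date_range("20181001", "20211001", freq="2M")
df = pd.DataFrame(
    data={
        "data1": np.random.random(len(time_index)),
    },
    index=time_index,
)

# Calendar with one-month intervals, i.e. finer than the data
cal = monthly_calendar(anchor="10-15", length="1M")
cal = cal.map_to_data(df)
lilio.resample(cal, df)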

This produces the following warning messages, which have nothing to do with the frequency, only with NaN values:

/home/yangliu/AI4S2S/lilio/lilio/utils.py:104: FutureWarning: Units 'M', 'Y' and 'y' do not represent unambiguous timedelta values and will be removed in a future version.
  return pd.Timedelta(data_freq)
/home/yangliu/miniconda3/envs/s2spy/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/yangliu/miniconda3/envs/s2spy/lib/python3.10/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/yangliu/AI4S2S/lilio/lilio/utils.py:80: UserWarning: The input data could not fully cover the calendar's intervals. Intervals without available data will contain NaN values.
  warnings.warn(

Since we provide monthly_calendar among the shorthands, it would be nice to have a check for it as well.

BSchilperoort commented 1 year ago

> The checks for frequency works for most of the cases, but not for monthly calendar, for instance

Thanks for spotting this bug, Yang. It turns out that the Calendar's frequency is checked correctly, but the data's frequency is not! I'll modify the check to account for monthly-freq data inputs.
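One way to make such a check robust to monthly data is to avoid frequency strings such as "2M" altogether and measure the spacing between timestamps directly. The sketch below illustrates that idea and is not the fix applied in this PR: `largest_timestep` and `data_is_too_coarse` are hypothetical helpers, and the interval length passed in is assumed to be the shortest interval of the calendar.

```python
import pandas as pd


def largest_timestep(time_index: pd.DatetimeIndex) -> pd.Timedelta:
    """Return the largest gap between consecutive timestamps (hypothetical helper)."""
    if len(time_index) < 2:
        raise ValueError("At least two timestamps are needed to determine a time step.")
    # Sorting first makes the result independent of the input order.
    return time_index.sort_values().to_series().diff().max()


def data_is_too_coarse(
    time_index: pd.DatetimeIndex, shortest_interval: pd.Timedelta
) -> bool:
    """True if the data's time step exceeds the calendar's shortest interval."""
    return largest_timestep(time_index) > shortest_interval


# Two-monthly timestamps, written out explicitly to avoid frequency aliases.
two_monthly = pd.to_datetime(["2018-10-31", "2018-12-31", "2019-02-28"])
print(data_is_too_coarse(two_monthly, pd.Timedelta(days=31)))
```

Because the gaps are computed from the actual timestamps, months of different lengths are handled without converting an ambiguous unit like "M" into a `pd.Timedelta`.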

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: 96.6%
Duplication: 0.0%