business-science / pytimetk

Time series easier, faster, more fun. Pytimetk.
https://business-science.github.io/pytimetk/
MIT License

Holiday data augmentation not working for Spain #292

Closed by girdeux31 4 months ago

girdeux31 commented 5 months ago

I'm using Python 3.9 and Ubuntu.

From the documentation:

import pandas as pd
import pytimetk as tk  # needed so that df.augment_holiday_signature is available

dates = pd.date_range(start='2022-12-25', end='2023-01-05')
df = pd.DataFrame({'date': dates})

df.augment_holiday_signature(
    date_column='date',
    country_name='Spain'
)

The holiday_name column in the output is all NaN, even though the date range covers Christmas. The same call works with Germany, UnitedStates, and Japan.

Holidays library works fine:

>>> import holidays
>>> es = holidays.ES()
>>> es.get('2014-01-01')
'Año nuevo'
>>> es.get('2023-12-25')
'Navidad'

mdancho84 commented 4 months ago

The problem, if there is one, comes from the holidays package rather than from pytimetk. For 2022, these are the holidays it defines for Spain:

dict_items([
    (datetime.date(2022, 1, 1), 'Año nuevo'),
    (datetime.date(2022, 1, 6), 'Epifanía del Señor'),
    (datetime.date(2022, 4, 15), 'Viernes Santo'),
    (datetime.date(2022, 8, 15), 'Asunción de la Virgen'),
    (datetime.date(2022, 10, 12), 'Día de la Hispanidad'),
    (datetime.date(2022, 11, 1), 'Todos los Santos'),
    (datetime.date(2022, 12, 6), 'Día de la Constitución Española'),
    (datetime.date(2022, 12, 8), 'La Inmaculada Concepción'),
])

Here's what I see for 2023, which matches your Holidays output:

image
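
The 2022 listing in the comment above has no entry for December 25, which would explain the NaN values for the 2022 part of the reported date range. The following is a minimal sketch of that check, not pytimetk's implementation: the dictionary is copied by hand from the listing, and `holidays_in_range` is a hypothetical helper introduced only for illustration.

```python
import datetime as dt

# 2022 holidays for Spain, copied from the dict_items listing in the thread.
ES_2022 = {
    dt.date(2022, 1, 1): "Año nuevo",
    dt.date(2022, 1, 6): "Epifanía del Señor",
    dt.date(2022, 4, 15): "Viernes Santo",
    dt.date(2022, 8, 15): "Asunción de la Virgen",
    dt.date(2022, 10, 12): "Día de la Hispanidad",
    dt.date(2022, 11, 1): "Todos los Santos",
    dt.date(2022, 12, 6): "Día de la Constitución Española",
    dt.date(2022, 12, 8): "La Inmaculada Concepción",
}


def holidays_in_range(holiday_map, start, end):
    """Hypothetical helper: return {date: name} for entries with start <= date <= end."""
    return {
        day: name
        for day, name in sorted(holiday_map.items())
        if start <= day <= end
    }


# The 2022 part of the date range used in the report.
late_december = holidays_in_range(ES_2022, dt.date(2022, 12, 25), dt.date(2022, 12, 31))
print(late_december)

# For comparison, the earlier part of the same month.
early_december = holidays_in_range(ES_2022, dt.date(2022, 12, 1), dt.date(2022, 12, 24))
print(early_december)
```

The helper accepts any mapping from dates to names, so the same check should also work on an object built directly with the holidays package (for example `holidays.ES(years=2022)`), assuming that package is installed. Results from the live package may differ between versions.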