FBruzzesi / iso-week-date

Toolkit to work with str representing ISO Week date format
https://fbruzzesi.github.io/iso-week-date/
MIT License

`isoweek_to_datetime` not working with pl.DataFrame in a `with_columns` context #83

Closed: pietroppeter closed this issue 1 month ago

pietroppeter commented 2 months ago

I am trying to use `isoweek_to_datetime` to generate a new column in a DataFrame, but the function does not seem to work in a `with_columns` context.

Reproducible example (adapted from the docs and generated by exporting a Jupyter notebook to Markdown):

```python
import polars as pl
from iso_week_date.polars_utils import SeriesIsoWeek  # noqa: F401

s = pl.Series(["2022-W52", "2023-W01", "2023-W02"])
s.iwd.isoweek_to_datetime()
```

```text
shape: (3,)
date
2022-12-26
2023-01-02
2023-01-09
```
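For reference, each date in the output above is the Monday of the corresponding ISO week. The stdlib-only sketch below reproduces those values. It assumes that the default call (no `offset`, Monday as the weekday) is equivalent to Python's `date.fromisocalendar` (Python 3.8+). `isoweek_to_monday` is a hypothetical helper, not part of iso-week-date, and the library's actual implementation may differ:

```python
from datetime import date


def isoweek_to_monday(isoweek: str) -> date:
    """Return the Monday of an ISO week given as 'YYYY-WNN' (hypothetical helper)."""
    year_str, week_str = isoweek.split("-W")
    # weekday=1 selects Monday; raises ValueError for weeks that do not exist
    # in that ISO year (e.g. 2022-W53, since ISO year 2022 has only 52 weeks).
    return date.fromisocalendar(int(year_str), int(week_str), 1)


weeks = ["2022-W52", "2023-W01", "2023-W02"]
print([isoweek_to_monday(w).isoformat() for w in weeks])
# ['2022-12-26', '2023-01-02', '2023-01-09']
```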
```python
df = pl.DataFrame(data=dict(isoweek=s))
df
```

```text
shape: (3, 1)
isoweek
str
"2022-W52"
"2023-W01"
"2023-W02"
```
```python
df.get_column("isoweek").iwd.isoweek_to_datetime().alias("week_date")
```

```text
shape: (3,)
week_date
date
2022-12-26
2023-01-02
2023-01-09
```
```python
df.with_columns(pl.col("isoweek").iwd.isoweek_to_datetime().alias("week_date"))
```

````text
Traceback (most recent call last)
 in <module>:1

 1 df.with_columns(pl.col("isoweek").iwd.isoweek_to_datetime().alias("week_date"))
   2

 /Users/pietropeterlongo/projects/poste-logistics/.venv/lib/python3.12/site-packages/iso_week_date/polars_utils.py:453 in isoweek_to_datetime

   450         '''
   451         ```
   452         """
 453         return isoweek_to_datetime(self._series, offset=offset, weekday=weekday)
   454
   455 def isoweekdate_to_datetime(self: Self, offset: OffsetType = timedelta(0)) -> T:
   456         """Converts `str` series or expr of ISO Week date format YYYY-WNN-D to a series

 /Users/pietropeterlongo/projects/poste-logistics/.venv/lib/python3.12/site-packages/iso_week_date/polars_utils.py:186 in isoweek_to_datetime

   183 '''
   184 ```
   185 """
 186 if not is_isoweek_series(series):
   187         msg = f"`series` values must match ISO Week format {ISOWEEK__FORMAT}"
   188         raise ValueError(msg)
   189

 /Users/pietropeterlongo/projects/poste-logistics/.venv/lib/python3.12/site-packages/polars/expr/expr.py:152 in __bool__

     149             "- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`\n"
     150             "- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col
     151         )
   152         raise TypeError(msg)
     153
     154 def __abs__(self) -> Expr:
     155         return self.abs()

TypeError: the truth value of an Expr is ambiguous

You probably got here by using a Python standard library function instead of the native expressions API.
Here are some things you might want to try:
- instead of `pl.col('a') and pl.col('b')`, use `pl.col('a') & pl.col('b')`
- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`
- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col('a'), pl.col('b'))`
````

Workaround:

```python
df["week_date"] = df.get_column("isoweek").iwd.isoweek_to_datetime().alias("week_date")
df
```

```text
Traceback (most recent call last)
 in <module>:1

 1 df["week_date"] = df.get_column("isoweek").iwd.isoweek_to_datetime().alias("week_date")
   2 df
   3

 /Users/pietropeterlongo/projects/poste-logistics/.venv/lib/python3.12/site-packages/polars/dataframe/frame.py:1211 in __setitem__

    1208                 "DataFrame object does not support `Series` assignment by index"
    1209                 "\n\nUse `DataFrame.with_columns`."
    1210             )
  1211             raise TypeError(msg)
    1212
    1213         # df[["C", "D"]]
    1214         elif isinstance(key, list):

TypeError: DataFrame object does not support `Series` assignment by index

Use `DataFrame.with_columns`.
```

(Note that this workaround does not work either.)

pietroppeter commented 2 months ago

Ah, I found a workaround that works: `df.with_columns(df.get_column("isoweek").iwd.isoweek_to_datetime().alias("week_date"))`, which passes the already computed Series to `with_columns` instead of an expression.

FBruzzesi commented 2 months ago

Hi Pietro!

That's certainly a bug: everything that works with a Series should work with an Expr as well.

I wonder whether registering the same class as both a Series and an Expr namespace causes some kind of conflict:

```python
@pl.api.register_series_namespace("iwd")
@pl.api.register_expr_namespace("iwd")
class SeriesIsoWeek(Generic[T]):
    ...
```

I will investigate this further.

Could you please share the polars version you are using?

FBruzzesi commented 2 months ago

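Please ignore my previous considerations. The error

```text
TypeError: the truth value of an Expr is ambiguous
```

is triggered in the line:

```python
return series.str.extract(pattern).is_not_null().all()
```

When `series` is an `Expr`, this line returns another `Expr` instead of a `bool`. The `if not is_isoweek_series(series)` check (`polars_utils.py:186` in the traceback) then calls `Expr.__bool__`, which raises. I will think about how to deal with this (most likely with a duck-typing approach).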
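The duck-typing idea can be sketched without polars, under the following assumptions. `LazyCheck` is a hypothetical stand-in for `pl.Expr`: its `__bool__` raises `TypeError`, as `Expr.__bool__` does in the traceback above. `is_isoweek`, `validate_isoweek_naive` and `validate_isoweek_duck` are hypothetical names. The regex is an assumed YYYY-WNN pattern, not the library's `ISOWEEK__FORMAT`. The key point is that an eager input yields a real `bool`, while a lazy input yields an object that must not be truth-tested. The fix mentioned for v1.4.0 may work differently.

```python
import re
from typing import Union

# Assumed pattern: four-digit year, "-W", week 01-53 (not the library's actual regex).
ISOWEEK_RE = re.compile(r"^\d{4}-W(0[1-9]|[1-4]\d|5[0-3])$")


class LazyCheck:
    """Hypothetical stand-in for a deferred check such as a polars Expr."""

    def __bool__(self) -> bool:
        # Mimics Expr.__bool__: truth-testing a lazy object is ambiguous.
        raise TypeError("the truth value of a LazyCheck is ambiguous")


def is_isoweek(values) -> Union[bool, LazyCheck]:
    """Eager input (list of str) -> bool; lazy input -> the lazy object itself."""
    if isinstance(values, LazyCheck):
        return values
    return all(ISOWEEK_RE.match(v) is not None for v in values)


def validate_isoweek_naive(values):
    # Mirrors the failing pattern: `if not ...` forces a truth test.
    if not is_isoweek(values):
        raise ValueError("values must match ISO Week format YYYY-WNN")
    return values


def validate_isoweek_duck(values):
    # Duck-typed variant: only act on the result when it is an actual bool;
    # a lazy result is returned untouched, so no truth test is forced.
    result = is_isoweek(values)
    if isinstance(result, bool) and not result:
        raise ValueError("values must match ISO Week format YYYY-WNN")
    return values
```

With an eager list, `validate_isoweek_duck` still raises `ValueError` on malformed values such as `"2023-01"`. With a `LazyCheck`, it returns the input unchanged, whereas `validate_isoweek_naive` reproduces the `TypeError`.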

FBruzzesi commented 1 month ago

@pietroppeter this should be fixed in v1.4.0.

pietroppeter commented 1 month ago

Will check and report back