MartinBernstorff closed this issue 6 months ago
I'm using the newest version, but the BooleanOutcomeSpec shows some buggy behaviour. I'm a bit busy tomorrow, but I'll test more thoroughly and file an issue if the problem persists. Hopefully I'll get around to it by the end of tomorrow. Super grateful for such a fast response to the problem - thanks!
Hey again, I finally got around to looking at this bug and diagnosing the problem. The problem persists on our server at Rigshospitalet, but I managed to fix it locally and also to reproduce it.
For both PredictorSpec and OutcomeSpec, the horizontal concatenation throws an error saying that "Series has no attribute drop". Most likely this is because it iterates over the individual dataframe, which turns the object "df" into a Series rather than a DataFrame.
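For illustration, here is a minimal sketch (not timeseriesflattener's actual internals, just a stand-in for the behaviour I suspect) of how descending into a polars DataFrame during iteration/flattening yields Series, which then lack DataFrame-only methods such as .drop:

import polars as pl

df = pl.DataFrame({"id": [1, 2], "value": [3, 4]})

# A naive flatten helper (purely illustrative) that descends into every
# iterable it encounters, including DataFrames.
def naive_flatten(items):
    for item in items:
        if hasattr(item, "__iter__") and not isinstance(item, str):
            yield from item
        else:
            yield item

# Iterating a DataFrame yields its columns as Series, so the later
# horizontal concatenation receives Series instead of DataFrames ...
pieces = list(naive_flatten([df]))
print(type(pieces[0]))  # polars Series, not a DataFrame

# ... and a DataFrame-only call then fails, e.g.:
# pieces[0].drop("id")  # AttributeError: 'Series' object has no attribute 'drop'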
This problem occurs with:
But is fixed with:
import datetime as dt
import numpy as np
import polars as pl
import pandas as pd

# Load a dataframe with times you wish to make a prediction
prediction_times_df = pl.DataFrame(
    {
        "id": [1, 1, 2],
        "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-02-01"]),
    }
)

# Load a dataframe with raw values you wish to aggregate as predictors
predictor_df = pl.DataFrame(
    {
        "id": [1, 1, 1, 2],
        "date": pd.to_datetime(
            ["2020-01-15", "2019-12-10", "2019-12-15", "2020-01-02"]
        ),
        "value": [1, 2, 3, 4],
    }
)

# Load a dataframe specifying when the outcome occurs
outcome_df = pl.DataFrame(
    {"id": [1], "date": pd.to_datetime(["2020-03-01"]), "value_outcome": [1]}
)

# Specify how to aggregate the predictors and define the outcome
from timeseriesflattener import (
    MaxAggregator,
    MinAggregator,
    OutcomeSpec,
    PredictionTimeFrame,
    PredictorSpec,
    ValueFrame,
)

predictor_spec = PredictorSpec(
    value_frame=ValueFrame(
        init_df=predictor_df.lazy(),
        entity_id_col_name="id",
        value_timestamp_col_name="date",
    ),
    lookbehind_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator()],
    fallback=np.nan,
    column_prefix="pred",
)

outcome_spec = OutcomeSpec(
    value_frame=ValueFrame(
        init_df=outcome_df.lazy(),
        entity_id_col_name="id",
        value_timestamp_col_name="date",
    ),
    lookahead_distances=[dt.timedelta(days=1)],
    aggregators=[MaxAggregator()],
    fallback=np.nan,
    column_prefix="outc",
)

# Instantiate the Flattener and add the specifications
from timeseriesflattener import Flattener

result = Flattener(
    predictiontime_frame=PredictionTimeFrame(
        init_df=prediction_times_df.lazy(),
        entity_id_col_name="id",
        timestamp_col_name="date",
    )
).aggregate_timeseries(specs=[predictor_spec, outcome_spec])
result.collect()
I'll try to update everything on the server, which should fix everything. Just as a note: I get an error when running the example on the front page on GitHub due to conflicting specs from the outcome and predictor specs (both have a column called "value", which makes them conflict). I'll implement a rename on my end, roughly as sketched below, and see if that fixes the problem on the server.
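For reference, a minimal sketch of the workaround I mean, assuming both frames in the front-page example carry a column named "value"; renaming one of them (as the repro above does with "value_outcome") avoids the clash:

import polars as pl

# Hypothetical stand-ins for the front-page example's frames; both carry a
# column named "value", which is what makes the specs conflict.
predictor_df = pl.DataFrame({"id": [1], "date": ["2020-01-15"], "value": [1]})
outcome_df = pl.DataFrame({"id": [1], "date": ["2020-03-01"], "value": [1]})

# Renaming the outcome's value column keeps the two specs from producing
# clashing output columns.
outcome_df = outcome_df.rename({"value": "value_outcome"})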
All the very, very best, Mikkel
Ah yeah, sorry to hear it! We actually fixed this problem locally as well; you'll find that the iterpy dependency has been pinned on main to avoid it. I'm pretty sure just changing iterpy should fix it.
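If it helps, a quick diagnostic sketch for checking which iterpy version is installed on the server (the exact version pinned on main isn't stated here, so compare against the pin in the repo's dependencies):

from importlib.metadata import version

# Compare this against the iterpy version pinned in timeseriesflattener's
# dependencies on main.
print(version("iterpy"))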
Excellent point re: the example, we'll take a look! Let me know if this is fixed.
It freaking works with iterpy being downgraded, hallelujah! Thank you so much!
All the best, Mikkel
Excellent! Closing.
Ask Mikkel Werling for details. mikkel.werling@regionh.dk