Quality Benchmark - Githubissues

sofia-001 commented 1 year ago

Certain analyses should work just as well on the anonymized data as on the raw data. The following example illustrates the problem: the raw data is used to forecast the electricity consumption of an office building. If the size of the building is "anonymized away" in the anonymized data, then factors such as "in large buildings, employees tend to work later overall" can no longer be taken into account.

Goal: The forecasts on the anonymized data should be almost as good as those on the raw data.

[x] Forecast
[x] Information loss
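
One way to make this goal measurable, sketched under assumptions: score the same forecasting procedure once with raw and once with anonymized inputs against the same ground truth, then report the relative increase in error. The function names, the sample values, and the choice of mean absolute error are illustrative and not part of the project.

```python
import numpy as np


def mean_absolute_error(actual, predicted) -> float:
    """Average absolute deviation between two equally long series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def relative_degradation(actual, forecast_raw, forecast_anon) -> float:
    """Relative error increase caused by anonymization (0.0 means no loss)."""
    err_raw = mean_absolute_error(actual, forecast_raw)
    err_anon = mean_absolute_error(actual, forecast_anon)
    if err_raw == 0.0:
        # Perfect raw forecast: any anonymized error is an unbounded relative loss.
        return 0.0 if err_anon == 0.0 else float("inf")
    return (err_anon - err_raw) / err_raw


# Hypothetical consumption values in kWh, for illustration only.
actual = [10.0, 12.0, 14.0]
forecast_raw = [11.0, 12.0, 13.0]
forecast_anon = [12.0, 12.0, 12.0]
degradation = relative_degradation(actual, forecast_raw, forecast_anon)
```

A value close to 0 would mean the anonymized forecast is almost as good as the raw one; which threshold counts as "almost" is a project decision this sketch does not make.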

sofia-001 commented 1 year ago

Our input regarding the forecast:

A real forecast is not advisable in the e-mobility case, so we fall back on a simulation instead. The simulation needs parameters up front to tune it to the dataset. These are:

Distribution (dec_charge & noise)
Start times / plug-in times (cfg.min_start, cfg.max_start)
Dwell time / plug-off minus plug-in (cfg.min_duration, cfg.max_duration)
Charging demands (cfg.min_demand, cfg.max_demand)

The parameters have to be calculated from the measured values using statistical methods. In principle, this would also allow a direct comparison of data quality with vs. without anonymization, based on these parameters alone.
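
A minimal sketch of how the min/max parameters might be derived from measured sessions and then compared between raw and anonymized data. The session layout (plug-in time, plug-off time, energy in kWh), the use of quantiles as bounds, and all function names are assumptions for illustration; the distribution parameters (dec_charge, noise) are not covered here.

```python
from datetime import datetime

import numpy as np


def estimate_parameters(sessions, lower=0.05, upper=0.95) -> dict:
    """Estimate simulation bounds from (plug_in, plug_out, energy_kwh) tuples."""
    series = {
        # minutes after midnight at which the vehicle was plugged in
        "start": [s[0].hour * 60 + s[0].minute for s in sessions],
        # dwell time in minutes (plug-off minus plug-in)
        "duration": [(s[1] - s[0]).total_seconds() / 60 for s in sessions],
        # charged energy in kWh
        "demand": [s[2] for s in sessions],
    }
    params = {}
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        params[f"min_{name}"] = float(np.quantile(values, lower))
        params[f"max_{name}"] = float(np.quantile(values, upper))
    return params


def parameter_shift(raw: dict, anonymized: dict) -> dict:
    """Absolute change of every parameter caused by anonymization."""
    return {key: abs(anonymized[key] - raw[key]) for key in raw}


# Hypothetical sessions; the anonymized copy rounds times to the full hour.
raw_sessions = [
    (datetime(2023, 1, 2, 8, 10), datetime(2023, 1, 2, 10, 10), 20.0),
    (datetime(2023, 1, 3, 9, 40), datetime(2023, 1, 3, 12, 40), 35.0),
]
anon_sessions = [
    (datetime(2023, 1, 2, 8, 0), datetime(2023, 1, 2, 10, 0), 20.0),
    (datetime(2023, 1, 3, 10, 0), datetime(2023, 1, 3, 13, 0), 35.0),
]
raw_params = estimate_parameters(raw_sessions, lower=0.0, upper=1.0)
anon_params = estimate_parameters(anon_sessions, lower=0.0, upper=1.0)
shift = parameter_shift(raw_params, anon_params)
```

With `lower=0.0` and `upper=1.0` the bounds are the plain minimum and maximum; the default quantiles are meant to make the bounds less sensitive to outliers.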

Side note: The simulation computes the following days starting from 00:00. Charging sessions that start on the current day are still taken into account, however, and may run past the end of the day into the next day.
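
The carry-over across midnight can be sketched in isolation. The 36-hour working buffer and the 12-hour overlap mirror the `zeros(36 * 60)` and `zeros(12 * 60)` arrays in the simulation code; the function name, the zero-based indexing, and the sample session are simplifications introduced here.

```python
import numpy as np

MINUTES_PER_DAY = 24 * 60
OVERLAP = 12 * 60  # minutes of a session that may spill into the next day


def roll_days(sessions):
    """Build a per-minute plug-in series from one optional session per day.

    `sessions` holds (start_minute, duration_minutes) tuples, or None for a
    day without charging.
    """
    carry = np.zeros(OVERLAP)
    days = []
    for session in sessions:
        buffer = np.zeros(MINUTES_PER_DAY + OVERLAP)  # 36 hours
        if session is not None:
            start, duration = session
            buffer[start : start + duration] = 1
        # add what spilled over from the previous day
        buffer[:OVERLAP] += carry
        # keep the first 24 hours, remember the remaining 12 for tomorrow
        days.append(buffer[:MINUTES_PER_DAY])
        carry = buffer[-OVERLAP:]
    return np.concatenate(days)


# A session starting at 23:00 that lasts two hours, followed by an idle day.
plugged = roll_days([(23 * 60, 120), None])
```

In this sketch the second hour of the session shows up in the first 60 minutes of the second day, although that day has no session of its own.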

Regards, Therese

Here is the core of the simulation:

def _simulate_ev_forecast(self, df: DataFrame, cfg: TaskSimEvCharging) -> DataFrame:

"""
The function simulates an EV forecast as well as the measured values.

The complete process does not require any intermediate storage. New events are
rolled on a daily basis. The seed is the day number of the year, which keeps
the simulation identical within a day.

:param df:
:param cfg:
:return: Dataframe with results
"""
# decision variables drawn from specifically configured normal distributions
dec_charge = Normal(0.5, 0.1)
dec_start = Normal(
    cfg.min_start + (cfg.max_start - cfg.min_start) * 0.5,
    (cfg.max_start - cfg.min_start) * 0.15,
)
noise = Normal(0, 0.3)
dec_duration = Normal(
    cfg.min_duration + (cfg.max_duration - cfg.min_duration) * 0.5,
    (cfg.max_duration - cfg.min_duration) * 0.15,
)
dec_demand = Normal(
    cfg.min_demand + (cfg.max_demand - cfg.min_demand) * 0.5,
    (cfg.max_demand - cfg.min_demand) * 0.15,
)
# generate EV events for one charging point
plans = []
nr_days = (
    df.index.day.unique().size
)  # number of days, derived indirectly from the input time series index
seed(
    Timestamp.utcnow().dayofyear
)  # initialize the seed with the day number of the year
# make decisions for each day
# whether charging takes place or not
charge_dec = dec_charge.sample(nr_days) <= 0.8
# how many minutes after midnight the charging process starts
start_dec = dec_start.sample(nr_days)  # for forecast
mstart_dec = noise.sample(nr_days)  # for measurement (-1d)
# how long the charging process lasts
duration_dec = dec_duration.sample(nr_days)  # for forecast
mduration_dec = noise.sample(nr_days)  # for measurement
# how much energy is needed
demand_dec = dec_demand.sample(nr_days)  # for forecast
mdemand_dec = noise.sample(nr_days)  # for measurement
# at what power level charging takes place
power_dec = choices(cfg.power, k=nr_days)
# put decisions together in an event list
for i in range(nr_days):
    plans.append(
        EvChargingPlan(
            charge=charge_dec[i],
            start=int(start_dec[i]) if charge_dec[i] else -1,
            mstart=int(start_dec[i] + mstart_dec[i] * 30)
            if charge_dec[i]
            else -1,
            duration=int(duration_dec[i]) if charge_dec[i] else -1,
            mduration=int(duration_dec[i] + mduration_dec[i] * 60)
            if charge_dec[i]
            else -1,
            demand=demand_dec[i] if charge_dec[i] else -1,
            mdemand=demand_dec[i] + mdemand_dec[i] * 5 if charge_dec[i] else -1,
            power=power_dec[i] if charge_dec[i] else -1,
        )
    )
# build result time series
pipo = array([])
mpipo = array([])
demand = array([])
mdemand = array([])
power = array([])
mpower = array([])
od_pipo = zeros(12 * 60)
od_mpipo = zeros(12 * 60)
od_demand = zeros(12 * 60)
od_mdemand = zeros(12 * 60)
od_power = zeros(12 * 60)
od_mpower = zeros(12 * 60)
for item in plans:
    d_pipo = zeros(36 * 60)
    d_mpipo = zeros(36 * 60)
    d_demand = zeros(36 * 60)
    d_mdemand = zeros(36 * 60)
    d_power = zeros(36 * 60)
    d_mpower = zeros(36 * 60)
    if item.charge:
        d_pipo[item.start - 1 : item.start - 1 + item.duration] = 1
        d_mpipo[item.mstart - 1 : item.mstart - 1 + item.mduration] = 1
        d_demand[item.start - 1 : item.start - 1 + item.duration] = (
            item.demand / item.duration
        )
        d_mdemand[item.mstart - 1 : item.mstart - 1 + item.mduration] = (
            item.mdemand / item.mduration
        )
        # power shuts down after the demand is fulfilled (in minute steps)
        duration = ceil(item.demand / item.power * 60)
        if item.duration < duration:
            duration = item.duration
        d_power[item.start - 1 : item.start - 1 + duration] = item.power
        # measured power
        for k in range(item.mstart - 1, item.mstart - 1 + item.mduration):
            seed(Timestamp.utcnow().dayofyear)
            d_mpower[k] = item.power - abs(noise.sample(1)) * 0.1 * item.power
    # add history
    d_pipo[: 12 * 60] = d_pipo[: 12 * 60] + od_pipo
    d_mpipo[: 12 * 60] = d_mpipo[: 12 * 60] + od_mpipo
    d_demand[: 12 * 60] = d_demand[: 12 * 60] + od_demand
    d_mdemand[: 12 * 60] = d_mdemand[: 12 * 60] + od_mdemand
    d_power[: 12 * 60] = d_power[: 12 * 60] + od_power
    d_mpower[: 12 * 60] = d_mpower[: 12 * 60] + od_mpower
    # append to data array
    pipo = concatenate((pipo, d_pipo[: 24 * 60]), axis=0)
    mpipo = concatenate((mpipo, d_mpipo[: 24 * 60]), axis=0)
    demand = concatenate((demand, d_demand[: 24 * 60]), axis=0)
    mdemand = concatenate((mdemand, d_mdemand[: 24 * 60]), axis=0)
    power = concatenate((power, d_power[: 24 * 60]), axis=0)
    mpower = concatenate((mpower, d_mpower[: 24 * 60]), axis=0)
    # save the overlap into the next day
    od_pipo = d_pipo[-12 * 60 :]
    od_mpipo = d_mpipo[-12 * 60 :]
    od_demand = d_demand[-12 * 60 :]
    od_mdemand = d_mdemand[-12 * 60 :]
    od_power = d_power[-12 * 60 :]
    od_mpower = d_mpower[-12 * 60 :]
# time sampling
s1a = Series(
    pipo,
    name="pipo",
    index=date_range(
        start=self.root_now, periods=nr_days * 24 * 60, freq="1min"
    ),
)
s1b = Series(
    mpipo,  # measured plug-in/plug-off series, like mdemand and mpower below
    name="pipo",
    index=date_range(
        start=self.root_now - Timedelta("1d"),
        periods=nr_days * 24 * 60,
        freq="1min",
    ),
)
s2a = Series(
    demand,
    name="demand",
    index=date_range(
        start=self.root_now, periods=nr_days * 24 * 60, freq="1min"
    ),
)
s2b = Series(
    mdemand,
    name="demand",
    index=date_range(
        start=self.root_now - Timedelta("1d"),
        periods=nr_days * 24 * 60,
        freq="1min",
    ),
)
s3a = Series(
    power,
    name="power",
    index=date_range(
        start=self.root_now, periods=nr_days * 24 * 60, freq="1min"
    ),
)
s3b = Series(
    mpower,
    name="power",
    index=date_range(
        start=self.root_now - Timedelta("1d"),
        periods=nr_days * 24 * 60,
        freq="1min",
    ),
)
s1a = s1a.resample(self.sampling_time).max()
s1b = s1b.resample(self.sampling_time).max()
s2a = s2a.resample(self.sampling_time).sum()
s2b = s2b.resample(self.sampling_time).sum()
s3a = s3a.resample(self.sampling_time).mean()
s3b = s3b.resample(self.sampling_time).mean()
# build data frames
df_f = (
    s1a.to_frame()
    .merge(s2a.to_frame(), left_index=True, right_index=True)
    .merge(s3a.to_frame(), left_index=True, right_index=True)
)
df_m = (
    s1b.to_frame()
    .merge(s2b.to_frame(), left_index=True, right_index=True)
    .merge(s3b.to_frame(), left_index=True, right_index=True)
)
# export time series
ndf = DataFrame()
for column in df_f.columns:
    export = DataFrame(
        data=df_f[column].values, index=df_f.index, columns=[column]
    )
    if column != "pipo":
        self.df_list.append(export.copy())
    else:
        # return only pipo
        ndf = export
    self._export_data(
        df=export,
        ts_cfg=TaskStorage(
            id=cfg.export_ts[column].id, channel="f", bucket=self.cfg.ts.bucket
        ),
    )
    export = DataFrame(
        data=df_m[column].values, index=df_m.index, columns=[column]
    )
    export = export.loc[self.root_now - Timedelta("6h") : self.root_now]
    self.df_list.append(export.copy())
    self._export_data(
        df=export,
        ts_cfg=TaskStorage(
            id=cfg.export_ts[column].id, channel="m", bucket=self.cfg.ts.bucket
        ),
    )
# return only pipo
return ndf
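
The seeding behaviour described in the docstring can be illustrated without chaospy. The sketch below assumes the sampler draws from NumPy's global random generator, which the use of `numpy.random.seed` in the function suggests but which is not confirmed here; the function names are hypothetical. It also shows what reseeding before every single draw does, as in the measured-power loop, where `seed(...)` is called before each `noise.sample(1)`.

```python
from datetime import datetime, timezone

import numpy as np


def day_of_year_seed(now=None) -> int:
    """Seed value that stays constant for a whole UTC day."""
    now = now or datetime.now(timezone.utc)
    return now.timetuple().tm_yday


def sample_noise(n: int, seed_value: int) -> np.ndarray:
    """Seed once, then draw n values from Normal(0, 0.3)."""
    np.random.seed(seed_value)
    return np.random.normal(0.0, 0.3, size=n)


def sample_noise_reseeded(n: int, seed_value: int) -> np.ndarray:
    """Reseed before every single draw, as the measured-power loop does."""
    values = []
    for _ in range(n):
        np.random.seed(seed_value)
        values.append(np.random.normal(0.0, 0.3))
    return np.array(values)


seed_value = day_of_year_seed(datetime(2023, 2, 1, tzinfo=timezone.utc))
first_run = sample_noise(5, seed_value)
second_run = sample_noise(5, seed_value)
reseeded = sample_noise_reseeded(5, seed_value)
```

Under that assumption, two runs on the same day produce identical draws, and reseeding inside the loop makes every draw in the loop the same value, so the measured power would be reduced by a constant offset rather than by minute-to-minute noise.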

fjoniyz commented 1 year ago

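from math import ceil  # assumed: the result of ceil is used as a slice bound in the function
from typing import Literal  # needed by the type annotations in TaskSimEvCharging
from pandas import DataFrame, Timestamp, Timedelta, date_range, concat, Series
from numpy import NaN, ones, cos, sin, linspace, pi, zeros
from numpy import array, concatenate  # used by _simulate_ev_forecast
from numpy.random import seed
from random import choices
from chaospy import Normal
from pydantic import BaseModel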

class TaskSimEvCharging(BaseModel):
    """
    Configuration for generating EV events.
    """

    type: Literal["TaskSimEvCharging"] = "TaskSimEvCharging"
    """type definition of class for fail-safe parsing"""

    min_start: int = 300
    """[minute] beginning minute of day for possible start"""

    max_start: int = 1320
    """[minute] end minute of day for possible start"""

    min_duration: int = 15
    """[minutes] minimal charging duration"""

    max_duration: int = 360
    """[minutes] maximal charging duration"""

    min_demand: int = 10
    """[kWh] minimal charging demand"""

    max_demand: int = 60
    """[kWh] maximal charging demand"""

    power: list[float] = [11.0, 22.0]
    """[kW] list of power levels for random choice."""

    export_ts: dict[Literal["pipo", "demand", "power"], DepTsMath]
    """mapping of dependency time series for values PlugIn PlugOff 'pipo', 'demand' and
    'power'"""
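
To see what these bounds mean for the simulation, the decision distributions can be reproduced with the standard library. The formulas (mean at the midpoint, standard deviation 15% of the range, and the `Normal(0.5, 0.1) <= 0.8` charging decision) are taken from the posted function; `statistics.NormalDist` stands in for chaospy, and the helper name is hypothetical.

```python
from statistics import NormalDist


def decision_distribution(lower: float, upper: float) -> NormalDist:
    """Normal distribution centred between the bounds, sigma = 15% of the range."""
    mean = lower + (upper - lower) * 0.5
    sigma = (upper - lower) * 0.15
    return NormalDist(mean, sigma)


# Default start window from TaskSimEvCharging: minute 300 to minute 1320.
start = decision_distribution(300, 1320)

# Share of sampled start times that fall inside the configured window.
# The bounds sit about 3.33 sigma from the mean, so a small share falls outside.
p_inside = start.cdf(1320) - start.cdf(300)

# Probability that a day gets a charging session at all:
# Normal(0.5, 0.1) <= 0.8 is a threshold three sigma above the mean.
p_charge = NormalDist(0.5, 0.1).cdf(0.8)  # about 0.9987
```

Read this way, min and max are soft bounds: the posted function does not clip the samples, and almost every simulated day contains a charging session.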

This is the new input from Ampeers. Even with it, we cannot run the function, because it depends on classes and implementations that are unknown to us. For example, DepTsMath comes from a local module whose definition we have not seen. In addition, the function from the first email takes a self parameter, from which it reads attributes and calls methods. We can therefore conclude that the function was taken from a class, but without that class (not necessarily all of it, only the attributes and methods the function needs) we cannot do anything.
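
As a starting point for the missing pieces, here is a sketch of the minimal surface the function appears to need, read off from how it uses `self`, `cfg.export_ts`, `TaskStorage` and `EvChargingPlan`. Every class here is a hypothetical stand-in, not the Ampeers implementation; field types are guesses, and `SimulationHost`, `HostConfig` and `TsConfig` are invented names.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DepTsMath:
    """Stand-in: the function only reads `.id` from it."""
    id: str


@dataclass
class TaskStorage:
    """Stand-in built from the keyword arguments used in the function."""
    id: str
    channel: str
    bucket: str


@dataclass
class EvChargingPlan:
    """Fields taken from the constructor call in the function."""
    charge: bool
    start: int
    mstart: int
    duration: int
    mduration: int
    demand: float
    mdemand: float
    power: float


@dataclass
class TsConfig:
    bucket: str


@dataclass
class HostConfig:
    ts: TsConfig


@dataclass
class SimulationHost:
    """Stand-in for the class that owns `_simulate_ev_forecast`."""
    root_now: Any  # expected to behave like a pandas Timestamp
    sampling_time: str  # resample rule passed to Series.resample
    cfg: HostConfig
    df_list: list = field(default_factory=list)
    exported: list = field(default_factory=list)

    def _export_data(self, df, ts_cfg: TaskStorage) -> None:
        # Record the export instead of writing to a time-series store.
        self.exported.append((ts_cfg, df))


host = SimulationHost(
    root_now=None,
    sampling_time="15min",
    cfg=HostConfig(ts=TsConfig(bucket="test-bucket")),
)
host._export_data(
    df="placeholder",
    ts_cfg=TaskStorage(id="ts-1", channel="f", bucket=host.cfg.ts.bucket),
)
```

Whether these stand-ins match the real behaviour can only be confirmed against the original class, so results obtained this way would need to be checked with Ampeers.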

nomorehumor commented 1 year ago

If possible, please attach a file to the message in the future :)

nomorehumor commented 1 year ago

[ ] We need the relation between the forecast and information loss

fjoniyz / ganges

Quality Benchmark #74