Methodology, Code and Documentation Analysis

zepedropaulos commented 2 weeks ago

Hi,

We have several questions on the code structuring and function functionality inconsistencies:

Pattern-based methods

General/Noise Generation

1.

In some environment examples, the coordinates of generators (in absolute values) exceed the limits of the mesh, which is Lx=Ly=1000. If we are considering a 1000x1000 mesh, how does considering roughly [-1000,1000] coordinates (~2000 in size) work with this pre-defined mesh size? What function deals with this issue?

2.

Some of the coordinates of all the generators in the provided environment are negative. After reviewing the code, no treatment for negative coordinates was found.

An example of that lack of treatment is the noise generated in the mesh, only considering positive coordinates within the interval x=[0,Lx] and y=[0,Ly].

The points where independent noise will be created are defined (35-50):

# Compute number of element in each dimension
Nx_comp = int(Lx // dx_corr + 1) + add_dim
Ny_comp = int(Ly // dy_corr + 1) + add_dim
Nt_comp = int(T // dt_corr + 1) + add_dim
return Nx_comp, Ny_comp, Nt_comp

And generate noise in those coordinates: (generation_utils.py, Line 53-74):

# Generate gaussian noise input·
#output = np.random.normal(0, 1, (Nx_comp, Ny_comp, Nt_comp))
output = prng.normal(0, 1, (Nx_comp, Ny_comp, Nt_comp))

3.

What exactly is the purpose of the add_dim(add dimension) function?(generate_solar_wind.py, Line 23)

def get_add_dim(params, prods_charac):
    scale_solar_coord_for_correlation = float(params["scale_solar_coord_for_correlation"]) if "scale_solar_coord_for_correlation" in params else None
    add_dim = 0
    dx_corr = int(params['dx_corr'])
    dy_corr = int(params['dy_corr'])
    for x, y, type_gen  in zip(prods_charac["x"], prods_charac["y"], prods_charac["type"]):
        if type_gen == "solar" and scale_solar_coord_for_correlation is not None:
            x = scale_solar_coord_for_correlation * x
            y = scale_solar_coord_for_correlation * y
        x_plus = int(x // dx_corr + 1)
        y_plus = int(y // dy_corr + 1)
        add_dim = max(y_plus, add_dim)
        add_dim = max(x_plus, add_dim)
    return add_dim

Does it only help with the spatial noise interpolation for generators at the mesh's limits? (generation_utils.py, Line 100-103)

# Compute number of element in each dimension
Nx_comp = int(Lx // dx_corr + 1) + add_dim
Ny_comp = int(Ly // dy_corr + 1) + add_dim
Nt_comp = int(T // dt_corr + 1) + add_dim

Instead of predefined mesh dimensions, would it be better to generate mesh dimensions based on the farthest coordinate to enable full spatial interpolation calculations from the start?

4.

The documentation states that the neighbour selection is only based on the one-nearest, but after some code analysis, it seems four-nearest is used. What should we assume? (generation_utils.py, Line 111-135)

# Get coordinates
x, y = locations

# Get coordinates of closest points in the coarse mesh
x_minus = int(x // dx_corr)
x_plus = int(x // dx_corr + 1)
y_minus = int(y // dy_corr)
y_plus = int(y // dy_corr + 1)

(...)

# For every close point, add the corresponding time series, weighted by the inverse
# of the distance between them
for x_neighbor in [x_minus, x_plus]:
    for y_neighbor in [y_minus, y_plus]:
        dist = 1 / (np.sqrt((x - dx_corr * x_neighbor) ** 2 + (y - dy_corr * y_neighbor) ** 2) + 1)
        output += dist * computation_noise[x_neighbor, y_neighbor, :]
        dist_tot += dist
output /= dist_tot

5.

What is the role of scale_solar_coord_for_correlation, which increases add_dimby scaling solar coordinates? (generate_solar_wind.py, Line 23 -36)

def get_add_dim(params, prods_charac):
    scale_solar_coord_for_correlation = float(params["scale_solar_coord_for_correlation"]) if "scale_solar_coord_for_correlation" in params else None

    (...)

    for x, y, type_gen  in zip(prods_charac["x"], prods_charac["y"], prods_charac["type"]):
        if type_gen == "solar" and scale_solar_coord_for_correlation is not None:
            x = scale_solar_coord_for_correlation * x
            y = scale_solar_coord_for_correlation * y

Solar Generation

6.

Regarding the bias value of the spatially and temporally correlated noise, considering it as an average efficiency value of solar generators, what is this value based on? If this is indeed its purpose, wouldn’t it be more accurate to calculate the average efficiency of solar generators to assign a precise value to the bias value? In this situation, the efficiency values and the respective weight of each generator in the total solar production would be provided as input. (solar_wind_utils.py, Line 89-93)

if "mean_solar_pattern" in params:
    mean_solar_pattern = float(params["mean_solar_pattern"])
else:
    # legacy behaviour
    mean_solar_pattern = 0.75

7.

It is used a solar pattern file dated 2007. Shouldn't this file be updated to a more recent date?

8.

The Additional Independent Noise isn't used as documented. The relevant code is commented, noting that added noise was insignificant (smooth_dist/PMax is small). (solar_wind_utils.py, Line 95-98)

signal = solar_pattern * (mean_solar_pattern + std_solar_noise * final_noise)
#signal += prng.uniform(0, smoothdist/Pmax, signal.shape) #to be revised: since smmothdist/PMax is very small, the added noise compared to the previous sinal was unsignificant
#signal += np.random.uniform(0, smoothdist / Pmax, signal.shape) #older version - to be removed
# signal[signal > 1] = 1

Wind Generation

9.

When calculating the seasonal pattern for wind generation, the following equation is used (solar_wind_utils, Line 49):

seasonal_pattern = np.cos((2 * np.pi / (365 * 24 * 60)) * (t - 30 * 24 * 60 - start_min))

The variable start_min, which represents the offset from the start of the simulation, should not be added instead of subtracted? This is the approach taken in calculating the seasonal pattern for load generation. (consumption_utils, Line 67)

seasonal_pattern = 1.5 / 7. * np.cos(year_pattern * (t + start_min - 45 * nb_sec_per_day))

10.

A clearer explanation of the wind production formula is needed, especially regarding the constants (0.1, 4, and 0.3) and why only medium/long wind noise includes the 0.3 sum.

11.

Short wind is not summed by the value of 0.3 nor multiplied by the wind pattern like the other types of noise, why?

12.

An explanation is required for the division of the wind pattern into oscillating (30%) and constant (70%) parts.

Load Generation

13.

Regarding the comment written: It specifies that if the "use_legacy" method it set to "TRUE," it will only function correctly when simulating “2050”. What does it means? (consumption_utils.py, line 21).

def compute_loads(loads_charac, temperature_noise, params, load_weekly_pattern,
                  start_day, add_dim, day_lag=0,
                  return_ref_curve=False,
                  use_legacy=True):
    #6  # this is only TRUE if you simulate 2050 !!! formula does not really work

14.

When using the legacy method, the day_lag variable is intended to cancel the offset between the load weekly pattern and the simulation within the day of the week. However, the lines that compute day_lag are commented out, causing a default value of 0 to be used instead. Why is the day_lag calculation not being used?

def compute_loads(loads_charac, temperature_noise, params, load_weekly_pattern,
                  start_day, add_dim, day_lag=0,
                  return_ref_curve=False,
                  use_legacy=True):

(...)

    # start_day_of_week = start_day.weekday()
    # first_dow_chronics = datetime.strptime(load_weekly_pattern["datetime"].iloc[1], "%Y-%m-%d %H:%M:%S").weekday()
    # + (calendar.isleap(start_day.year) if start_day.month >= 3 else 0)
    # day_lag = (first_dow_chronics - start_day_of_week) % 7
    # day_lag = 0

15.

In the "new usage" method (use_legacy = FALSE), the output isn't interpolated to achieve the target temporal resolution, dt (as it is in the "legacy usage"). (consumption_utils.py, Lines 167-174)

    else:
        # new usage
        nb_ts = int((params['end_date'] - params['start_date']).total_seconds() / 60 / params["dt"] + 1)  # +1 is because of the buggy stuff above...
        N_repet = np.ceil(nb_ts / weekly_pattern.shape[0]).astype(int)
        stacked_weekly_pattern = np.tile(weekly_pattern, N_repet)
        output = stacked_weekly_pattern[:nb_ts]

return output

Economic Dispatch

16.

A more detailed explanation is needed regarding the reason for using reactive compensation. (utils.py - Reformat_load, Line 159)

snapshots, step_opf, q_compensation = params['snapshots'], params['step_opf_min'], params['reactive_comp']
# Set temp index to load
load.index = snapshots
# Resample data and scale to compensate reactive part
load_resampled = load.resample(f'{str(step_opf)}min').apply(lambda x: x[0])
load_resampled *= q_compensation

17.

Clarification is needed in this ramp data processing. (utils.py - preprocess_net, Line 254/255) Note: There is no option to provide a value for input_data_resolution, as it always defaults to 5 minutes. (run_economic_dispatch.py, Line 70)

steps = every_min / input_data_resolution
net.generators.loc[:, ['ramp_limit_up', 'ramp_limit_down']] *= steps

18.

Is it possible to provide snapshot data as user input? If not, it will always be used a default value for frequency, 5 min. (utils.py, Lines 103 - 123)

# Initialize possible param keys if they are
# not passed by user
params={'snapshots': [],
        'step_opf_min': 5,
        'mode_opf': 'day',
        'reactive_comp': 1.025,
}
params.update(params_user)
# Get user params
snaps = params['snapshots']

(...)

if snaps == []:
        snapshots = pd.date_range(start=start_date, periods=num, freq='5min')
        params.update({'snapshots': snapshots})

19.

Should snapshots have a frequency related to the real-time data frequency, df? The snapshot structure (range) is applied to the load index, therefore, both need to be set to the same frequency. (utils.py, Line 179)

load.index = snapshots

20.

Only one load was added during the initialization of PypsaDispatcher, a single load named "agg_load" with no associated value (Line 47). Since then, nothing related to Load in the network has been changed, as the load values were always modified in a separate variable. Therefore, what is the reason for removing the load added to the network if immediately afterward an identical one is added? (utils.py, Line 250)

class PypsaDispatcher(Dispatcher, pypsa.Network):
    """
    Inheriting from Dispatcher to implement abstract methods thanks to Pypsa API
    Wrapper around a pypsa.Network to add higher level methods
    """

    # PATCH
    # to avoid problems for respecting pmax and ramps when rounding production values in chronics at the end, we apply a correcting factor
    PmaxCorrectingFactor = 1
    RampCorrectingFactor = 0.1

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.add('Bus', 'node')
        self.add('Load', name='agg_load', bus='node')

# Remove all loads modelled in PyPSA
# and create one single agg_load.
net.mremove('Load', names=net.loads.index)  
net.add('Load', name='agg_load', bus=net.buses.index.tolist()[0])

21.

No reference to slack_pmin and slack_pmax was found as possible input parameters. (run_economic_dispatch.py, Line 76/82)

if 'slack_name' in params:
    if 'slack_pmin' in params:
        # user specified a pmin for the slack bus
        slack_name = str(params["slack_name"])
        slack_pmin = float(params["slack_pmin"]) / float(pypsa_net.generators.loc[slack_name].p_nom)

    if 'slack_pmax' in params:
        # user specified a pmin for the slack bus
        slack_name = str(params["slack_name"])
        slack_pmax = float(params["slack_pmax"]) / float(pypsa_net.generators.loc[slack_name].p_nom)

BDonnot commented 2 weeks ago

Hello,

Thanks for all these issues.

Unfortunately I am not certain there are some person who can answer all these questions. This code has been used occasionnally to generate dataand fixes on fixes on fixes has been made whenever error were encountered.

In other words, the technical debt is huge for this project. We are more than willing to accept any PR that tries to clean it up as it is not the priority at the moment (due to lack of available time)

I'll try to give some answers when I have some, but they will be disappointing I am afraid.

Mesh should be an abstract "things" with any range you want. Somewhere in the code the min / max is computed and things are done based on that. I cannot tell you were exactly but I think (as far as I remember) that is done somewhere (not my extra low confidence)
same answer as above
I don't remember but probably something for some environment will break if you remove the "add_dim" function. Feel free to make a PR for a better solution. I'm pretty sure your solution would work and be more consistent and generalizes much better
Documentation is the result of an attempt to describe what is done. It has been done partially in 2020 and 2021 and never updated since. From the beginning the doc was full of mistakes and incomplete. This is why no huge attempt have been done to update it. So here again, if you spot some clearly wrong statements in the doc, feel free to do a PR to fix them :-)
I don't remember
There are lots of way to generate solar data. The best would be to have a clean API (and a code example on how to implement it) and to let people implement their own generation process. I totally agree with you that what is done in solar, wind, loads and for the "market" is definitely not "state of the art". Refactoring the code this way would be way better but we don't have the manpower to do it.
See point 6 above :-) (and i'm convinced that yes it need !)
see comment on 4, same issue here
I don't really know, maybe you are correct. A bug probably has been spotted on either load and wind and only the thing where the bug has been spotted has been fixed. Some archaeology would be needed here to find first which is the last modification (on wind or on load) and then to apply this last modification on the other one (on load or on wind). A PR would be very much appreciated :-)
Totally agree with you :-) Which again bring us down to point 6 above. With a clean API (and this rather weird example) people would be able to implement the wind generation they want easily.
Good question :-/
Indeed it does, but I suspect no one knows / remembers why it's done this way. Which again brings us down to 6 and 10 above
It probably means that if you are simulating data for a different year than 2050 (with a given method) then loads will most likely be realistic, probably with the day of the week (eg you might have loads on Sunday higher than Tuesday for example) or maybe it says something else.
This "if legacy: else:" was added because some super basic tests are performed. And as no one really remembers what these tests do nor why they do it then each time a new feature or a bugfix is added, I add this "if / else" not to break the tests I did not do. Is it terrible programming practice ? Most definitely. Does it contribute to the enormous technical debt, most definitely yes...
Probably, or maybe it's done elsewhere later in the code that does not use the "legacy" code. Honestly I don't know
Yes, but no one remembers I assume
Totally agree also here
I don't know / remember (see point 6 above)
I don't know / remember (see point 6 above)
Good question
See comment about docs above

BDonnot commented 2 weeks ago

To summarize, the best way, if you want to use chronix2grid at the moment is not to follow any process in the readme, in the notebooks or anywhere else.

The most "up to date" way to use it is to create a grid2op environment without time series but with the correct files, and then to invoke:

env = grid2op.make(A/PATH/TO/AN/ENV/NEEDING/TIME_SERIES)
env.generate_data(nb_year=1)

Then, once done, you inspect the generated time series. If they look good, you generate lots of data with, for example

env.generate_data(nb_year=32)

If not, then you change some config files of chronix2grid. Regenerate the data and inspect them again.

This is highly manual process but this is how the last generated data have been released since 2022.

This allows to get started with data "fine tuned to your problem" as a first "baseline". Usually, what has been done in l2rpn competitions is to add some (undocumented, untested, within a "if not legacy" block, etc.) code to "improve" data generation for example.

marota commented 2 weeks ago

Hi José, About your questions related to the meaning of some config parameters (16 and 21), you can find the answers here: https://chronix2grid.readthedocs.io/en/latest/DESCRIPTION.html#economic-dispatch-generation-hydro-nuclear-and-thermic-generators

marota commented 2 weeks ago

For question 20, I think that the idea was to reduce the network to a sigle node as the network representation is not needed for solving this dispatch problem. By doing so, you also reduce the loads to a single aggregated load. The problem solved here "only" tries to balance production with loads, but over a scenario (not independant snapshots) by considering the ramps as constraints.

marota commented 2 weeks ago

For the questions related to the wind model generation model, the best answers we have are in this document from 2020 I shared with you

Grid2op / chronix2grid