EnergySystemsModellingLab / MUSE_2.0

Welcome to the MUSE 2.0 repository
GNU General Public License v3.0
1 stars 0 forks source link

Add more validation for time slices #146

Closed alexdewar closed 3 months ago

alexdewar commented 3 months ago

Description

This PR introduces better typing for time slices and adds extra validation steps, both for when reading time slice input data and processes data (which is currently the only place where time slices are used).

I've changed things so that the possible seasons and times of day are now listed in model.toml, rather than just being implicit from the values that appear in time_slices.csv. The main reason for doing this is so that the ordering is explicit and obvious (we need to know which time slices occurs after which!). Not every combination of season and time of day needs to be present in time_slices.csv and it is possible to have input data where it is literally impossible to know what the user's intended ordering is. So now, the ordering is specified explicitly in model.toml and it doesn't matter what order the rows are in time_slices.csv as the data will be sorted after loading.

I've introduced a new type TimeSliceID, which represents the name of the season and time of day for a given time slice, which is intended to be used in all the places elsewhere in the code where we refer to time slices (e.g. for processes). The existing TimeSlice struct contains a TimeSliceID and also the fraction of the year for that time slice.

I also noticed that everywhere where we are reading in proportions from input values (using the deserialise_proportion function), it doesn't actually make sense to accept zeroes (e.g. a time slice with a fraction of 0.0 doesn't mean anything), so I changed this function to disallow zeroes.

Closes #109.

(I've tagged you as a reviewer for this @AdrianDAlessandro, but no pressure. Let me know if you've had your fill of Rust!)

Type of change

Key checklist

Further checks

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 80.82192% with 14 lines in your changes missing coverage. Please review.

Project coverage is 80.07%. Comparing base (156c2b9) to head (e012959). Report is 4 commits behind head on main.

Files Patch % Lines
src/time_slice.rs 70.00% 12 Missing :warning:
src/model.rs 91.66% 1 Missing :warning:
src/process.rs 94.44% 1 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #146 +/- ## ========================================== + Coverage 78.24% 80.07% +1.83% ========================================== Files 10 10 Lines 216 271 +55 ========================================== + Hits 169 217 +48 - Misses 47 54 +7 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

ahawkes commented 3 months ago

The order of the time slices does not matter.

Knowing what ones split up days (i.e. diurnal) within a season, and which ones split up seasons, does matter. This is because sometimes commodity balances are seasonal and we don’t care too much what’s happening within a day – this is the case where the nature of the commodity confers some ability to store it (e.g. coal storage at a power station – they have a huge pile of coal outside)

From: Diego Alonso Álvarez @.> Sent: Tuesday, August 13, 2024 10:34 AM To: EnergySystemsModellingLab/MUSE_2.0 @.> Cc: Subscribed @.***> Subject: Re: [EnergySystemsModellingLab/MUSE_2.0] Add more validation for time slices (PR #146)

This email from @.**@.> originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders listhttps://spam.ic.ac.uk/SpamConsole/Senders.aspx to disable email stamping for this address.

@dalonsoa requested changes on this pull request.

As far as I can tell, this looks Ok, but I miss a lot of docstrings explaining what's going on.

There are a couple of things I don't fully understand. One is Rc, which seems to stand for "Reference Counted", but that I don't understand what actually does even after reading the docshttps://doc.rust-lang.org/std/rc/struct.Rc.html. Could you explain?

The second thing is why the order of the timeslices matter? It does not in MUSEv1, as far as I can tell.


In src/time_slice.rshttps://github.com/EnergySystemsModellingLab/MUSE_2.0/pull/146#discussion_r1714969301:

+#[derive(PartialEq, Debug)]

+pub struct TimeSliceID {

+}

+

+#[derive(PartialEq, Debug)]

+pub enum TimeSliceSelection {

+}

+

+/// Ordered list of seasons and times of day for time slices

+#[derive(Debug, Deserialize, PartialEq)]

+pub struct TimeSliceDefinitions {

+}

+

+impl TimeSliceDefinitions {

In all of these - and in others in this PR - I'm missing some sort of docstring or similar explaining what they are for, the inputs, etc., so they are properly documented when building the documentation.

— Reply to this email directly, view it on GitHubhttps://github.com/EnergySystemsModellingLab/MUSE_2.0/pull/146#pullrequestreview-2235014551, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AC37JLI4ZHSPZYQISET6LSDZRHHHJAVCNFSM6AAAAABMN2R3KWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDEMZVGAYTINJVGE. You are receiving this because you are subscribed to this thread.Message ID: @.**@.>>

alexdewar commented 3 months ago

Ah ok, it seems like I've misunderstood! I'll have another go.

alexdewar commented 3 months ago

There are a couple of things I don't fully understand. One is Rc, which seems to stand for "Reference Counted", but that I don't understand what actually does even after reading the docs. Could you explain?

So in general with Rust (as with C++), objects are destroyed as soon as they go out of scope (e.g. the function they're in returns). If you want to keep them around after this you can either copy them (Rust calls this "cloning") and pass them somewhere else or move them (which is just an optimisation so that you don't have to copy an object into another function when you're about to delete the original). Using Rc means you get a kind of garbage collection: if you clone an Rc<Something> it just increases a reference count, without actually copying the underlying data (i.e. it's a shallow copy) and every time it goes out of scope, the reference count is decreased again. When the reference count reaches 0 it's deleted (immediately, unlike Python).

The second thing is why the order of the timeslices matter? It does not in MUSEv1, as far as I can tell.

It turns out it doesn't :upside_down_face:. I'll fix this up.

tsmbland commented 3 months ago

I also noticed that everywhere where we are reading in proportions from input values (using the deserialise_proportion function), it doesn't actually make sense to accept zeroes (e.g. a time slice with a fraction of 0.0 doesn't mean anything), so I changed this function to disallow zeroes.

I get what you're saying, although I still think it might be useful to allow zeros. Maybe someone wants to see what would happen if they deleted a timeslice, and it would be much easier just to set the timeslice proportion to zero. Also, if you're planning to re-use this function, there are other places in the inputs where a zero proportion might make sense (e.g. demand allocation to timeslices).

dalonsoa commented 3 months ago

The main issue when allowing zeros is that you often have to divide by those values and if they are zero, the result makes no sense - and often result in an error. I rather avoid having that issue to start with than having to check all over the place if a value is zero before dividing by it.

tsmbland commented 3 months ago

Yeah that makes sense. We have this problem a lot in MUSE1 where things break with zeros, so people just end up using 0.00001 instead

alexdewar commented 3 months ago

Note that we can still accept zeroes as inputs elsewhere if we want (e.g. @tsmbland's example with demand allocation for time slices), but I just changed it for the existing cases and renamed the function to make it clearer that it rejects zeroes. We can always make another helper function called deserialise_proportion_allow_zero or whatever (I think it's good to be explicit!).

In the general case though, I think we should reject zeroes. Zero-length time slices are a weird kind of input that may still work with the model (notwithstanding @dalonsoa's point about divide-by-zero errors), but are likely to be the result of user error. For those sorts of cases, we should probably prominently display a warning to users if we don't reject it, but it's probably easier just to say "no" upfront.

alexdewar commented 3 months ago

Superseded by #148.