alexdewar commented 3 months ago

Description

We want to keep the program and model configuration separate, because:

It will the config files clearer for end users
It makes it easier to have a proper separation of concerns in the code
It makes it easier to share models (without having to remove user-specific preferences from your colleagues' files)

I've split up the old settings.toml into settings.toml (for program settings) and model.toml (for model parameters), which are processed by separate corresponding Rust modules (settings.rs and model.rs). While I was refactoring, I did some misc tidying, including adding a validity check for the milestone years (#113) and moving the one integration test we have to the a separate tests folder, which is apparantly best practice. In order to do the latter, I had to add a lib.rs file to src, which makes the crate both a library and a binary crate (there's no way to have external tests for a purely binary crate), following the recommendation in the Rust book: https://doc.rust-lang.org/stable/book/ch12-03-improving-error-handling-and-modularity.html#separation-of-concerns-for-binary-projects

@dalonsoa @tsmbland I know you're both busy with one thing or another, so no pressure to review this. I can work on other things in the meantime.

Closes #122. Closes #113.

Type of change

[ ] Bug fix (non-breaking change to fix an issue)
[x] New feature (non-breaking change to add functionality)
[ ] Refactoring (non-breaking, non-functional change to improve maintainability)
[ ] Optimization (non-breaking change to speed up the code)
[ ] Breaking change (whatever its nature)
[ ] Documentation (improve or add documentation)

Key checklist

[x] All tests pass: $ cargo test
[x] The documentation builds and looks OK: $ cargo doc

Further checks

[x] Code is commented, particularly in hard-to-understand areas
[x] Tests added that prove fix is effective or that feature works

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 68.18182% with 21 lines in your changes missing coverage. Please review.

Project coverage is 81.72%. Comparing base (c1edd2a) to head (4094aa3).

Files	Patch %	Lines
src/main.rs	0.00%	7 Missing :warning:
src/lib.rs	0.00%	6 Missing :warning:
src/model.rs	81.25%	6 Missing :warning:
src/log.rs	0.00%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #130 +/- ## ========================================== + Coverage 76.27% 81.72% +5.44% ========================================== Files 9 10 +1 Lines 177 186 +9 ========================================== + Hits 135 152 +17 + Misses 42 34 -8 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

alexdewar commented 3 months ago

I'm mostly confused, I would admit.

It's not just you. I think this is one way in which Rust packaging is kinda confusing. The documentation for this stuff wasn't great either :worried:

What sort of information would go in settings and what in model? At the moment, there are just 2 things - one per file - but I guess you have in mind a longer collection of stuff that could go in each and that should be kept separated, right?

Yep. Any model parameters would go into model.toml and anything user-specific that doesn't relate to the model definition (e.g. where you want to store output files etc.) would go into settings.toml. @ahawkes has said he'd like users to be able to configure where log files are saved in the settings too (#110).

Tbh I'm not actually sure program settings should be in the same folder as the model configuration at all (see #122), e.g. on Linux I would expect it to be somewhere like ~/.config/muse2/settings.toml. (Plus, it's not really something you'd want to change on a per-model basis.) That said, I think we don't want to confuse users (some of whom might not be v technical) by putting it somewhere obscure (to them), so it's probably best to put off that discussion for now.

On the library and binary crate... I really don't get it. What is the difference between both? When/where you should use one or the other? How this relates to using muse2 or create when using other modules? I understand I can search for this information, but as you already has, maybe you can provide a summary... 😄

Some of this information is not that easy to find :upside_down_face:.

A binary crate produces an executable and a library crate provides code that other crates can use (via my_crate_name::whatever). Other code can't access functions etc. inside binary crates -- I guess the rationale is that if you're publishing a binary crate to crates.io, you don't necessarily want other users to rely on being able to call random functions inside your crate from their code. In order to have integration tests in a separate folder outside your crate though (which is recommended), your crate can't be a purely binary crate, because the test code is outside your crate, so if everything's private then you won't be able to access any of the functions etc.

I guess for this reason (and possibly others), the Rust book recommends that you also make your binary crates into libraries with the bare minimum needed to start the program in main.rs (the main file for a binary crate) and the actual entry point to the program in lib.rs (the main file for a library crate).

In general, you access things from within a crate with use crate::my_submodule::something; etc. Weirdly though, when you have this binary + library structure, the crate::* namespace doesn't seem to exist in main.rs anymore (it's like it's outside the crate, somehow), hence the use of muse2::* in main.rs. This also means that you can only access public submodules in your crate from main.rs, which is why some of them are declared with pub mod rather than plain old mod. (NB: Also anything we want to be able to access from the integrations needs to be pub mod. Maybe they should just all be public?)

ahawkes commented 3 months ago

I realise this is important, but also aware that we're running out of time for your efforts in July. I think it's OK to separate model parameters and settings, but also think there's a fine line between the two. e.g. you might be running 5 models in parallel but 2 of them need a higher log level than the others - so you need seperate settings.toml for each one anyway.

Could we focus on getting all the input data files read in, and if there's still time a few lines of code (dispatch_optimisation.rs?) that sets up a cost vector, and a sparse constraint matrix. I mainly just want to see how to access the data that's been read in in a sensible way.

thanks!

From: Alex Dewar @.> Sent: 24 July 2024 10:56 To: EnergySystemsModellingLab/MUSE_2.0 @.> Cc: Hawkes, Adam D @.>; Mention @.> Subject: Re: [EnergySystemsModellingLab/MUSE_2.0] Separate model and program settings (PR #130)

This email from @.*** originates from outside Imperial. Do not click on links and attachments unless you recognise the sender. If you trust the sender, add them to your safe senders listhttps://spam.ic.ac.uk/SpamConsole/Senders.aspx to disable email stamping for this address.

@alexdewar commented on this pull request.

In src/main.rshttps://github.com/EnergySystemsModellingLab/MUSE_2.0/pull/130#discussion_r1689489852:

use ::log::info; +use muse2::log;

The first one refers to the log crate and the other one is the submodule in muse2 with the same name.

— Reply to this email directly, view it on GitHubhttps://github.com/EnergySystemsModellingLab/MUSE_2.0/pull/130#discussion_r1689489852, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AC37JLKCKEWPXJN4AM7IXATZN526VAVCNFSM6AAAAABLKO3KBOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCOJWGE4DMMRTGE. You are receiving this because you were mentioned.Message ID: @.***>

dalonsoa commented 3 months ago

@ahawkes , even if this is not done, there's no time to have the input layer complete and the data in such a form that can be ingested by HIGHGS and spit a meaningful result. We went through this last Monday. A large chunk of it can be done, and it can be left in a good state to follow up by Tom, ourselves in a few months or yourself, but not ready to use.

So, being that the situation, it is much better to code things right and follow best principles than hacking something together in a poor way.

I'll let @alexdewar provide more details, but that expectation is just not realistic.

ahawkes commented 3 months ago

Thank you for the feedback @dalonsoa. Does just completing the input layer seem doable? Looks like it's commodity, agent and asset CSVs remaining. But all should follow similar approach to the complete process-related CSV reads?

Showing how to access the read-in data in the context of adding an element or two to a super-simple example sparse constraint matrix is an extra, and don't worry if those lines of code are too much, I can probably figure it out.

For clarity, I realise solving anything meaningful with HIGHS is way beyond what we can do right now, and I did not ask for this.

dalonsoa commented 3 months ago

For clarity, I realise solving anything meaningful with HIGHS is way beyond what we can do right now, and I did not ask for this.

My apologies. I misunderstood the purpose of the dispatch script you mentioned.

I think that completing the input layer might be possible. @alexdewar ?

alexdewar commented 3 months ago

Hi @ahawkes. I think it should be possible to have all the files being read in by the end of the month, with the caveat that the data might need more wrangling before it can be passed off to the optimiser. (For example, we're currently representing regions by their string IDs in various places, but ideally we want them to be some kind of reference type so that we don't have to look it up by ID every time. It's similar for the other types of ID as well as time slices.)

I'd consider this data wrangling step to be part of the input layer too.

alexdewar commented 3 months ago

@tsmbland As I think @dalonsoa is now away would you mind giving this a quick look over when you get the chance?

alexdewar commented 3 months ago

PS - @ahawkes I probably didn't make this obvious from the description, but most of the work of this PR is to tidy up the code rather than purely doing the model/program settings split. Doing things that way just means we can reorganise things more sensibly.

alexdewar commented 3 months ago

Cool. Thanks for the review!

EnergySystemsModellingLab / MUSE_2.0

Separate model and program settings #130

Description

Type of change

Key checklist

Further checks

Codecov Report