ErikNixdorf / sbat

The Repo for the Surface Water Balance Analysis Tool
BSD 3-Clause "New" or "Revised" License
3 stars 0 forks source link

Draft: Class refactoring #46

Closed MarcoHannemann closed 1 year ago

MarcoHannemann commented 1 year ago

This is a draft PR for discussing #41

Please do not merge until it is finished and everyone has reviewed.

Here an overview of my ideas and the most important changes I did so far:

Model.__init__(self)

1) The __init__ method needed an overhaul. All class attributes should be created in here, it is bad practice to introduce new class attributes in other class methods. This is why the attributes are now initialized as None.

2) To improve the readability I changed the structure of the __init__ method moving some parts into other methods that get called. Now there are two new private class methods _read_data(self), which reads the input data and does the required preprocessing; and _build_wd(self), which sets up the paths from the configuration file and creates new directories.

3) Connected to 2), I introduced a new attribute called Model.paths, this is a dictionary that stores all the paths. This way we have a central attribute which enhanced readability a lot.

The configuration file

Related to #5, we face the problem of testing different configuration in a testing framework. At the moment, the configuration file is read within the __init__ method. When creating a new instance of Model in a test case, we therefore always end up with the configuration of sbat.yml. This could be overwritten or decorated, but I think it makes more sense to read the configuration file externally and pass it as a dictionary to Model.

What is the advantage of this approach? We can create an artificial configuration dictionary from within the module, without having to read a file. This is how I solved it:

1) The configuration file is now read by a static method Model.read_config(). You can imagine a static method as a function, that could easily be defined outside of the class. We just put it into the class because it "logically belongs there". It does not interact with the class itself.

2) This method is called in main(), before the Model instance itself is created.

Writing output

With commit 98d2361, writing the output of the sbat model is no longer happening in main(), but in Model.get_waterbalance() instead. I think it makes sense to move it there to streamline the main() function a bit; also it is the only method that creates writable output. Whether to write output or not, is stored as a bool in self.output.

Logging

I moved the logger definition to top of the file to get rid of the global stuff. ~~A log will be written if we run sbat.py from an IDE or from the terminal. There will be no output written, if sbat is imported in another script. In my opinion, this behavior is desirable, as I would not want a third-party module to write logs when integrating them into my model pipeline. What do you think about this?~~

Unused functions

I removed two functions that are not used, one is iterdict() and the other one was a function in waterbalance.py, see fe66bfb. We can add them later again if needed.

What's next?

MarcoHannemann commented 1 year ago

I merged the current version of develop into this PR, if you have time you could briefly check if I did no mistake when resolving the conflicts @ErikNixdorf

ErikNixdorf commented 1 year ago

@MarcoHannemann line 25 in sbat.py the output dir of the logger is partly hardcoded "fh = logging.FileHandler(f'{Path(__file__).parents[1]}/output/sbat.log', mode='w') I want to have the sbat.log file in the same subdirectory as all other output data, which is `../output/{project_name}/sbat.log Do you have an idea how tofix this? Aside of this, pull request is ready to merge

MarcoHannemann commented 1 year ago

I'll test it later and then we can merge.

I'm aware of the hard-coded path and thought about it... Without packaging it as a module, it is hard to retrieve the root directory. At the moment I don't know a good solution, but you can open another issue of low priority for this. I'm not really sure about the output/{project_name}/ path though- where is it mentioned? I thought all output would be written to sbat/output/.

Another thought: Wouldn't it make sense to write the log at the location where the script is run? Let's assume someone creates an own project with individual input date and runs the module. It would be a bit counter-intuitive the log get's saved to the module directory. This only makes sense as long as you use the module by cloning the repo directly from github, but not if you install it. But we can move that discussion to later when we talk about packaging...