duncan-r / SHIP

Simple Hydraulic model Interface for Python
MIT License
7 stars 6 forks source link

Public API Changes #20

Open duncan-r opened 7 years ago

duncan-r commented 7 years ago

Continuing from #2

I think that some changes to the API for the tuflow package are almost inevitable here. The public side is not particularly well defined at the moment.

I think we need to probably get this completely nailed down, tag the dev branch pre-merge, but with the cross platform fixes in and then make the break.

Some thoughts on https://github.com/duncan-r/SHIP/issues/2#issuecomment-311686795 . If we replace the functionality in some places like ControlFile with ControlStructure I think it would be best to keep the existing naming convention. They match the tuflow naming conventions (TuflowModel, ControlFile, etc) and are therefore familiar to users.

I think importantly I'd like to maintain the flexibility of the tuflow package, but with the added bonus of making it a bit nicer to maintain and a bit more understandable for users. This includes:

This is all stuff that is currently supported and definitely used. There are probably other useful inclusions too. To have it all, but make it a bit nicer to use would be great.

JamesRamm commented 7 years ago
  • Allow reading/writing/updating of the different files.
  • Allow interrogating/updating/removing different parts of the files. It should also be possible to hand in specific scenario/event vals to change the way these are returned.

This shouldn't be a problem at all. With the temporary classes I added - Statement and ControlStructure, adding a new command would just be a case of creating a Statement instance and calling append on the root ControlStructure

Allow grouping of different components (gis files/control files/variables with certain commands etc)

Do you mean within the control file (like re-organising it or something) or something like filtering to get all parts of a certain type?

duncan-r commented 7 years ago

https://github.com/duncan-r/SHIP/issues/20#issuecomment-311733149 Sorry. I might not have been very clear here. I mean interrogating the model for different components, like this from SHIP/examples/tuflowgisandbcfiles.py

for ckey, cfile in tuflow_model.control_files.items():
        gis_files = cfile.files(filepart_type=ft.GIS)
        data_files = cfile.files(filepart_type=ft.DATA)

You can see more info on how this is done at moment here: http://duncan-r.github.io/SHIP/tuflow/controlfile.html#accessing-data

As example above http://duncan-r.github.io/SHIP/tuflow/controlfile.html#obtaining-specific-tuflowpart-s For command/variable: http://duncan-r.github.io/SHIP/tuflow/controlfile.html#finding-tuflowpart-s-containing For specific scenarios/events http://duncan-r.github.io/SHIP/tuflow/controlfile.html#filtering-by-scenarios-events

JamesRamm commented 7 years ago

:+1: I see. It would be good, if we are expecting large changes, to organise the public API by importing it into the __init__ file and perhaps losing a number of the tuflow prefixes since it is all namespaced by the top tuflow module anyway. Something along the lines of being able to do:

from ship import tuflow
tuflow.Model(...)
duncan-r commented 7 years ago

:I'm happy with this. I think that if we need some big changes to get everything in line we might as well get it done properly. We can then hopefully tightly define the public API. I'll start checking to see if there's anything in the fmp package that might want some tweaks while it's being done. Although it's definitely in better shape that the tuflow package.

It occurred to me that you might find the test model that's used for the integration tests useful for seeing how the loader is doing. You can find it in the repo here: https://github.com/duncan-r/SHIP/tree/develop/integration_tests/test_data/model1/tuflow

It is just a collection of .tcf/.tgc/.tbc/.trd files at the moment, but I've put it together to provide a good mix of things that cause problems. Such as if-else logic, defines, repeated commands, .trd files etc.

JamesRamm commented 7 years ago

Regarding https://github.com/duncan-r/SHIP/issues/20#issuecomment-311739915 I have added a filter method on the ControlStructure class which achieves much of the functionality of getFilePartType. Since all statements are now a single class, (more or less), I think (?) files/variables/logics are all covered here. I'm sure there is missing functionality, but I'm not too concerned with that till I start merging in the new stuff. You can carry out lines 45-53 of the tuflow example with the filter method, with the exception of filepath resolution.

On that subject, I am leaning toward using a tree structure (rather than the two separate classes) to represent the control file. Basically, merge Statement and ControlStructure (they are almost the same anyway) and these are the nodes in the tree. The ControlFile could either become just what the root node is referred to as, or a container class for the nodes. Likewise a TuflowModel could be that container class or another container (am I making sense...). This way, you could just set the model path on the root node and all the leaf nodes can refer back up the tree to get it if they need. Other control files that can be parsed (ecf, tgc...?) could also be nodes on the tree. This would greatly simplify interacting with and expanding the api, but would probably change it a lot. I imagine many of the method signatures could be preserved as 'wrapper' methods with deprecation warnings or something like that.

JamesRamm commented 7 years ago

Probably a good idea to keep this thread as a general/philosophical discussion of the API & architecture and leave discussion of implementation details to #2

duncan-r commented 7 years ago

That's good news. When we get a bit further with this I will try and hook up this branch to some of the code we have for evaluating and writing models and see how it gets on. It should be a fairly good benchmark.

I've continued the implementation discussion here: https://github.com/duncan-r/SHIP/issues/2#issuecomment-312187631

duncan-r commented 7 years ago

By this:

Likewise a TuflowModel could be that container class or another container (am I making sense...).

Do you mean that TuflowModel is almost just an access point for the API, similar to the current structure? This is sort of the current setup, I think? Where it holds the control files (although they are currently stored flat. I.e. it would basically have a number of public methods and contain the control file tree?

JamesRamm commented 7 years ago

Where it holds the control files (although they are currently stored flat. I.e. it would basically have a number of public methods and contain the control file tree?

Yes exactly; I was thinking how to fit in the current names of TuflowModel and ControlFile. With a tree I dont think it would make semantic sense to have both. E.g. TuflowModel could have a control_file(s) member, but I dont think it would make sense (although we could) to use ControlFile for the nodes of the tree - it would only make semantic sense for the root node (which represents the whole control file, or set of control files if we want to have a single tree for all), but child nodes would represent a part of a control file.

duncan-r commented 7 years ago

Kind of like this (please excuse the painful paint drawing :)), where '.shp' and '0.1' represent file and variable components of the control file? tuflow_tree_structure

with a tree I dont think it would make semantic sense to have both. E.g. TuflowModel could have a control_file(s)

Would it be sensible to have some class just to store the tree in (a TuflowModel) It may be useful to have a place for model wide methods that aren't specifically associated with the actual tree nodes? So the 'initial .tcf' would be a node like all of the other parts. The TuflowModel would just hold the whole thing and have no direct association with a particular component?

JamesRamm commented 7 years ago

@duncan-r Thats exactly right. Yes I think just put the tree an TuflowModel along with model wide methods.

JamesRamm commented 7 years ago

From https://github.com/duncan-r/SHIP/issues/2#issuecomment-31256681:

.i.e. keep it as TuflowModel. Any thoughts?

The API I had in mind was being able to do stuff like:

from ship import tuflow, fmp
tuflow.Model
tuflow.FilePartTypes
fmp.Model (# I've not yet looked at fmp module much so this could be totally wrong..)

and so on.. And also ideally:

from ship import loadFile
model = loadFile(...)

Of course it could be equally clear to do something like:

from ship import TuflowModel, FmpModel

1st option involves less typing (from a development point of view), 2nd option is closer to current syntax so no clear choice - whatever you feel is best!

I think that eventually having from ship import loadFile possible is most important, especially if the majority of the API is reachable through the model class (that is returned from loadFile).

duncan-r commented 7 years ago

Yes, this is what I was wondering. In some ways it's much clearer to always have them named the same. In others it's nice to have a clear separation.

I think that eventually having from ship import loadFile possible is most important

Completely agree. I personally find this really helpful. When loading multiple models you don't have to keep importing differently named loaders all over the place. It feels cleaner this way. I've often wondered whether it's a good idea to keep the specific loaders in a module separate from the rest of the code, but I think it works.

The API I had in mind was being able to do stuff like: fmp.Model (# I've not yet looked at fmp module much so this could be totally wrong..)

It's not as clear this one. At the moment the main fmp model is called a DatCollection. Mainly because it's a collection of .dat file components (RiverUnit, BridgeUnit, etc). The definition of model isn't as clear with FMP as it is with TUFLOW. You can have .dat files and .ied files which in some cases can be nearly the same thing (they can also contain some structures for example). There are .ief run files and .rgh files and diagnostic files ... The issue is they don't particularly rely on each other in the way that the tuflow files do. There is perhaps some potential to try and group these together in a "model" that might be useful. An FMP simulation required a minimum of a .dat file, .ief file and the output diagnostics, but often requires .ied files as well. I hadn't considered it before, but perhaps a high level container may help. I'm sure there's some collective functionality that could benefit from that setup.

With that approach in mind the Model approach would improve consistency for the API.