Implement FlowObject - Githubissues

wkearn commented 1 month ago

With TopoToolbox/libtopotoolbox#72, we should have everything that we need to create a FlowObject from a GridObject.

The basic workflow, which corresponds to TopoToolbox v2 FLOWobj(dem,'preprocess','carve'), would, in C, look something like:

fillsinks(filled_dem, dem, dims);

identifyflats(flats,filled_dem,dims);

gwdt_computecosts(costs, conncomps, flats, dem, filled_dem, dims);

gwdt(dist, prev, costs, flats, heap, back, dims);

flow_routing_d8_carve(source, direction, filled_dem, dist, flats, dims);

// Compute downstream target from source and direction

The results that we need to store in the FlowObject are the two arrays source, which is a topologically sorted list of source pixels in the flow network, and target, which is the corresponding downstream pixel for each source pixel.

At the moment flow_routing_d8_carve doesn't actually return the target array. Instead it fills in a direction array that encodes the flow direction at each pixel. I will add a libtopotoolbox function flow_routing_targets_from_direction or something like that to fill in a targets array from the sources and flow directions, and we can use that at the end.

wkearn commented 1 month ago

This is probably blocked by #53 because the newer libtopotoolbox functions only have the new dimensions interface.

Teschl commented 1 month ago

I'm working on #53 and #45 right now, so the FlowObject implementation can be added after that.

Teschl commented 1 month ago

There is a decision to make regarding the creation of a FlowObject.

We create a function in utils which needs a GridObject as an argument: Using this option, we could implement a version where DEM could also be the path to a TIF file, for example.
```
DEM = topotoolbox.load_dem('perfectworld')
FLOW = topooolbox.gen_FlowObject(DEM)
```

Or we add a function to the class GridObject:

DEM = topotoolbox.load_dem('perfectworld')
FLOW = DEM.gen_FlowObject()

Probably too convoluted but still possible:

DEM = topotoolbox.load_dem('perfectworld')
FLOW = topotoolbox.FlowObject.new_object(DEM)

Something else entirely

I think option two is the way to go, but this could lead to confusion, since using this solution will result in a completely different creation process between FlowObject and GridObject for the user.

wkearn commented 1 month ago

Is is possible to pass the GridObject to the FlowObject constructor? Something like FD = FlowObject(DEM) This is more or less how MATLAB TT does it.

There are a few different ways to construct a GridObject. I think there should only be one way to construct a FlowObject because a FlowObject only makes sense when referenced to an underlying GridObject.

We could create it directly from the file, as you suggested in option 1, but we still have to load the data from the file and do everything we already do for GridObject, so it doesn't really save us any work.

If making the FlowObject constructor take a GridObject doesn't really work, I think I prefer option 2, the method on GridObject, because it makes the dependency between the FlowObject and GridObject clear. We might call the method something different, like GridObject.route_flow to make it more clear what we are actually doing.

Teschl commented 4 weeks ago

Yeah, it's possible to implement it like this:

DEM = topotoolbox.load_dem('perfectworld')
FD = topotoolbox.FlowObject(DEM)

Also, adding opinion two will be easy since it's just a wrapper for the constructor.

The way I'm implementing this right now is just calling the PyBind11 wrapper functions in the constructor of the GridObject without creating a GridObject function for each. For example, using gwdt as a python function that returns a new GridObject is unnecessary when using it in the constructor where only the np.ndarray is needed.

Question regarding datatype and layout: Regarding the different variables used to create a FlowObject. It is hard to keep track of what variable should be which datatype and size. Below I have listed what I'm currently using:

fillsinks(filled_dem, dem, dims)
# filled_dem: size/layout of dem and dtype=float (32?)

identifyflats(flats,filled_dem,dims)
# flats: size/layout of dem and dtype=int32

gwdt_computecosts(costs, conncomps, flats, dem, filled_dem, dims)
# costs: size/layout of dem and dtype=float (64?)
# conncomps: size/layout of dem and dtype=int64

gwdt(dist, prev, costs, flats, heap, back, dims)
# dist: size/layout of dem and dtype=float (32?)
# prev: size/layout of dem and dtype=int64
# heap: size/layout of dem and dtype=int64
# back: size/layout of dem and dtype=int64

flow_routing_d8_carve(source, direction, filled_dem, dist, flats, dims)
# source: size/layout of dem and dtype=int64
# direction: size/layout of dem and dtype=uint8

flow_routing_targets(target, source, direction, dims)
# target: what size/layout should this be? and dtype int64

Could you give some clarification on the dtypes and layout/size of the different variables @wkearn ? I want to make sure it's fine to just copy dem but filled with zeros when using the same size/layout. Should some variables be filled with something other than zero when calling the different functions?

wkearn commented 4 weeks ago

Yeah, it's possible to implement it like this:
DEM = topotoolbox.load_dem('perfectworld')
FD = topotoolbox.FlowObject(DEM)
Also, adding opinion two will be easy since it's just a wrapper for the constructor.

The way I'm implementing this right now is just calling the PyBind11 wrapper functions in the constructor of the GridObject without creating a GridObject function for each. For example, using gwdt as a python function that returns a new GridObject is unnecessary when using it in the constructor where only the np.ndarray is needed.

This makes sense

Question regarding datatype and layout: Regarding the different variables used to create a FlowObject. It is hard to keep track of what variable should be which datatype and size. Below I have listed what I'm currently using:
fillsinks(filled_dem, dem, dims)
# filled_dem: size/layout of dem and dtype=float (32?)

All float arrays should be float32

identifyflats(flats,filled_dem,dims)
# flats: size/layout of dem and dtype=int32

This is correct

gwdt_computecosts(costs, conncomps, flats, dem, filled_dem, dims)
# costs: size/layout of dem and dtype=float (64?)
# conncomps: size/layout of dem and dtype=int64

float32, but otherwise correct.

gwdt(dist, prev, costs, flats, heap, back, dims)
# dist: size/layout of dem and dtype=float (32?)
# prev: size/layout of dem and dtype=int64
# heap: size/layout of dem and dtype=int64
# back: size/layout of dem and dtype=int64

This is correct (float32).

flow_routing_d8_carve(source, direction, filled_dem, dist, flats, dims)
# source: size/layout of dem and dtype=int64
# direction: size/layout of dem and dtype=uint8

This is correct.

flow_routing_targets(target, source, direction, dims)
# target: what size/layout should this be? and dtype int64

The size and layout of target should be the same as source, and it should be int64.

Is there a better way to present this information (the data types and size/layout of each argument)? Technically it is in the API documentation, but that is maybe not as clear as I would like it to be.

Could you give some clarification on the dtypes and layout/size of the different variables @wkearn ? I want to make sure it's fine to just copy dem but filled with zeros when using the same size/layout. Should some variables be filled with something other than zero when calling the different functions?

All libtopotoolbox functions should initialize the arrays you pass to them, so it shouldn't matter what they are filled with. So you don't technically need to fill an array with zeros, if NumPy gives you a way to just allocate the memory without zeroing it. You could also reuse some intermediate arrays, such as the int64 arrays.

If you notice something doesn't seem to work properly without initializing it in a particular way, let me know, because that is probably a bug in libtopotoolbox.

Teschl commented 4 weeks ago

Ok, thanks for the clarification.

Is there a better way to present this information?

I think the only reason this was not immediately obvious to me was because it's just the pointer that is passed to each function. Knowing that every array has the same shape solves this issue. Maybe referencing that in the documentation, along with the expected contents of each array, will clear things up. Like in excesstopography for example:

threshold_slopes – [in] The threshold slopes (tangent of the critical angle) at each grid cell. A float array of size (dims[0] x dims[1]).

You could also reuse some intermediate arrays, such as the int64 arrays.

This seems like a great way to reduce memory usage, I'll look into that!

TopoToolbox / pytopotoolbox

Implement FlowObject #60