Use an advection estimate of the downstream flow of sewage from EDM

At this stage this is more of a discussion point, but I think discussions are not yet enabled on the repo?

If there were a velocity estimate for the water relative to the ground, then the downstream "frontier" of the outflow could be estimated, rather than pessimistically assuming instantaneous propagation all the way to the tidal region.

We could also estimate the "rear" of the outflow, which would leave the outfall when the outflow was reported as stopped.
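A minimal sketch of the frontier/rear idea, assuming a single constant water velocity along the reach and pure advection (no mixing). The function name `plume_extent` and the example numbers are hypothetical, for illustration only.

```python
from typing import Optional, Tuple


def plume_extent(t: float, t_start: float, t_stop: Optional[float],
                 velocity: float) -> Optional[Tuple[float, float]]:
    """Return (rear, front) distances downstream of the outfall in metres
    at time t (seconds).

    Assumes a constant, known water velocity (m/s) and pure advection: the
    front leaves the outfall when the spill starts, the rear when it stops.
    Pass t_stop=None for a spill that is still ongoing.
    """
    if t < t_start:
        return None  # nothing has been discharged yet
    front = velocity * (t - t_start)
    if t_stop is None:
        rear = 0.0  # still spilling, so the rear is at the outfall
    else:
        rear = velocity * max(0.0, t - t_stop)
    return rear, front
```

For example, at 0.5 m/s, a spill that ran for the first hour has, after two hours, its rear 1,800 m and its front 3,600 m downstream (`plume_extent(7200, 0, 3600, 0.5)`). In practice the front would be capped at the tidal limit and the velocity would vary along the river.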

So far we are ignoring any concept of concentration of the outflow in the river water.
But we could apply a sort of density function: 1 during the outflow, 0 otherwise. Then we could use something like a repeated convolution, which would have the effect of "blurring" the ends (see diagram), to model the mixing over time and distance. I think this amounts to assuming instantaneous mixing of the sewage across each cross-section of the river, and then some sort of diffusive mixing along the direction of flow.

Here's an example of the convolution, with an offset used to represent flow downstream (to the right) over time.
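A minimal NumPy sketch of that kind of diagram, assuming a 1-D grid of equal-length river cells, a Gaussian blurring kernel, and a fixed offset of `shift` cells per step to represent advection downstream. The function names and parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, half_width):
    """Discrete Gaussian, normalised so the total amount of sewage is conserved."""
    x = np.arange(-half_width, half_width + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def shift_right(profile, cells):
    """Move the profile `cells` cells downstream; anything pushed off the end is lost."""
    out = np.zeros_like(profile)
    out[cells:] = profile[:len(profile) - cells]
    return out


def advect_and_blur(profile, steps, shift, sigma, half_width=10):
    """Repeatedly blur (mixing) then offset (advection); returns every step."""
    kernel = gaussian_kernel(sigma, half_width)
    history = [profile]
    for _ in range(steps):
        profile = np.convolve(profile, kernel, mode="same")  # symmetric kernel keeps it centred
        profile = shift_right(profile, shift)
        history.append(profile)
    return history


x = np.zeros(200)
x[10:30] = 1.0  # density 1 during the outflow, 0 before and after
history = advect_and_blur(x, steps=5, shift=8, sigma=2.0)
```

Here the physical velocity is `shift` cell lengths per timestep, and the spread grows roughly as `sigma` times the square root of the number of steps, so the boxcar's edges get progressively softer while its centre moves right by 8 cells per step.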

Combining Tributaries

Where tributaries A and B merge into C, if A is polluted and B is not, then C will have a lower concentration than A.
A plausible scheme for combining tributaries is a flow-weighted average of the densities. Using $S$ for sewage density and $F$ for flow:

$$S_C \approx \frac{S_A F_A + S_B F_B}{F_A + F_B}$$
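The same flow-weighted average as a small Python function; it generalises directly to confluences with more than two inflows. The name `combine_tributaries` and the flows in the example are hypothetical.

```python
def combine_tributaries(flows, densities):
    """Flow-weighted mean sewage density just below a confluence.

    flows: discharge of each inflowing channel (e.g. m^3/s)
    densities: notional sewage density of each inflow (0..1 in the scheme above)
    """
    total_flow = sum(flows)
    if total_flow == 0:
        return 0.0  # avoid dividing by zero when there is no flow at all
    return sum(f * s for f, s in zip(flows, densities)) / total_flow
```

For example, a polluted tributary A carrying 2 m^3/s joining a clean tributary B carrying 6 m^3/s gives `combine_tributaries([2.0, 6.0], [1.0, 0.0])`, i.e. 0.25, a quarter of A's density.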

There is apparently no volume information in the EDM data, just a binary indication of flowing / not flowing; nor any suggestion of concentration of sewage in the flow. So it isn't really meaningful to "add together" the concentration of one outflow with another, since one outflow may be 100x the volume and/or concentration of the other. However it might still be useful to add the notional densities together, because it gives an indication that multiple outflows are affecting a downstream stretch rather than just one.

Of course, this depends on a river of constant cross-section with "steady" flow, whereas real rivers vary in width and depth (and hence cross-section), and in water velocity, along their length. However, the total flow (in m^3/s) across any cross-section is presumably roughly constant, apart from the following (in a guesswork order of magnitude):

- where tributaries merge; this needs to be directly modelled
- where the river splits into two or more channels; the sum of the flows in the separate channels would be constant
- rainfall and unmeasured, unmodelled run-off into the river
- sewage flowing into the river with rainwater
- some loss due to evaporation, ground leakage, etc.

There is also differing velocity even at a given cross-section (e.g. typically faster flow around the centre and slower flow at the edges), but this could be factored into the convolution density that blurs out, or "diffuses", the outflow period. (I've used a Gaussian density, which is symmetric, but perhaps something left-skewed would be more realistic, to model the pollution "sticking around" in the slower portions of the river.)
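One way to get a left-skewed blurring density, sketched under the assumption that "left" means upstream (as in the diagram, where flow runs to the right): convolve a Gaussian with a one-sided exponential that only extends upstream, so a fraction of the pollution lags behind the main plume. The function name and parameters are hypothetical.

```python
import numpy as np


def left_skewed_kernel(sigma, tail_length, half_width):
    """Blurring kernel with a longer tail on the upstream (negative-x) side.

    Gaussian spread from velocity differences, plus an exponential lag that
    only reaches backwards, mimicking pollution held up in slow water.
    """
    x = np.arange(-half_width, half_width + 1)
    gauss = np.exp(-0.5 * (x / sigma) ** 2)
    tail = np.where(x <= 0, np.exp(x / tail_length), 0.0)  # zero downstream of x = 0
    kernel = np.convolve(gauss, tail, mode="same")  # both arrays centred on x = 0
    return x, kernel / kernel.sum()  # normalised so total pollution is conserved


x, k = left_skewed_kernel(sigma=2.0, tail_length=3.0, half_width=20)
```

Note that this kernel's mean sits upstream of zero (roughly `-tail_length` cells), so each application also slows the plume's centre of mass; whether to compensate for that in the per-step offset is a modelling choice.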

A few potential additional data sources

There is some real-time and historical flow data available here: Map, Example Single Station.

This amateur website comes up with estimates of mean flow speeds at various locations, using some assumptions about depth at peak flow vs typical flow... I think.

It would be very useful to have some sort of real-time measurement of a proxy for some component of sewage, e.g. the conductivity of the water. We could then relate those measurements at different times to the various outflows. (Here is an advert for conductivity equipment that relates it to water treatment.)

There is an example here where "sondes" have been used to measure several aspects of water quality at 30-minute intervals: Example report including charts of conductivity etc. for the River Lea. It may be possible to assemble more of this historical data; there may be other examples; it may even be possible to beg updates from Thames... (The EA bathing water bacteria testing does not appear to be frequent enough in time or space on the Thames.)

Great stuff! A lot of what you've mentioned can definitely be accounted for, and in various ways it is already used in hydrological/pollution monitoring. The challenge is to get this to operate in real time on a large-ish model domain. Some pointers on this below.

Combining tributaries

I'm tackling this one first, as it lets me talk about the "D8" grid, which is the basis for easy simulation of drainage over large areas. At a mathematical level, hydrological flow over the Earth's surface is modelled using the shallow water equations. In practice, solving these equations over large areas is computationally quite expensive, but obviously very important, for instance for flood modelling. LISFLOOD is a good example of this kind of modelling.

A very common simplification that greatly speeds up hydrological modelling is to use what is called the "D8 algorithm" [email me if you want a copy of any paper, sorry it's not open access...] to build a simplified representation of water flow across a landscape. In short, if you imagine a digital elevation model made up of pixels, each "cell" in the domain is assumed to "donate" water/flow to exactly one of its eight neighbours, or to itself (a "sink" node). For instance, see the simple example below, taken from my lecture slides:
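A minimal, unoptimised sketch of the D8 idea in NumPy: each cell drains to the neighbour with the steepest downhill slope (diagonal neighbours are √2 cell widths away), or to itself if no neighbour is lower. It ignores things a production D8 tool must handle, such as depression filling, flats and no-data cells, and the function name is hypothetical; in practice a precomputed grid such as the CEH D8 map would be used instead.

```python
import numpy as np

# (row, col) offsets to the eight neighbours of a cell
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_receivers(dem, cell_size=1.0):
    """Flat index of the cell each cell drains to under D8 (steepest descent).

    A cell with no lower neighbour drains to itself, i.e. it is a sink.
    """
    nrows, ncols = dem.shape
    receivers = np.arange(dem.size).reshape(dem.shape)  # start: every cell a sink
    for r in range(nrows):
        for c in range(ncols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    # diagonal neighbours are sqrt(2) cell widths away
                    dist = cell_size * np.hypot(dr, dc)
                    slope = (dem[r, c] - dem[rr, cc]) / dist
                    if slope > best_slope:
                        best_slope = slope
                        receivers[r, c] = rr * ncols + cc
    return receivers.ravel()


dem = np.array([[5.0, 4.0, 3.0],
                [4.0, 3.0, 2.0],
                [3.0, 2.0, 1.0]])
receivers = d8_receivers(dem)  # [4, 5, 5, 7, 8, 8, 7, 8, 8]
```

In this toy grid the bottom-right cell (index 8) is the only sink, and every other cell drains towards it.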

The D8 representation is useful because it turns your drainage network into a directed acyclic graph, on which you can perform mathematical operations very efficiently. The drawbacks are that you assume the network is fixed in time, and that you can't model areas where flow 'diverges' (e.g., braided rivers). There are modified versions (e.g., 'D-infinity') that allow divergent flow, but for our purposes the D8 algorithm and network representation is probably fine.
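Much of that efficiency comes from visiting cells in topological order (every cell before the cell it drains into), so upstream quantities can be accumulated in a single pass. A minimal sketch using Kahn's algorithm; the names are hypothetical and this is not the API of POOPy's `D8Accumulator`.

```python
import numpy as np


def topological_order(receivers):
    """Order cells so each one comes before the cell it drains into (Kahn's algorithm)."""
    n = len(receivers)
    n_donors = np.zeros(n, dtype=int)
    for i, r in enumerate(receivers):
        if r != i:
            n_donors[r] += 1
    ready = [i for i in range(n) if n_donors[i] == 0]  # cells with nothing upstream
    order = []
    while ready:
        i = ready.pop()
        order.append(i)
        r = receivers[i]
        if r != i:
            n_donors[r] -= 1
            if n_donors[r] == 0:  # all of r's donors have now been visited
                ready.append(r)
    return order


def accumulate(receivers, weights):
    """Sum `weights` over each cell and everything upstream of it."""
    acc = np.array(weights, dtype=float)
    for i in topological_order(receivers):
        if receivers[i] != i:
            acc[receivers[i]] += acc[i]
    return acc


# Receivers for a 3x3 grid draining to the bottom-right cell (index 8).
receivers = [4, 5, 5, 7, 8, 8, 7, 8, 8]
area = accumulate(receivers, [1.0] * 9)  # upstream area in cells
```

With unit weights this gives drainage area in cells (the outlet sees all 9); with per-cell runoff as the weights it would give a crude steady-state discharge.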

Currently, we use the D8 map of the UK generated by the Centre for Ecology & Hydrology. You can see how this is used to simulate flow using the D8Accumulator class that is part of the POOPy backend.

Dilution etc.

> There is apparently no volume information in the EDM data, just a binary indication of flowing / not flowing; nor any suggestion of concentration of sewage in the flow. So it isn't really meaningful to "add together" the concentration of one outflow with another, since one outflow may be 100x the volume and/or concentration of the other. However it might still be useful to add the notional densities together, because it gives an indication that multiple outflows are affecting a downstream stretch rather than just one.

Agreed. In fact, the instantaneous 'density' of CSO sources upstream can already be calculated in POOPy (see here for the relevant code), and we are currently hoping to add this information to the visualisation upstream. See this Issue on the front-end repository. For the data-limitation reasons you outline, I think this information is heavily caveated, but still useful!
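One simple, hypothetical version of such a measure: count the currently active CSOs upstream of each cell by walking downstream from each active CSO along the D8 receivers. POOPy's actual definition may differ (for instance, it may normalise by upstream area), and this is a count of sources, not a concentration.

```python
import numpy as np


def active_csos_upstream(receivers, cso_cells, active):
    """Number of active CSOs at or upstream of every cell.

    receivers: index of the cell each cell drains to (itself for sinks)
    cso_cells: cell index of each CSO
    active:    whether each CSO is currently discharging (EDM on/off flag)
    """
    counts = np.zeros(len(receivers), dtype=int)
    for cell, is_active in zip(cso_cells, active):
        if not is_active:
            continue
        while True:  # follow the flow path down to the sink
            counts[cell] += 1
            nxt = receivers[cell]
            if nxt == cell:
                break
            cell = nxt
    return counts


receivers = [4, 5, 5, 7, 8, 8, 7, 8, 8]  # toy 3x3 D8 grid draining to cell 8
counts = active_csos_upstream(receivers, cso_cells=[0, 2, 6], active=[True, False, True])
```

Walking downstream from each source is cheap when active CSOs are sparse relative to the grid; for many sources a single topological-order accumulation would scale better.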

Simulating transport

I agree that convolution is a useful way of simulating transport, as it allows you to consider both advection and diffusion. For instance, see this article by Gaëlle Guillet. This approach is particularly useful when you're trying to model flow between two discrete points connected by a variably long stretch of river. It's also widely used in groundwater modelling to simulate flow between two places where data are available (e.g., borehole locations). In this particular case, where we ultimately want to simulate over a fairly large continuous grid, quickly, I think the best approach is probably to model transport across the D8 network.

Over large areas like the Thames, the diffusive component of pollution transport can typically be neglected. Formally, the Péclet number (the ratio of advective to diffusive transport) is very large, so you only need to worry about the advection. I personally have never tried to simulate dynamic flow over a D8 grid, but it is definitely possible! The best example of this that I'm aware of is TopoFlow. As far as I understand it, this model does simulate dynamic flow of water over a D8 domain (which could easily be extended to pollutant tracers, if that isn't already supported). I can't remember exactly how it works, but I think all of the code and documentation is open-access, so it should be easy to find out.
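For reference, the Péclet number compares advective to diffusive (dispersive) transport over a length scale $L$, with velocity $u$ and longitudinal dispersion coefficient $D$. The numbers below are purely illustrative assumptions (0.5 m/s, a 10 km reach, $D = 10\ \mathrm{m^2/s}$), not measurements from the Thames:

$$
\mathrm{Pe} = \frac{uL}{D} = \frac{(0.5\ \mathrm{m\,s^{-1}})\,(10^{4}\ \mathrm{m})}{10\ \mathrm{m^{2}\,s^{-1}}} = 500 \gg 1
$$

Because $\mathrm{Pe}$ scales with $L$, the longer the reach, the more advection dominates.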

If you have a value of water "speed" across all pixels, you could then run a simple model that "updates" every timestep (calculated using the Courant condition) to propagate signals downstream.
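A minimal sketch of such a model, assuming a fixed velocity per cell, a known distance from each cell to its receiver, and an explicit first-order upwind scheme: each step, a fraction $u\,\Delta t/\Delta x$ of each cell's tracer moves to its receiver, and the Courant condition $C = u\,\Delta t/\Delta x \le 1$ sets the timestep. The function names are hypothetical; it moves tracer *amounts* and does not dilute by flow at confluences.

```python
import numpy as np


def courant_timestep(velocity, length, courant=0.9):
    """Largest stable explicit timestep (s): dt = C * min(dx / u), with C <= 1,
    so no tracer jumps more than one cell per step.
    Assumes all velocities are strictly positive."""
    return courant * np.min(length / velocity)


def advect_step(tracer, receivers, velocity, length, dt):
    """One explicit first-order upwind step on a D8 network.

    tracer:    amount of pollutant in each cell
    receivers: index of the cell each cell drains to (itself for sinks)
    velocity:  water speed in each cell (m/s)
    length:    distance from each cell to its receiver (m)
    """
    idx = np.arange(len(receivers))
    # A fraction u*dt/dx of each cell's tracer moves one cell downstream; sinks keep theirs.
    frac = np.where(receivers == idx, 0.0, velocity * dt / length)
    moved = tracer * frac
    new = tracer - moved
    np.add.at(new, receivers, moved)  # correctly handles several donors per receiver
    return new


receivers = np.array([1, 2, 3, 3])       # a 4-cell chain draining into cell 3
velocity = np.full(4, 1.0)               # m/s, fixed for now
length = np.full(4, 100.0)               # m between cell centres
dt = courant_timestep(velocity, length)  # 90 s
tracer = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(20):
    tracer = advect_step(tracer, receivers, velocity, length, dt)
```

First-order upwinding smears the signal out (numerical diffusion of roughly $u\,\Delta x\,(1 - C)/2$), so keeping $C$ close to 1 keeps the propagated front sharper.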

Additional data sources

Probably the best approach is to estimate water velocity at all points across the D8 grid using some kind of simple hydrological model (e.g., Manning's formula). Initially, keeping it simple with a fixed velocity makes sense; updating it dynamically will be... harder. But you can use these great amateur data sites (thank you for sharing, I love this kind of thing) to "benchmark" the estimates against, I think.
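For a first pass, Manning's formula (SI units) gives mean velocity as $v = \frac{1}{n} R^{2/3} S^{1/2}$, where $n$ is the roughness coefficient, $R$ the hydraulic radius (flow area divided by wetted perimeter) and $S$ the slope. A sketch assuming a rectangular channel: the slope could come from the DEM along each D8 link, while width, depth and $n$ would need estimating. The function names and example values are hypothetical.

```python
import math


def hydraulic_radius_rectangular(width, depth):
    """Flow area divided by wetted perimeter (m) for a rectangular channel."""
    return (width * depth) / (width + 2.0 * depth)


def manning_velocity(hydraulic_radius, slope, n):
    """Mean velocity (m/s) from Manning's formula in SI units.

    hydraulic_radius: m
    slope: dimensionless (m drop per m along the channel)
    n: Manning's roughness coefficient
    """
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(slope)


# Hypothetical reach: 20 m wide, 1 m deep, slope 0.0005, n = 0.035.
r = hydraulic_radius_rectangular(20.0, 1.0)
v = manning_velocity(r, 0.0005, 0.035)  # roughly 0.6 m/s
```

Estimates like this could then be compared against the measured or amateur-estimated flow speeds before being used as per-cell velocities.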

Water quality sondes

Yes, the EA data is OK, but nowhere near dense enough (you can see their distribution here) to be helpful for this kind of work (I have decided). However, in the next few years the number of sondes in rivers is going to skyrocket due to the Environment Act, which obliges water companies to place sensors upstream and downstream of every storm overflow. So, that's nice to know is on the horizon!

AlexLipp / thames-sewage