SWOT-Confluence / datagen

Generates JSON files that serve as input to the Confluence workflow.
Apache License 2.0

Set finder sub-optimality #9

Open mikedurand opened 1 year ago

mikedurand commented 1 year ago

The current set finder is sub-optimal.

  1. It takes a while to run: Europe ran for several hours.
  2. Some reaches are included multiple times.

These issues do not affect MetroMan, or at least I'm not concerned about them presently.

These could all be fixed in the future. Input welcome!

mikedurand commented 1 year ago

On 1 - we could improve performance by checking for set overlap at the L2 basin scale instead of across all reaches. I presume the poor performance comes from permuting all the reaches, which is memory intensive.
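A minimal sketch of the basin-scale idea, not the set finder's actual code. It assumes each set is a non-empty list of reach IDs, that the first two digits of a reach ID identify the L2 basin (continent plus level-2 basin code), and that a set can be assigned to the basin of its first reach. The names `basin_key`, `overlap_fraction`, `prune_overlapping_sets`, and `max_overlap` are hypothetical.

```python
from collections import defaultdict


def basin_key(reach_id, digits=2):
    """Hypothetical helper: assumes the leading digits encode the L2 basin."""
    return str(reach_id)[:digits]


def overlap_fraction(a, b):
    """Fraction of the smaller set's reaches that also appear in the other set."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


def prune_overlapping_sets(sets, max_overlap=0.5):
    """Drop sets that overlap an already-kept set by more than max_overlap.

    Sets are only compared with other sets in the same basin, so the number
    of pairwise comparisons grows with basin size, not with the global count.
    """
    by_basin = defaultdict(list)
    for s in sets:
        if s:  # skip empty sets; they have no basin
            by_basin[basin_key(s[0])].append(s)

    kept = []
    for basin_sets in by_basin.values():
        basin_kept = []
        # Visit longer sets first so they are preferred over their subsets.
        for s in sorted(basin_sets, key=len, reverse=True):
            if all(overlap_fraction(s, k) <= max_overlap for k in basin_kept):
                basin_kept.append(s)
        kept.extend(basin_kept)
    return kept
```

Sets that cross an L2 basin boundary are not handled specially here; they are compared only within the basin of their first reach.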

Note - MetroMan runs in just a few seconds. What takes a while is removing sets with a large degree of overlap, which is coded inefficiently. MetroMan does not need this step because of its lower maximum number of allowed reaches.

mikedurand commented 1 year ago

On 2 - I don't think this is a huge issue, but I'm happy to discuss further. For example, we could simply remove any set with a duplicate reach and add all remaining reaches as single-reach sets. Alternatively, we could handle it by setting the allowed overlap very low.
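One possible reading of the first option, sketched under assumptions: a "duplicate reach" is taken to mean a reach that already appears in an earlier kept set (or twice within the same set), and sets are visited in their given order. The function name `deduplicate_sets` and the `all_reaches` argument are hypothetical, not part of the existing code.

```python
def deduplicate_sets(sets, all_reaches):
    """Keep only sets whose reaches are all unused so far, then add every
    uncovered reach as a single-reach set."""
    seen = set()
    kept = []
    for s in sets:
        # Reject a set that repeats a reach internally or reuses one
        # that an earlier kept set already contains.
        if len(set(s)) == len(s) and seen.isdisjoint(s):
            kept.append(list(s))
            seen.update(s)

    # Reaches left without a set become single-reach sets.
    singles = [[r] for r in all_reaches if r not in seen]
    return kept + singles
```

For example, with sets `[[1, 2, 3], [3, 4], [5, 6]]` and reaches 1 through 7, the set `[3, 4]` is dropped because reach 3 is already used, and reaches 4 and 7 come back as single-reach sets.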

mikedurand commented 2 months ago

@nikki-t these are a few old notes on the inefficiency of the set finder. It's not looking like I'll be able to fix all of these in this iteration, but I'm flagging them so you're aware.