OpenWaterAnalytics / EPANET

The Water Distribution System Hydraulic and Water Quality Analysis Toolkit
MIT License

efficiency of network editing functions #762

Open jamesuber opened 7 months ago

jamesuber commented 7 months ago

I've had the chance to put the network editing functions through their paces using a network skeletonization use case for large network models. The application uses EPANET as the backend store of network knowledge and a Boost graph representation of the network to support the skeletonization process. As changes are made to the network's links and nodes, those changes are sent to EPANET through the model editing API functions.

So this is an application that does a lot of model editing, very unlike supporting model editing features in a UI. I was surprised to find that over 95% of the CPU time was spent in EPANET, because I figured all the heavy lifting would be in the graph skeletonization algorithms.

I talked to @LRossman about this, and he thinks the time is probably being spent on memory re-allocation. So this ticket is asking whether those memory requests can be streamlined somehow.

jamesuber commented 7 months ago

By the way, I don't think that network skeletonization is the only use case that would be hit by this. Another good example is using the EPANET API to build models from scratch using GIS data sources.

LRossman commented 7 months ago

The potential bottlenecks might be the EN_addnode, EN_deletenode, EN_addlink, and EN_deletelink functions. The EN_add... functions require a total reallocation of the Node or Link struct arrays every time just a single node or link is added to a project. One way to avoid this is to add a Capacity property to these structs. When a new node or link is added and the total number is below the Capacity then a reallocation is not needed. Once the number of nodes/links reaches Capacity then the latter is increased by some amount and a new reallocation is made for this new Capacity. This same strategy is currently used with Curve elements (see the declaration of Scurve in types.h and the resizecurve function in project.c).
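
To make the idea concrete, here is a minimal sketch of that capacity strategy applied to the Node array. It mirrors what resizecurve already does for curves; the NodeCapacity field, the GROWTH_CHUNK constant, and the grow_node_array helper are hypothetical names used only for this illustration, not part of the current source.

    /* Sketch only: NodeCapacity, GROWTH_CHUNK, and grow_node_array are
       hypothetical names introduced for this illustration. */
    #define GROWTH_CHUNK 50

    int grow_node_array(Network *net)
    {
        Snode *newNodes;
        int newCapacity;

        // No reallocation needed while spare capacity remains
        if (net->Nnodes < net->NodeCapacity) return 0;

        // Grow by a fixed chunk (a geometric growth factor would also work)
        newCapacity = net->NodeCapacity + GROWTH_CHUNK;
        newNodes = (Snode *)realloc(net->Node, (newCapacity + 1) * sizeof(Snode));
        if (newNodes == NULL) return 101;   // EPANET's out-of-memory error code
        net->Node = newNodes;
        net->NodeCapacity = newCapacity;
        return 0;
    }

With something like this in place, EN_addnode would call the helper before writing the new node into the next slot, so most additions would touch no allocator at all.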

EN_addnode also reallocates the NodeDemand, NodeQual, NodeHead, DemandFlow, and EmitterFlow arrays whenever a new node is added. These arrays are only used during an analysis run and can be reallocated just once when an analysis is begun. Likewise, EN_addlink reallocates LinkFlow, LinkSetting, and LinkStatus when a new link is added. These re-allocations can also be postponed until a new analysis is begun.
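
As a rough sketch of that deferral, the run-time arrays could be sized once from the final node and link counts when an analysis is opened (for example at the start of EN_openH) instead of inside every EN_addnode/EN_addlink call. The resize_analysis_arrays and resize_dbl helpers below are hypothetical, and the field names assume the Hydraul struct layout in types.h (NodeQual, which lives on the quality struct, would be handled the same way).

    /* Sketch only: resize_analysis_arrays and resize_dbl are hypothetical
       helpers; field names assume the Hydraul struct layout in types.h. */
    static int resize_dbl(double **arr, int n)
    {
        // Arrays are 1-based in EPANET, hence n + 1 elements
        double *tmp = (double *)realloc(*arr, (n + 1) * sizeof(double));
        if (tmp == NULL) return 101;   // out of memory
        *arr = tmp;
        return 0;
    }

    int resize_analysis_arrays(Project *pr)
    {
        Network *net = &pr->network;
        Hydraul *hyd = &pr->hydraul;
        int err = 0;

        // Node-indexed arrays
        if (!err) err = resize_dbl(&hyd->NodeDemand,  net->Nnodes);
        if (!err) err = resize_dbl(&hyd->NodeHead,    net->Nnodes);
        if (!err) err = resize_dbl(&hyd->DemandFlow,  net->Nnodes);
        if (!err) err = resize_dbl(&hyd->EmitterFlow, net->Nnodes);

        // Link-indexed arrays (LinkStatus would need a StatusType variant)
        if (!err) err = resize_dbl(&hyd->LinkFlow,    net->Nlinks);
        if (!err) err = resize_dbl(&hyd->LinkSetting, net->Nlinks);
        return err;
    }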

Regarding the EN_deletenode function, it requires shifting the indices of the Node array and the indices stored in the NodeHashtable down by one whenever a single node at position index is deleted. The code for doing this looks as follows:

    for (i = index; i <= net->Nnodes - 1; i++)
    {
        net->Node[i] = net->Node[i + 1];
        // ... update node's entry in the hash table
        hashtable_update(net->NodeHashTable, net->Node[i].ID, i);
    }

The hashtable_update function has to search the table for each node whose position is above index. It might be more efficient to move this function out of the loop and replace it with the following:

    void hashtable_shiftdown(HashTable *ht, int index)
    {
        DataEntry *entry;
        int i;
        // Visit every bucket once and decrement any stored node index
        // that lies above the deleted position.
        for (i = 0; i < HASHTABLEMAXSIZE; i++)
        {
            entry = ht[i];
            while (entry != NULL)
            {
                if (entry->data > index) (entry->data)--;
                entry = entry->next;
            }
        }
    }

One potential downside of this new function is that HASHTABLEMAXSIZE is currently set to 12800 (with most of these entries set to NULL for small to moderate sized networks), so it remains to be seen if this approach is more efficient. If so then a similar replacement for the hashtable_update function called in EN_deletelink should be made.

jamesuber commented 7 months ago

Very useful analysis as usual Lew. I don't know if this is a priority, but I think the approach you laid out sounds great, especially for the addition functions.


LRossman commented 7 months ago

@jamesuber I pushed a new branch to dev named dev-reduced_reallocs that only re-allocates space once every 50 times when EN_addnode or EN_addlink are called. (You can change the '50' to something else at the top of epanet.c.) If you want to try it out on your project, let us know if it reduces your run time by any substantial amount.

LRossman commented 7 months ago

I ran some performance tests on the dev-reduced_reallocs branch. For creating 100,000 junction nodes there was a 27% reduction in run time for a re-allocation capacity of 50, a 63% reduction for a capacity of 100, and a 91% reduction for a capacity of 500 as compared to the original EPANET (which has a re-allocation capacity of 1). This suggests that for optimal performance the re-allocation capacity factor should be made adjustable based on some estimate of the number of nodes to be added using the EN_addnode function (with the same holding true for EN_addlink).
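
For anyone who wants to try a similar measurement, a driver along these lines exercises EN_addnode in a tight loop through the public toolkit API. This is only a sketch of how such a test could be set up, not the harness used above; the node ID format and the GPM/Hazen-Williams settings are arbitrary choices.

    // Rough benchmark: add 100,000 junctions through the toolkit API and time it.
    #include <stdio.h>
    #include <time.h>
    #include "epanet2_2.h"

    int main(void)
    {
        EN_Project ph;
        int index, i;
        char id[32];
        clock_t start;

        EN_createproject(&ph);
        EN_init(ph, "", "", EN_GPM, EN_HW);

        start = clock();
        for (i = 1; i <= 100000; i++)
        {
            sprintf(id, "J%d", i);              // arbitrary junction ID
            EN_addnode(ph, id, EN_JUNCTION, &index);
        }
        printf("Added 100000 junctions in %.2f s\n",
               (double)(clock() - start) / CLOCKS_PER_SEC);

        EN_deleteproject(ph);
        return 0;
    }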

jamesuber commented 7 months ago

Thanks @LRossman -- I plan to run some network skeletonization use-case tests to get results from a practical application to add to yours. Will report back.