LucasTomazini / GraphLeak

GraphLeak: A realistic dataset to detect and locate leaks in water distribution networks

This repository contains the GraphLeak dataset, a comprehensive dataset designed for locating and identifying leaks in water distribution networks (WDN). The dataset is intended to support researchers in developing and evaluating water leak detection models, particularly those utilizing deep learning techniques.

Abstract

The management of water resources and the reduction of water losses due to leaks are crucial for human life and industrial processes. To improve the efficiency of leak detection algorithms, a realistic dataset with reliable values is essential. GraphLeak is a dataset created through realistic simulations using the EPANET-MATLAB toolkit. It includes various WDN scenarios and topologies, with each node representing a measurement point within the network.

Index Terms

Dataset Description

Deep learning algorithms rely on high-quality data for accurate training and evaluation. GraphLeak provides a comprehensive dataset in tabular format, where each column represents a specific variable measured by individual sensors. The dataset includes information on pressure, flow, volume, label, and localization. The simulations are conducted using the EPANET WDN modeling software, and the datasets are exported to CSV (Comma-Separated Values) files.
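As a rough illustration of the tabular layout described above, the sketch below parses a tiny inline CSV with Python's standard `csv` module. The column names (`pressure_n1`, `flow_n1`, `label`, `localization`, and so on) and the sample values are hypothetical placeholders chosen for this example; the actual GraphLeak files may use different headers and encodings, so adjust accordingly.

```python
import csv
import io

# Hypothetical sample: the column names and values are made up for illustration
# and are NOT taken from the real GraphLeak files.
SAMPLE_CSV = """pressure_n1,pressure_n2,flow_n1,flow_n2,label,localization
52.1,50.3,1.20,0.95,0,0
49.8,47.9,1.45,0.90,1,2
"""

TARGET_COLUMNS = ("label", "localization")  # assumed names of the target columns


def load_rows(handle):
    """Split each CSV row into sensor features and the two target fields."""
    rows = []
    for raw in csv.DictReader(handle):
        features = {
            name: float(value)
            for name, value in raw.items()
            if name not in TARGET_COLUMNS
        }
        rows.append(
            {
                "features": features,
                "label": int(raw["label"]),
                "localization": int(raw["localization"]),
            }
        )
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[1]["label"], rows[1]["localization"])
```

To read a real file, the same function could be called with `open(path, newline="")` instead of the in-memory string.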

[Figure: WDS topology]

Evaluation

The results obtained by a Multi-layer Perceptron are evaluated with the main classification metrics derived from the confusion matrix, such as accuracy, precision, recall, and F1-score.
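The metrics named above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch that assumes a binary label where 1 means "leak" and 0 means "no leak"; that encoding and the example label vectors are assumptions made for illustration, not values from the dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 1 = leak, 0 = no leak (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```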

The Mean Absolute Percentage Error (MAPE) is used to analyze the error between predictions and the correct values.
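For reference, a small sketch of the error metric follows, assuming that MAPE here denotes the standard mean absolute percentage error, i.e. the mean of |true - predicted| / |true| expressed in percent. Skipping zero-valued targets is a choice made for this example only, and the numbers are invented.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Targets equal to zero are skipped (an assumption of this sketch),
    because the relative error is undefined for them.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)


# Invented values for illustration only.
true_values = [100.0, 200.0, 50.0]
predictions = [110.0, 180.0, 50.0]
error_percent = mape(true_values, predictions)
print(round(error_percent, 3))
```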

Raw Data Download

All the contents of GraphLeak are public and can be accessed here

PreProcess Python file

Prerequisites

Data generation

From the raw data, generate the dataset by running:

 python3 main.py 
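To give a feel for what the noise and data-normalization options in the Configurations section control, here is a minimal standard-library sketch. It is not the code in `main.py`: the function names and sample values are hypothetical, the noise is assumed to be additive Gaussian noise with the documented defaults (mu = 0, sigma = 0.1), and the data normalization is assumed to be plain min-max scaling to the range 0 to 1. Node normalization is left out because its exact definition is not given here.

```python
import random


def add_gaussian_noise(values, mu=0.0, sigma=0.1, seed=None):
    """Add independent Gaussian noise N(mu, sigma) to every value."""
    rng = random.Random(seed)  # seeded generator keeps the example reproducible
    return [v + rng.gauss(mu, sigma) for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up pressure readings for a single node (illustration only).
pressure = [52.1, 51.8, 49.3, 50.6]
noisy = add_gaussian_noise(pressure, mu=0.0, sigma=0.1, seed=42)
scaled = min_max_normalize(noisy)
print(scaled)
```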
### Configurations

**Measurements Content**
- Choose which measured values the dataset contains:
  - Pressure: True or False
  - Flow: True or False
  - Volume: True or False

**Noise**
- To add Gaussian noise to the data, set Noise to True.
  - Noise: True

**Noise specification**
- If noise is enabled, specify its configuration below:
  - mu: 0 (mean, default)
  - sigma: 0.1 (standard deviation, default)

**Nodes Normalization**
- Set True (recommended) to normalize values between nodes.
  - Node_normalization: True

**Data Normalization**
- Set True (recommended) to normalize values in the range 0 to 1.
  - Data_normalization: True

# License

# Authors

Lucas Roberto Tomazini; Rodrigo Pita Rolle; Alexandre da Silva Simões; Esther Luna Colombini; Eduardo Paciência Godoy

# Citation

Please cite the following paper if you use this code in your research:
@article{xx,
  title={GraphLeak: A realistic dataset to detect and locate leaks in water distribution networks},
  author={xx},
  journal={xx},
  volume={xx},
  pages={xx},
  year={xx},
  publisher={xx}
}