dewi-alliance / grants

Details of the DeWi Alliance Grant Program

Graph-Based Modeling for Anti-Gaming and Coverage Analysis #23

Open evandiewald opened 2 years ago

Project:

Adaptive Network Modeling using Graph-Based Representations

Elevator Pitch:

Helium's Blockchain API is an effective way to view historical data stored on-chain, but the ledger-based format is less useful for feeding directly into network models. In this project, we propose to build a framework for a graph-based representation of blockchain activity, including Proof of Coverage and Token Flow. By capturing the natural adjacency between hotspots and accounts, we will be able to build machine learning models to, for instance, identify likely "gaming" behavior and predict coverage maps based on hotspot placement.

Total fiat/hnt ask:

18750 USD

Name and Address:

Please provide your legal name and a link to the submitted issue to grants@dewi.org. This will streamline the contract process and KYC. A lack of this information will delay the contract.

Team or projects social: (optional)

LinkedIn

About the Applicant:

Evan is a graduate student with years of experience applying machine learning to messy datasets. A longtime member of the Helium Ecosystem, his team won the Grand Prize in the Hackster.io #IoTForGood contest for their predictive beehive monitoring system. He also maintains py-helium-console-client, a Python wrapper for the Console HTTP API. Evan fully embraces open source development and documents his projects in Medium publications like Towards Data Science and Better Programming.

GitHub (evandiewald)

Project Details:

The goal of this project is to create a dynamic, graph-based representation of the Helium Network and develop a preliminary suite of real-time analysis tools to characterize concepts like token flow, coverage mapping, and anomalous hotspot activity. Because network graphs natively capture the adjacency between nodes, they are widely used in a variety of applications, including search engines, social media platforms, and even biology. This data structure is also advantageous for the Helium Blockchain, which contains a number of connected elements, such as hotspots linked by witness paths and accounts linked by token transfers and hotspot ownership.

With this representation in place, we can leverage decades of research in graph theory to extract insights about network behavior. For example, Betweenness Centrality, which uses shortest path metrics to identify the nodes that uniquely connect disparate portions of a graph, has been used to identify Reddit communities with the most influence on pop culture. In the context of Proof of Coverage, betweenness can help us find the hotspots that - through witness paths - connect distinct neighborhoods in a city (see below).

Figure: Betweenness in Pittsburgh, PA
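To make the betweenness idea concrete, here is a minimal sketch in plain Python. It uses Brandes' algorithm on an unweighted, undirected graph, which is one standard way to compute betweenness and not necessarily the method behind the figure above. The hotspot names and the toy witness graph (`WITNESS_GRAPH`) are hypothetical: two tightly connected neighborhoods joined by a single hotspot called `bridge`.

```python
from collections import deque

# Hypothetical witness graph: each key is a hotspot, each value is the set of
# hotspots it shares a witness path with. Two neighborhoods (a*, b*) are
# joined only through "bridge".
WITNESS_GRAPH = {
    "a1": {"a2", "a3", "bridge"},
    "a2": {"a1", "a3"},
    "a3": {"a1", "a2"},
    "bridge": {"a1", "b1"},
    "b1": {"b2", "b3", "bridge"},
    "b2": {"b1", "b3"},
    "b3": {"b1", "b2"},
}


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    scores = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search from `source`, counting shortest paths.
        order = []                         # nodes in order of discovery
        preds = {v: [] for v in adj}       # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:            # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in scores.items()}


if __name__ == "__main__":
    ranked = sorted(betweenness(WITNESS_GRAPH).items(), key=lambda kv: -kv[1])
    for hotspot, score in ranked:
        print(f"{hotspot}: {score}")
```

In this toy layout, every shortest path between the two neighborhoods runs through `bridge`, so it receives the highest score, while hotspots that only talk to their own neighborhood score zero.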

In addition to position, we can also apply relevant features to each node, such as local elevation and PoCv11 antenna characteristics, as well as each edge, like the reported RSSI of that witness path. As demonstrated in this blog post, we can use these features to train Graph Neural Networks for the purpose of, for instance, anomaly detection and predictive modeling.
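As a rough illustration of attaching features to nodes and edges, the sketch below flattens a small set of hotspots and witness paths into a node feature matrix, an edge index, and an edge attribute list. The class names, field names (`elevation_m`, `antenna_gain_dbi`, `rssi_dbm`), and sample values are assumptions made for this example, not the project's schema. The `(x, edge_index, edge_attr)` layout is chosen because graph learning libraries such as PyTorch Geometric commonly expect inputs in this shape.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    address: str             # hypothetical identifier
    elevation_m: float       # local elevation in meters
    antenna_gain_dbi: float  # reported antenna gain


@dataclass
class WitnessPath:
    transmitter: str         # address of the beaconing hotspot
    witness: str             # address of the witnessing hotspot
    rssi_dbm: float          # reported RSSI for this path


def to_gnn_inputs(hotspots, paths):
    """Convert hotspots and witness paths into plain-list GNN inputs."""
    index = {h.address: i for i, h in enumerate(hotspots)}
    # One row of features per node.
    x = [[h.elevation_m, h.antenna_gain_dbi] for h in hotspots]
    # Two parallel rows: source node indices and target node indices.
    edge_index = [
        [index[p.transmitter] for p in paths],
        [index[p.witness] for p in paths],
    ]
    # One row of features per edge, in the same order as edge_index.
    edge_attr = [[p.rssi_dbm] for p in paths]
    return x, edge_index, edge_attr


# Made-up sample data for illustration only.
HOTSPOTS = [
    Hotspot("hs_a", 250.0, 1.2),
    Hotspot("hs_b", 310.0, 5.8),
    Hotspot("hs_c", 275.0, 3.0),
]
PATHS = [
    WitnessPath("hs_a", "hs_b", -98.0),
    WitnessPath("hs_b", "hs_c", -105.0),
]

if __name__ == "__main__":
    x, edge_index, edge_attr = to_gnn_inputs(HOTSPOTS, PATHS)
    print(x, edge_index, edge_attr)
```

Keeping the conversion in plain lists means the same output can be wrapped in whichever tensor type a given modeling library needs.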

The interpretability of Proof of Coverage is a double-edged sword. On one hand, mining rewards incentivize productive participants to optimize network coverage through well-defined criteria for hotspot placement and configuration. On the other hand, these rules also provide convenient thresholds for malicious actors to work around. AI-based approaches, by contrast, can identify nonlinear decision boundaries that are more difficult to circumvent. They also have the benefit of real-time optimization when trained on continuously evolving datasets. While we are not proposing that such a scheme be implemented in the core consensus protocol, it may be useful for analytics, including gaming detection and predictive modeling. For example, given a certain layout of hotspots in a region, what can we expect the coverage map to look like?

From the perspective of Helium's economics, graphs can also inherently capture concepts like token flow between wallets and exchanges, as well as hotspot ownership. While this information can be extracted from the official Helium API, by storing the data in a native graph database platform (such as the open-source ArangoDB), adjacency is expressed directly, which simplifies analytics and visualization tools.
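The sketch below shows how token flow might look once payments are stored as graph edges. Each payment is a dictionary with `_from` and `_to` fields, mirroring the way ArangoDB edge documents reference their endpoints. The `accounts` collection name, the `amount_hnt` field, the account keys, and the set of exchange addresses are all hypothetical. The example runs in plain Python and does not connect to a database.

```python
from collections import defaultdict

# Hypothetical payment edges between accounts.
PAYMENTS = [
    {"_from": "accounts/alice", "_to": "accounts/exchange_1", "amount_hnt": 120.0},
    {"_from": "accounts/bob", "_to": "accounts/exchange_1", "amount_hnt": 30.0},
    {"_from": "accounts/exchange_1", "_to": "accounts/carol", "amount_hnt": 50.0},
    {"_from": "accounts/alice", "_to": "accounts/bob", "amount_hnt": 10.0},
]

# Hypothetical set of accounts known to belong to exchanges.
EXCHANGES = {"accounts/exchange_1"}


def exchange_flows(payments, exchanges):
    """Aggregate inflow, outflow, and net flow for each exchange account."""
    flows = defaultdict(lambda: {"inflow": 0.0, "outflow": 0.0})
    for payment in payments:
        amount = payment["amount_hnt"]
        if payment["_to"] in exchanges:
            flows[payment["_to"]]["inflow"] += amount
        if payment["_from"] in exchanges:
            flows[payment["_from"]]["outflow"] += amount
    # Net flow is positive when more tokens enter the exchange than leave it.
    return {
        account: {**totals, "net": totals["inflow"] - totals["outflow"]}
        for account, totals in flows.items()
    }


if __name__ == "__main__":
    for account, totals in exchange_flows(PAYMENTS, EXCHANGES).items():
        print(account, totals)
```

Because each edge already names both endpoints, the aggregation is a single pass over the edges with no joins against a separate ledger of transactions.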

Technical Objectives:

Roadmap:

| Milestone + Date | Deliverable | Summary | Cost |
| --- | --- | --- | --- |
| MS1, Dec. 7, 2021 | Graph Database Importer | Automated pipeline for generating and importing network graphs to ArangoDB database. Dataset will include adjacencies between hotspots and accounts, using features contained on-chain. Estimated at 25 developer hours. | 3125 USD |
| MS2, Dec. 14, 2021 | API + Deep Learning Toolkit (alpha) | v1 API to support common queries of the database, as well as initial Python package for converting results into analysis-friendly formats. This phase will include an investigation of the feasibility of embedding off-chain data, such as elevation/geographic features. Estimated at 25 developer hours. | 3125 USD |
| MS3, Dec. 21, 2021 | Model Building Pt. 1 | Preliminary real-time Graph NN-based model(s) for anomaly detection. Goal will be to estimate the percentage of "gaming" activity on the network, by number of hotspots and by rewards. Estimated at 20 developer hours. | 2500 USD |
| MS4, Jan. 10, 2022 | Model Building Pt. 2 | Preliminary model for coverage mapping. Goal will be to predict coverage (if feasible from limited mapper data at this time) and/or estimated rewards by hotspot given a certain configuration, which will (ideally) aid PoC optimization efforts. Estimated at 20 developer hours. | 2500 USD |
| MS5, Jan. 17, 2022 | Model Building Pt. 3 | Real-time token flow analytics tool. Will focus on aggregate movement to and from exchanges, as well as significant transfers. Estimated at 15 developer hours. | 1875 USD |
| MS6, Jan. 27, 2022 | Dashboard | Basic web-based dashboard showing real-time metrics for anomaly detection & token flow models. Visual demo of coverage mapping prediction as a function of hotspot placement. Estimated at 25 developer hours. | 3125 USD |
| MS7, Feb. 7, 2022 | Final Deliverables | Open-source repository of analysis tools + Medium article(s) describing the completed work and instructions on how the community can access the dataset to create future models/tools. Estimated at 20 developer hours. | 2500 USD |