dewi-alliance / grants

Details of the DeWi Alliance Grant Program

Helium data analysis #32

Open bigdaveakers opened 2 years ago

bigdaveakers commented 2 years ago

Project:

Helium data analysis - HeliumAnalysis.io (to be secured)

Elevator Pitch:

The project seeks to provide consolidated data and analysis of network performance and statistics in an easy-to-understand format. By combining existing and newly created reports, it will build up a clear and concise analysis of all aspects of network performance on a daily/weekly/monthly schedule.

Total fiat/hnt ask:

$80k

Name and Address:

David Akers

https://github.com/bigdaveakers

I have a keen interest in data analysis and have previously worked to expose large-scale anomalies within the blockchain of another crypto project, where millions of coins were minted illegitimately. Through detailed analysis these were traced through many transactions back to a handful of wallets on a couple of exchanges. For the last 3 months I have been looking at the data available in Metabase and have created a number of reports and dashboards, published to aid the understanding of rewards in particular. Through this I have been able to identify patterns and drill into the data to investigate macro issues affecting the network and earnings. More recently I have also looked at data around the denylist and have been helping community members with specific questions they would like answered in Metabase.

Some examples of the Dashboards I have created

Rewards Distribution Analysis: https://etl.dewi.org/public/dashboard/12ec97c7-072c-470c-aef5-3979cf0e328c

Deny List Data (currently broken due to table name changes): https://etl.dewi.org/public/dashboard/f2c08d16-8a89-4910-b581-feae8f3a77e8

Robert Putt

Github: https://github.com/robputt

Project Details:

With the dramatic rise in hotspot users and the fluctuations experienced in rewards, this project aims to provide data and analysis to the community to help build a better understanding of the contributing factors.

By using reports already available in the DeWi ETL as well as custom reports, the aim is to consolidate data to identify the factors that contribute to network performance and rewards. This data will not only be presented to users but will be accompanied by commentary and analysis explaining the reasons for fluctuations at a global, regional, maker, etc. level. It should be noted that this is not a proposal to provide a highly available ETL as a replacement for Metabase, but rather to provide highly available, regularly updated static reports that are both visualised and machine readable, as sketched below.
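
As a rough illustration of the "visualised and machine readable" pairing (a minimal sketch only; the file and column names are placeholders, not actual DeWi ETL exports):

```python
# Minimal sketch: publish one day's report as both a JSON file (machine
# readable) and a static PNG chart (visualised). All names are illustrative.
import json

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("daily_rewards.csv", parse_dates=["day"])  # hypothetical export

# Machine-readable artefact: headline numbers as JSON.
summary = {
    "date": df["day"].max().date().isoformat(),
    "total_hnt": float(df["hnt"].sum()),
    "active_hotspots": int(df["hotspot"].nunique()),
}
with open("report.json", "w") as f:
    json.dump(summary, f, indent=2)

# Visual artefact: the same data rendered as a static chart.
df.groupby("day")["hnt"].sum().plot(title="Daily HNT rewards")
plt.savefig("report.png")
```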

The plan is to stand up a dedicated private ETL for use only by the project.

The data and summaries will be presented daily on HeliumAnalysis.io as well as on social media platforms.

At the end of each week an article will be written describing the state of the network and identifying any known issues or anomalies.

At the end of each month a report will be submitted to Dewi with the findings of the analysis for publication.

Over the last few months, analysis of what is available in the DeWi ETL has shown that some reports give data that is misleading and in some cases wrong. By ensuring that the reports are maintained and correct, and by supplementing them with additional context, it is anticipated that the community can be better informed, placing less burden on the core team for common questions, particularly those relating to rewards.

As an example, the existing reports showing the distribution of rewards can be consolidated with the daily HNT production and beacons-per-day reports to identify fluctuations in rewards driven by network performance; a sketch of this follows.
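
A minimal sketch of that consolidation, assuming three daily CSV exports with hypothetical column names (the real DeWi ETL schemas may differ):

```python
# Join three daily series on date and derive normalised views that make
# reward fluctuations comparable across days. All names are assumptions.
import pandas as pd

rewards = pd.read_csv("rewards_per_day.csv", parse_dates=["day"])
production = pd.read_csv("hnt_production_per_day.csv", parse_dates=["day"])
beacons = pd.read_csv("beacons_per_day.csv", parse_dates=["day"])

merged = rewards.merge(production, on="day").merge(beacons, on="day")

# Share of minted HNT paid out as PoC rewards, and reward per beacon.
merged["reward_share"] = merged["poc_rewards_hnt"] / merged["hnt_minted"]
merged["hnt_per_beacon"] = merged["poc_rewards_hnt"] / merged["beacon_count"]

print(merged[["day", "reward_share", "hnt_per_beacon"]].tail(7))
```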

Additionally, reports can be generated from the data to identify potentially anomalous behaviour, whether attempts to game the system or groups of hotspots that may be experiencing issues (see the sketch below).
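
One simple form such a report could take (a sketch under assumed input names, not the proposed detector itself): flag days where a group's activity strays from its own rolling baseline.

```python
# Flag days where a hotspot group's beacon count deviates by more than
# three standard deviations from its trailing 30-day mean. Illustrative only.
import pandas as pd

df = pd.read_csv("group_beacons.csv", parse_dates=["day"])  # hypothetical export

baseline = df["beacon_count"].rolling(30, min_periods=7)
zscore = (df["beacon_count"] - baseline.mean()) / baseline.std()

anomalies = df[zscore.abs() > 3]
print(anomalies[["day", "beacon_count"]])
```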

Note: the intention is not to analyse individual hotspots, but the queries shall be designed to be flexible enough for community members to examine data for their own hotspots using other tools.

To engage the community, it is proposed to provide a bounty system through which people can submit new queries, ideas to be implemented, optimisations to existing queries, analysis, and commentary.

Roadmap:

The roadmap is broken into three main sections:

Infrastructure and analysis setup, MS1

Reporting Phase 1, MS2

Reporting Phase 2, MS4

An additional item, MS3, provides small bounty payments to community members who submit useful contributions during the reporting phase.

It is anticipated that upon award, funding will be made available for infrastructure setup (MS1). Initially this will be used to stand up a basic website, with daily reports provided as soon as possible from data already available in Metabase. These early reports will likely be image-based captures from Metabase with commentary, pending the full design and implementation of the website and the migration of queries and visualisations. Development will continue in parallel to provide a more user-friendly UI and presentable visualisations, including migrating queries from Metabase to a private ETL. The site is expected to be fully functional within 2 months of award.

Beyond this, two milestones (MS2 and MS4) will be made up of deliverables including monthly reports presented to DeWi on the 1st of each month (or to suit the DeWi comms cycle) covering the previous month's data. These payments will also cover the interim daily and weekly reports used to educate and inform the community, as well as any updates and additions requested on an ongoing basis by DeWi. It should be noted that the proposal is bound by hours estimates; while these are somewhat flexible, it cannot be expected that every additional request will be achievable within these timescales.

Finally bounty payments (MS3) will be made to community members that provide support during the project. For example highlighting interesting issues, support with creating queries and visualisation, suggestions for improvements etc. Payments will be made, recorded and reported in a transparent way to both DeWi and the community to ensure full accountability and traceability.

MS/Roadmap:

| Milestone + Date | Deliverable | Summary | Cost |
| --- | --- | --- | --- |
| MS1, Award | Infrastructure, servers, and initial setup | Set up servers and infrastructure for serving data and storing reports for 12 months, plus tools and licences for software. Design and create the website UI for presenting reports and visualisations. Design and create initial data queries to analyse performance data. Estimated 150 hours + costs. | 30,000 USD |
| MS2, Award + 2 months | Monthly reporting | Report monthly for the initial 6 months to the DeWi Foundation, covering monthly performance data and any emerging issues. Includes weekly written reports published on the website and social media, plus daily updates with a brief summary. Estimated 200 hours. | 20,000 USD |
| MS3, Award + 3 months | Community bounty | Distribution of funds to community members for contributions: 10 × $100 bounties per month, months 3 to 12. | 10,000 USD |
| MS4, Award + 8 months | Monthly reporting | Report monthly for an additional 6 months to the DeWi Foundation, covering monthly performance data and any emerging issues. Includes weekly written reports published on the website and social media, plus daily updates with a brief summary. Estimated 200 hours. | 20,000 USD |
bigdaveakers commented 2 years ago

Grant request withdrawn due to lack of interest from DeWi

bigdaveakers commented 2 years ago

Reopened after discussion with @jamiew

bigdaveakers commented 2 years ago

Updated per discussion with @JessmFromEarth

tdemarchin commented 2 years ago

Dear everyone, I am a data scientist and I'd like to write an article/tutorial on data science with R applied to Helium data usage (see my blog https://tdemarchin.medium.com/). My first question is about the data usage of the network (IoT data transfer). I would start by plotting the evolution of data usage on a map (from the beginning up to now) and then compute some statistics on the data.
For this I need historical data, a lot of it. These data are available on https://etl.dewi.org/, but downloads are limited to 10^6 rows, which covers just 4 days of history. An alternative would be to set up my own ETL, and I thought of asking for a grant to finance that setup (SSDs and a computer). Having gone through the discussion on Discord with @jamiew, @bigdaveakers and other DeWi affiliates, I understand the best approach would be to join the existing grant application above. The server requested in it would match my needs perfectly; I would just need direct access to the database. This would avoid duplicated costs for DeWi. Besides, by joining this grant, I would contribute my expertise in data science to the team's effort beyond data usage and rewards. I look forward to hearing from you.

JessmFromEarth commented 2 years ago

@tdemarchin

Please shoot me a DM (JessM#3086) on Discord to discuss further. Thanks for the message!

Scottsigel commented 2 years ago

Grant has been approved and signed off by the Foundation to begin work.

tomtobback commented 2 years ago

Hi, I've done some related work analyzing real data traffic on the network, which could be integrated into real-time dashboards. Can you confirm whether you plan to cover this aspect? Otherwise I would consider applying for a grant to elaborate on it. It is mentioned as an example of batch 1, but I cannot find any other related approved project, and I'm not sure from the above whether @tdemarchin joined this grant.

tdemarchin commented 2 years ago

Hi @tomtobback. First of all, nice article. I am also working on analyzing Helium data usage. I am finalizing an article and hope to publish it in the next 2 weeks. I cover similar aspects to those in your article, but approach them differently. I would love to hear your feedback once it is published. Regarding this post, I initially thought of working with @bigdaveakers to get access to the data I needed for my article. In the end I got the data another way, so we didn't work together (yet?), but I am still interested in participating in a project. Let me know if you apply for a grant and are looking for team members.

tomtobback commented 2 years ago

Hi @tdemarchin - I found your article, very interesting, nice graphs. Indeed a different approach: you are looking at the full history, while I'm looking at recent 3-to-7-day intervals. In my humble opinion, the way you report 'transaction data' is a bit unfortunate: what you are doing is grouping all data packets from a particular hotspot per block, which is a blockchain transaction, but that does not correspond to an individual data transaction, as in a LoRaWAN packet sent from device to network. Each LoRaWAN data packet is very small (max 255 bytes); your example list has one transaction with 29,760 bytes, so those bytes must have been spread over hundreds of data transfers by that particular hotspot during that block. You arrive at a total of around 76M 'transactions' (rows in your table), but that does not reflect actual data packets, which were around 10M/day when I last looked (Feb 2022).

@bigdaveakers I'm still interested in contributing to your heliumanalysis.io regarding real data traffic, or feel free to adapt and use my code linked above.
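
As a back-of-envelope check of the packet-count point above (a minimal sketch; the 255-byte LoRaWAN payload cap is approximate and region dependent):

```python
# 29,760 bytes cannot be a single LoRaWAN packet: with a maximum payload of
# roughly 255 bytes, that one on-chain "transaction" must bundle at least
# ~117 separate packets.
import math

total_bytes = 29_760
max_packet_bytes = 255

print(math.ceil(total_bytes / max_packet_bytes))  # -> 117
```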

tdemarchin commented 2 years ago

Hi @tomtobback, I am open to feedback and can still adjust the article if needed. I originally posted the article on the #data-analysis Discord channel before publishing it on social networks, hoping someone would review it. Nobody did so as carefully as you did, thanks!
It looks like, for you, one transaction equals one data packet. My definition is a bit different: I see it as one data transfer between one hotspot and one connected device, which can be spread over several data packets if it exceeds 24 bytes (hence the transaction with 29,760 bytes). The problem is that transactions recorded on the blockchain are aggregates of all the transfers made by an individual hotspot per block. If Hotspot A transfers 32 data packets with Device 1 and 10 with Device 2 in the same block, it will be recorded as one transaction of 42 packets on the blockchain; my approach would then see that as one transaction while there were actually two (a toy illustration follows). That being said, the current target block time is 60 seconds, and I assume it is quite rare for one hotspot to connect to several devices within that timeframe, except maybe for hotspots in very dense areas (i.e. big cities with many devices). So in the end, my solution is not perfect but not that far from the truth, I believe. Btw, where did you get that number of 10M/day? I would be curious to see how it is calculated.
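
To make the Hotspot A example concrete (a toy sketch of the aggregation, nothing more):

```python
# Per-block aggregation on chain: two device-level exchanges by the same
# hotspot collapse into a single recorded entry for that block.
transfers = [
    {"hotspot": "A", "device": 1, "n_packets": 32},
    {"hotspot": "A", "device": 2, "n_packets": 10},
]

aggregated = {}
for t in transfers:
    aggregated[t["hotspot"]] = aggregated.get(t["hotspot"], 0) + t["n_packets"]

print(aggregated)  # {'A': 42} -- two transfers appear as one "transaction"
```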

tomtobback commented 2 years ago

Hi @tdemarchin, the blockchain API for state channels reports the number of packets; you can have a look at my code. Or, from another source: https://web3index.org/helium shows a weekly demand-side (= data traffic) value of around $2k in Feb 2022. That is about $300 per day, or 30M DC, or 10-20M packets per day.
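
Spelling out that arithmetic (assuming the fixed Data Credit price of $0.00001 and, as a rough estimate, that a typical packet burns 1.5-3 DC, since traffic is billed at 1 DC per 24 bytes of payload):

```python
# $300/day of data traffic at the fixed DC price of $0.00001 gives 30M DC/day;
# at ~1.5-3 DC per packet that maps to roughly 10-20M packets per day.
usd_per_day = 300
usd_per_dc = 0.00001

dc_per_day = usd_per_day / usd_per_dc  # 30,000,000 DC
for dc_per_packet in (1.5, 2.0, 3.0):
    print(f"{dc_per_packet} DC/packet -> {dc_per_day / dc_per_packet:,.0f} packets/day")
```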

tdemarchin commented 2 years ago

Hi again @tomtobback, my understanding is that the weekly fee does not only reflect data transfer; sending money between wallets also generates transaction fees, for instance. But still, one data exchange between a hotspot and a connected device often translates into several data packet transfers, so I would not count one data packet as one transaction.