OpenDataforWeb3 / DataGrantsforARB

Organizing work as outlined in the Data Grants proposal to ThankARB and the ARB DAO in Q4 of 2023
Apache License 2.0

Arbitrum Governance Forum Data Collection Strategy - On Chain and Structured #16

Closed: Pfed-prog closed this issue 7 months ago

Pfed-prog commented 9 months ago

Given the valuable insights and the overlap between issues #5 and #6, raised by me and poupou-web3, this new issue aims to address the challenges of insufficient delegate data on the Arbitrum governance forum and the need for comprehensive data collection for analysis.

Proposed Tasks:

Data Source Identification:

Identify potential data sources for collecting token holder and participation data on governance proposal voting. Explore tools such as Karma, Tally, and Snapshot for gathering relevant data.

Data Collection Pipeline Development:

Formulate a plan to build a structured pipeline for extracting and collating data from identified sources. Seek to leverage Python and Selenium for automated data collection to ensure efficiency and accuracy.
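A minimal sketch of the Selenium piece, assuming a headless Chrome driver and the forum's topic listing as the target; the URL and CSS selectors are illustrative assumptions to verify against the live page, not a final pipeline:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Assumption: collecting topic titles from the forum's listing page.
# The URL and CSS selectors are illustrative and must be checked against the live markup.
options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://forum.arbitrum.foundation/latest")
    driver.implicitly_wait(10)  # give the topic list time to render
    for row in driver.find_elements(By.CSS_SELECTOR, "tr.topic-list-item"):
        title = row.find_element(By.CSS_SELECTOR, "a.title")
        print(title.text, title.get_attribute("href"))
finally:
    driver.quit()
```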

Data Quality Assurance:

Implement measures to ensure the quality and reliability of the collected data, including verification processes and error handling mechanisms.
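A minimal sketch of the kind of checks intended here, assuming vote records with voter, proposal, choice, and voting-power fields (the column names are illustrative):

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
REQUIRED = ["voter", "proposal", "choice", "vp"]  # illustrative column names


def validate_votes(df: pd.DataFrame) -> pd.DataFrame:
    """Drop malformed or duplicate vote records and log what was removed."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

    before = len(df)
    df = df.dropna(subset=REQUIRED)                        # incomplete records
    df = df.drop_duplicates(subset=["voter", "proposal"])  # one vote per voter per proposal
    df = df[df["vp"] >= 0]                                 # voting power must be non-negative
    logging.info("kept %d of %d vote records", len(df), before)
    return df
```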

Data Privacy and Compliance:

Address privacy and compliance concerns by outlining protocols for handling sensitive user data in accordance with regulatory standards and best practices.

Documentation and Reporting:

Develop documentation outlining the data collection methodology, tools utilized, and any associated challenges. Create a reporting framework to summarize the collected data and its readiness for further analysis.
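One possible shape for that reporting step, assuming cleaned vote records like those above (the file name and columns are illustrative):

```python
import pandas as pd

# Assumption: votes.csv holds cleaned records with voter, proposal, and vp columns.
votes = pd.read_csv("votes.csv")

# Per-proposal participation summary: unique voters and total voting power cast.
summary = (
    votes.groupby("proposal")
    .agg(voters=("voter", "nunique"), total_vp=("vp", "sum"))
    .sort_values("voters", ascending=False)
)
summary.to_csv("participation_summary.csv")
print(summary.head())
```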

Collaboration and Feedback:

This new issue aims to lay the groundwork for a robust data collection strategy that will feed into the comprehensive analysis of governance participation within the Arbitrum ecosystem. Collaboration and feedback from stakeholders and community members are encouraged to ensure a well-rounded and inclusive approach.

Engaging with the Community:

Explore ways to engage the community and garner their support for the data collection efforts, ensuring transparency and building trust in the process.

This issue serves as a crucial step towards enhancing the legitimacy and participation rates in on-chain voting and fostering a more inclusive and informed governance framework within the Arbitrum ecosystem.

Pfed-prog commented 9 months ago

Some data sources:

https://forum.arbitrum.foundation/t/experimental-delegate-incentive-test-1/20944

https://dune.com/pandajackson42/arbitrum-delegates-and-voting-power
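For the forum source, a minimal sketch of pulling topic metadata, assuming the site is a standard Discourse install that exposes the usual `.json` listing endpoints (an assumption worth verifying):

```python
import requests

# Assumption: forum.arbitrum.foundation is a standard Discourse install,
# so listing pages return structured JSON when requested with a .json suffix.
FORUM = "https://forum.arbitrum.foundation"


def latest_topics(page: int = 0) -> list[dict]:
    """Fetch one page of the latest-topics listing as structured JSON."""
    resp = requests.get(f"{FORUM}/latest.json", params={"page": page}, timeout=30)
    resp.raise_for_status()
    return resp.json()["topic_list"]["topics"]


if __name__ == "__main__":
    for topic in latest_topics():
        print(topic["id"], topic["title"], topic["posts_count"])
```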

epowell101 commented 9 months ago

Thank you very much for this "meta issue" which addresses, as you point out, some of the overlap between issues. Very helpful and - at least to me - very clear!

poupou-web3 commented 9 months ago

Thanks for formulating this issue @Pfed-prog. I would suggest anyone taking on this issue also look at BeautifulSoup, or other scraping libraries.
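A minimal sketch of that approach with requests and BeautifulSoup, assuming the listing page serves parseable HTML (the selector is an assumption to check against the live markup; if the list is rendered client-side, Selenium or the JSON endpoints are the fallback):

```python
import requests
from bs4 import BeautifulSoup

# Assumption: topic links carry an "a.title" class in the served HTML;
# verify against the live page, since Discourse renders much of its UI client-side.
resp = requests.get("https://forum.arbitrum.foundation/latest", timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

for link in soup.select("a.title"):
    print(link.get_text(strip=True), link.get("href"))
```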

The goal of the analysis suggested in https://github.com/OpenDataforWeb3/DataGrantsforARB/issues/6 was to extract data exclusively from the Arbitrum forum, to analyze influence on the forum even before any voting. So I'm a bit confused by the new title.

The on-chain part was covered in issue https://github.com/OpenDataforWeb3/DataGrantsforARB/issues/7

Reusable Data Collection:

Develop a data collection system using the Snapshot GraphQL API (https://docs.snapshot.org/tools/api) designed for reusability. This involves creating a modular, well-documented codebase that can be easily adapted or updated. The system should extract key data elements such as voter identities, vote counts, and voting reasons in a format that is maintainable and scalable. Ensure capability for ongoing data collection to allow analysis of new proposals as they are introduced.

Mainly retrieving data from Snapshot using their API (I know this is centralized)
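A minimal sketch of that retrieval against the hub API linked above, paging through the votes on a single proposal; the page size and field names should be confirmed against the Snapshot docs:

```python
import requests

SNAPSHOT_HUB = "https://hub.snapshot.org/graphql"

# Votes on one proposal: voter identity, voting power, choice, and stated reason.
VOTES_QUERY = """
query Votes($proposal: String!, $first: Int!, $skip: Int!) {
  votes(
    first: $first
    skip: $skip
    where: { proposal: $proposal }
    orderBy: "created"
    orderDirection: desc
  ) {
    voter
    vp
    choice
    reason
    created
  }
}
"""


def fetch_votes(proposal_id: str, page_size: int = 1000) -> list[dict]:
    """Page through all votes on a proposal via the Snapshot hub GraphQL API."""
    votes, skip = [], 0
    while True:
        resp = requests.post(
            SNAPSHOT_HUB,
            json={
                "query": VOTES_QUERY,
                "variables": {"proposal": proposal_id, "first": page_size, "skip": skip},
            },
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()["data"]["votes"]
        votes.extend(batch)
        if len(batch) < page_size:
            return votes
        skip += page_size
```

Keeping the query and the paging loop separate like this is what makes the collector reusable for new proposals as they appear.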

One suggestion would be to keep those issues for data collection around Arbitrum governance and to ask for the retrieval of both on-chain and off-chain data.

jchanolm commented 8 months ago

Hi @poupou-web3 and @Pfed-prog, I just realized that the data files for my forum scraper and Snapshot didn't upload to the repo because they were too big.

Discourse

Snapshot

jchanolm commented 8 months ago

@poupou-web3 @epowell101 Sorry, I just realized I overwrote the Snapshot file with the Discourse file by saving with the same filename.

I re-uploaded both datafiles with the correct data.

Discourse

Snapshot