OpenDataforWeb3 / DataGrantsforARB

Organizing work as outlined in the Data Grants proposal to ThankARB and the ARB DAO in Q4 of 2023
Apache License 2.0

Arbitrum On-Chain Behavior Segmentation #21

Closed: omnianalytics closed this issue 6 months ago

omnianalytics commented 7 months ago

Objective: The ability to observe, track and categorize on-chain behavior would help the Arbitrum ecosystem understand who is engaging with the blockchain and how. This insight could help shape the feature roadmap, tailoring the L2 chain's development towards the actual users of the chain. Whether the focus is gaming, DeFi or payments, an in-depth segmentation of on-chain user behavior would enable data-driven research, development and production.

Data: Arbitrum blockchain data, pre-processed for easier analysis.

Methodology: Analysts would pull on-chain data from each block to build a superset of the addresses active on the blockchain. Each address in this set would then be labelled as either an EOA or a contract. From there, the analyst would choose the segmentation variables, such as wallet balance, number of tokens held, number of transactions, number of other chains the address has been active on, maximum balance held and largest deposit. These features would be collected and fed into an unsupervised learning algorithm to determine meaningful clusters. The analysts could also take a visual approach and predefine segments based on the findings, to ensure the resulting segments have meaningful interpretations. Parallel coordinate plots, mean vectors and frequency tables could then be produced to show the community the prevalence of each group within the Arbitrum ecosystem.
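To make the clustering step concrete, here is a minimal sketch using k-means, which is only one possible choice of unsupervised algorithm; the proposal does not name one. The wallet addresses, feature values, the log and z-score preprocessing and the choice of `k=2` are all assumptions made for illustration. It uses only the Python standard library.

```python
import math
import random
from collections import Counter

# Hypothetical per-wallet features: (transaction count, tokens held, max balance).
# Addresses and values are invented for illustration only.
wallets = {
    "0xaaa1": (3, 1, 0.05),
    "0xaaa2": (5, 2, 0.10),
    "0xaaa3": (4, 1, 0.08),
    "0xbbb1": (900, 40, 250.0),
    "0xbbb2": (1100, 35, 300.0),
    "0xbbb3": (1000, 45, 280.0),
}

def log_scale(row):
    # Counts and balances tend to be heavy-tailed, so compress them with log1p.
    return [math.log1p(v) for v in row]

def standardize(rows):
    # Z-score each column so that no single feature dominates the distance.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, iters=50, seed=0):
    # Plain Lloyd's algorithm: assign points to the nearest centroid, then re-average.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

addresses = list(wallets)
features = standardize([log_scale(wallets[a]) for a in addresses])
labels, centroids = kmeans(features, k=2)

segments = dict(zip(addresses, labels))  # address -> cluster id
incidence = Counter(labels)              # frequency table: cluster id -> wallet count
```

Here `centroids` plays the role of the mean vectors mentioned above (in standardized feature space) and `incidence` is the frequency table. On real data the number of clusters would need to be chosen and validated rather than fixed in advance.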

Deliverable: A dataset of wallet addresses that have interacted with the Arbitrum ecosystem, labelled with their behavioral segments; an analysis of the segment profiles, their interpretations and incidences; and a classification model that can assign a label to a set of wallets for later batch processing.
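As a rough illustration of the batch-labelling deliverable, the sketch below assigns each new wallet to the nearest segment centroid. This is one simple way to build such a classifier, not necessarily the one the analysts would choose. The segment names, centroid values, feature pair and addresses are all hypothetical.

```python
import math

# Hypothetical segment centroids in (log1p tx count, log1p tokens held) space.
# In practice these would come from the clustering step; names and values are invented.
CENTROIDS = {
    "casual": (1.5, 0.7),
    "power_user": (6.9, 3.7),
}

def featurize(tx_count, tokens_held):
    # Apply the same log transform that was assumed when the centroids were built.
    return (math.log1p(tx_count), math.log1p(tokens_held))

def assign_segment(features):
    # Nearest-centroid rule: pick the segment whose centroid is closest.
    return min(CENTROIDS, key=lambda name: math.dist(features, CENTROIDS[name]))

def label_batch(batch):
    """batch maps address -> (tx_count, tokens_held); returns address -> segment."""
    return {addr: assign_segment(featurize(*raw)) for addr, raw in batch.items()}

# Example batch of previously unseen wallets (made-up values).
new_wallets = {"0xccc1": (2, 1), "0xccc2": (1500, 50)}
batch_labels = label_batch(new_wallets)
```

With these made-up centroids, `0xccc1` falls into `casual` and `0xccc2` into `power_user`. The rule only stays meaningful if new wallets are featurized exactly as the training set was.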

poupou-web3 commented 7 months ago

Are you asking to label every address that has used Arbitrum? For each address, for example, the names of the protocols it has used, the number of transactions, and more quantitative details? Do you expect the dataset to also hold transactions and decoded logs? I think it is an enormous and almost intractable task for a 3-week sprint.

epowell101 commented 7 months ago

As mentioned in Discord, while this is a lot of work, it is also a lot of value. This sort of data gathering and analysis will result in segments of users that can be updated over time. This work could also lead to similar work on other on-chain communities, including specific communities on ARB.