OpenDataforWeb3 / DataGrantsforARB

Organizing work as outlined in the Data Grants proposal to ThankARB and the ARB DAO in Q4 of 2023
Apache License 2.0

Datasets for Sybil/Impact analysis on Arbitrum #2

Closed: DistributedDoge closed this issue 6 months ago

DistributedDoge commented 8 months ago

The goal of the proposed bounty is to encourage the sharing of useful datasets that can aid future Sybil/grant analysis. To make the end results easier to compare, I suggest focusing on static datasets, which would be assessed according to the value they bring to the analyst community.

Datasets produced for this bounty should be well-described so that everyone can understand:

Example ideas (though the more creative the approach, the better):

Output submitted for judging would be a collection of flat files (or a single-file database such as DuckDB) containing the information participants managed to gather, plus a description of the dataset.
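As a minimal sketch of the flat-file option, assuming the gathered records are a list of Python dicts, the snippet below writes them to a CSV file alongside a small Markdown file describing each column. The function name `write_dataset` and the `<name>.csv` / `<name>.md` layout are hypothetical, not a format required by the bounty.

```python
import csv
from pathlib import Path


def write_dataset(rows, descriptions, out_dir, name):
    """Write rows to <name>.csv and column descriptions to <name>.md (hypothetical layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(descriptions)  # column order follows the description dict

    # Flat file: header row, then one row per record. Missing fields become "";
    # a record with a field that has no description raises ValueError.
    csv_path = out / f"{name}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)

    # Description file: row count plus one bullet per column.
    md_path = out / f"{name}.md"
    lines = [f"# {name}", "", f"Rows: {len(rows)}", "", "## Columns", ""]
    lines += [f"- `{col}`: {text}" for col, text in descriptions.items()]
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return csv_path, md_path
```

Requiring a description for every column is a deliberate choice here: the dataset cannot be written without documenting what each field means.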

Bonus points if the dataset:

Pfed-prog commented 8 months ago

Output submitted for judging would be a collection of flat-files (or single-file database like e.g. duckdb) containing information participants managed to gather + description of the dataset.

In my humble opinion, IPFS files would also be great, since that would provide a censorship-resistant data storage medium.

ARDev097 commented 7 months ago

Hi, thanks for creating this issue. Could you provide some more example ideas? Also, if we are looking for GitHub repositories associated with grants (meaning grantees, right?), this might be more of a manual task, resulting in a static dataset in a DB or Google Sheet. Would that be fine?

DistributedDoge commented 7 months ago

@ARDev097 If you do manual work collecting interesting data and provide a static dataset as a Google Sheet, that would be a valid submission here!

As pfed-prog suggested, if you also package the end result as .parquet or .csv and push it to IPFS somewhere (the free tier of https://www.pinata.cloud/ could help), that would be even nicer.

GitHub data example:

Check the oss-observer-blog link to see what kind of analysis is possible with GitHub data. You don't need to do the analysis (that's a different bounty, linked below), but you can help collect the data.

If you look around this organization, you'll see it already has some interesting datasets, especially in the oss-directory and oss-insights repositories.

Some of the data in oss-directory is hard to consume (many tiny .yaml files), so if someone turned it into one big table/spreadsheet with information about all projects in the Arbitrum ecosystem, I would find such a submission valuable. Just remember that our interest is only in projects in the Arbitrum ecosystem, not open source more broadly.
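A minimal sketch of that conversion follows, assuming the project files are YAML documents that PyYAML's `yaml.safe_load` can parse. The field names (`name`, `display_name`, `github`, `blockchain`, `networks`) and the rule "a network name starting with `arbitrum`, e.g. `arbitrum-one`, means the project is on Arbitrum" are assumptions about the oss-directory schema, so check them against the actual files. The function names are hypothetical.

```python
from pathlib import Path


def flatten_project(project):
    """Turn one parsed project file (a dict) into a single flat row.

    Assumed fields (check against the real files): `name`, `display_name`,
    a `github` list of {"url": ...} entries, and a `blockchain` list whose
    entries carry a `networks` list.
    """
    github_urls = [g.get("url", "") for g in project.get("github") or []]
    chains = project.get("blockchain") or []
    networks = sorted({n for c in chains for n in c.get("networks") or []})
    return {
        "name": project.get("name", ""),
        "display_name": project.get("display_name", ""),
        "github_urls": ";".join(github_urls),
        "networks": ";".join(networks),
        "blockchain_entries": len(chains),
    }


def is_arbitrum(row, prefix="arbitrum"):
    """True if any listed network name starts with `prefix` (assumed naming, e.g. arbitrum-one)."""
    return any(n.startswith(prefix) for n in row["networks"].split(";") if n)


def load_arbitrum_projects(projects_dir):
    """Parse every *.yaml file under projects_dir and keep only Arbitrum projects."""
    import yaml  # PyYAML, imported lazily so flatten_project works without it

    rows = []
    for path in sorted(Path(projects_dir).rglob("*.yaml")):
        with path.open(encoding="utf-8") as f:
            project = yaml.safe_load(f) or {}
        row = flatten_project(project)
        if is_arbitrum(row):
            rows.append(row)
    return rows
```

The rows returned by `load_arbitrum_projects` can then be written out with `csv.DictWriter`, giving one spreadsheet-ready table instead of many small files. Multi-valued fields are joined with `;` so each project stays on a single row.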

Manual collection example:

Another example well suited to manual data gathering is extending the work done by these folks to include all kinds of Arbitrum grants, not just the STIP grants they already track:

https://github.com/andrewhong5297/Crypto-Grants-Analysis/blob/main/uploads/evm_grants.csv
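For the manual-collection route, one low-risk way to keep new entries consistent with the existing file is to reuse its header. This is a minimal sketch, assuming a local copy of evm_grants.csv. The function name `append_grants` is hypothetical, and the code knows nothing about the file's columns beyond what its header row says.

```python
import csv


def append_grants(existing_csv, new_rows, out_csv):
    """Write existing_csv plus manually collected new_rows to out_csv.

    The header is taken from existing_csv, so no column names are guessed.
    Rows that use columns missing from that header raise ValueError, which
    keeps manual entries from silently drifting from the original schema.
    """
    new_rows = list(new_rows)  # accept any iterable; it is read twice below
    with open(existing_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        header = reader.fieldnames or []

    for row in new_rows:
        unknown = set(row) - set(header)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="")  # blanks for missing cells
        writer.writeheader()
        writer.writerows(rows)
        writer.writerows(new_rows)
    return len(rows) + len(new_rows)
```

Writing to a separate `out_csv` leaves the original download untouched, which makes it easy to diff the extended file against it.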

Similar bounties:

Tips:

ARDev097 commented 7 months ago

Thanks for the reply above. I've started learning more about the oss-directory and oss-insights repositories and have checked the structure of the .yaml files. I'm currently in the final stage of producing the spreadsheet with information about all projects in the Arbitrum ecosystem.

ARDev097 commented 7 months ago

Hi, I have worked on this part: 'If you look around this organization, you would see they already have some interesting datasets, especially is oss-directory and oss-insights repository. Some of data from oss-directory is hard to consume (tiny .yaml files) so if someone turned that into one big table/spreadsheet with information about all projects in Arbitrum Ecosystem I would find such submission valuable. Just remember our interest is only projects in Arbitrum ecosystem, not broader open-source.'

I merged all the .yaml files and created three different files:
- All protocols across different ecosystems: all_ecosystem_data.csv
- All protocols in the Arbitrum ecosystem: arbitrum_all_protocols_data (Distributed).csv
- All protocols in Arbitrum, with a unique row for each protocol: arbitrum_unique_protocols_data (Unique row for each protocol).csv
- Explanation of all three CSV files: yaml_task.md

Thank you for the help. Should I submit it on Dework, or is here fine?
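The split between the "Distributed" file and the "Unique row for each protocol" file can be reproduced with a small grouping step. The sketch below is not the submitter's code. It assumes the distributed file has several rows per protocol (for example one per address or network), that a column such as `name` identifies the protocol (hypothetical), and that joining distinct values with `;` is acceptable.

```python
def unique_per_protocol(rows, key="name", sep=";"):
    """Collapse several rows per protocol into one row per protocol.

    For every column except `key`, distinct non-empty values are kept in
    first-seen order and joined with `sep`. Protocols appear in the order
    they are first seen in `rows`.
    """
    merged = {}
    for row in rows:
        acc = merged.setdefault(row[key], {})
        for col, value in row.items():
            if col == key:
                continue
            values = acc.setdefault(col, [])
            if value not in ("", None) and value not in values:
                values.append(value)
    return [
        {key: name, **{col: sep.join(map(str, vals)) for col, vals in cols.items()}}
        for name, cols in merged.items()
    ]
```

For example, two rows for the same protocol that differ only in `address` collapse into one row whose `address` cell holds both values separated by `;`.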

DistributedDoge commented 7 months ago

@ARDev097 Nice work! Really like the column descriptions and that you put it all on IPFS.

Please submit on Dework! This helps us keep everything in one place.

We will take a look at all the submissions around the 3rd of March!