CarmineOptions / derisk-research

MIT License
9 stars 73 forks

ODHack: Analyze user behavior across different lending protocols #100

Open lukaspetrasek opened 6 months ago

lukaspetrasek commented 6 months ago

Analyze user behavior across different lending protocols.

Steps:

  1. Load the data on loans for all lending protocols from the Google Storage. For example, https://storage.googleapis.com/derisk-persistent-state/zklend_data/loans.parquet is the file with loans for zkLend. It contains the information about the user, protocol, the user's collateral and debt (tokens and amounts). Write the loading part in a way that the source can be easily changed from the Google Storage to a local database.
  2. Visualize the behavior of a single user across the lending protocols in a Jupyter notebook. Use the following tokens: "ETH", "wBTC", "USDC", "DAI", "USDT", "wstETH", "LORDS", "STRK", "UNO" and "ZEND". You should be able to use the visualizations to answer the following questions:

Definition of Done: The code functions well and is documented, the analysis provides meaningful outputs and answers the questions from the setup.
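One way to keep the data source swappable, as step 1 asks, is to put each source behind a function with the same signature and pick it by name. The sketch below is a hypothetical layout, not code from the repository: `LOADERS`, `load_loans`, `load_from_gcs` and `load_from_sqlite` are made-up names, the URL pattern for protocols other than zkLend is an assumption extrapolated from the single example URL, and SQLite with a `<protocol>_loans` table merely stands in for whatever local database is actually used.

```python
import sqlite3
from typing import Callable, Dict

# Only the zkLend URL is given in the issue; the pattern for other protocols is assumed.
GCS_URL_TEMPLATE = (
    "https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"
)


def gcs_url(protocol: str) -> str:
    """Build the (assumed) public URL of the loans file for a protocol."""
    return GCS_URL_TEMPLATE.format(protocol=protocol)


def load_from_gcs(protocol: str):
    """Read the loans Parquet file over HTTPS into a DataFrame."""
    import pandas as pd  # imported lazily so the module loads without pandas

    return pd.read_parquet(gcs_url(protocol))


def load_from_sqlite(protocol: str, db_path: str = "loans.db"):
    """Read loans from a local SQLite table named '<protocol>_loans' (hypothetical schema)."""
    import pandas as pd

    if not protocol.isidentifier():  # guard: the name is interpolated into the SQL text
        raise ValueError(f"Invalid protocol name: {protocol!r}")
    connection = sqlite3.connect(db_path)
    try:
        return pd.read_sql(f"SELECT * FROM {protocol}_loans", connection)
    finally:
        connection.close()


# Registry of available sources; adding a new backend means adding one entry.
LOADERS: Dict[str, Callable] = {
    "gcs": load_from_gcs,
    "sqlite": load_from_sqlite,
}


def load_loans(protocol: str, source: str = "gcs"):
    """Load loans for one protocol from the named source."""
    try:
        loader = LOADERS[source]
    except KeyError:
        raise ValueError(f"Unknown source: {source!r}") from None
    return loader(protocol)
```

With this layout, notebook code would call `load_loans("zklend")` and switching to the local database would only change the `source` argument.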

vibenedict commented 6 months ago

Hi, can I jump on this issue?

NueloSE commented 6 months ago

Hi @lukaspetrasek, can I work on this?

lukaspetrasek commented 6 months ago

Hi, can you guys please tell me something about you, what skills/experience do you have and how do you plan to tackle this issue? This task is not simple, so I have to learn more information before I assign anyone 🙏🏼

NueloSE commented 6 months ago

> Hi, can you guys please tell me something about you, what skills/experience do you have and how do you plan to tackle this issue? This task is not simple, so I have to learn more information before I assign anyone 🙏🏼

I have worked on something similar before; the difference was that the dataset was stored in a CSV file, not in Google Storage.

This project basically involves data visualization for informed decision making.

For this project I will be using Python.

Steps to tackle task

  1. Install the required libraries (pandas, matplotlib, seaborn, etc.)
  2. Load the data from Google Storage using the Google Cloud SDK
  3. Analyze and visualize user behavior

lukaspetrasek commented 6 months ago

Okay, assigning you @NueloSE 👍🏼

@NueloSE Let me know if everything is clear. If you have any questions, please ask here. What is your TG handle, please? 🙏🏼

Consider joining our TG group. See also our contributor guidelines.

lukaspetrasek commented 6 months ago

Hi @NueloSE , I assume the PR is ready for review, right?

NueloSE commented 6 months ago

@lukaspetrasek, I have implemented all requested changes. It is ready for review: a676ecc

lukaspetrasek commented 2 months ago

@NueloSE has started working on this, but the task is still not completely finished.

What's been done: https://github.com/CarmineOptions/derisk-research/pull/107

tosoham commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I am a Python dev and have worked in data science and ML. I am a newcomer and I am interested in solving this issue.

How I will approach this issue?

I would start by loading the data from Google Storage. I have experience with Google Storage and Jupyter Notebook; I'll load the data into a pandas DataFrame and analyze it as described. Visualizations can be done with matplotlib and seaborn, and with Dash for interactive dashboards. After carefully analyzing, manipulating and visualizing the data, I'll be able to answer the questions mentioned.

gregemax commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

With a background in data analysis using Python, experience with Google Cloud, and proficiency in Jupyter notebooks, I have worked on projects that involve complex data visualization and user behavior analysis. My expertise with tools like Pandas, Matplotlib, and Seaborn allows me to efficiently analyze, manipulate, and visualize large datasets, making me well-suited for this project.

How I plan on tackling this issue

I would start by loading the data from Google Storage, ensuring the code is flexible enough to switch between cloud and local databases. I’ll perform an initial exploration of the data, using Pandas to handle the loan data and creating visualizations in Jupyter notebooks. For visualizations, I’ll use Venn diagrams to show user engagement across protocols and dive into token-specific behavior. Additional insights such as the distribution of staked/borrowed capital across tokens and protocols will be highlighted, ensuring the analysis is both thorough and meaningful.
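The Venn-diagram idea comes down to set arithmetic on the users of each protocol. A minimal sketch, assuming the loans have already been reduced to (user, protocol) pairs; the helper names and the sample users are hypothetical, and "other" is only a placeholder for a second protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def users_by_protocol(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group user identifiers into one set per protocol."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for user, protocol in pairs:
        grouped[protocol].add(user)
    return dict(grouped)


def venn2_regions(left: Set[str], right: Set[str]) -> Tuple[int, int, int]:
    """Return the sizes of (left only, right only, both)."""
    return len(left - right), len(right - left), len(left & right)


# Made-up sample data for illustration only.
sample_pairs = [
    ("0xa", "zklend"),
    ("0xb", "zklend"),
    ("0xb", "other"),
    ("0xc", "other"),
    ("0xc", "other"),  # duplicate rows do not inflate the counts
]
groups = users_by_protocol(sample_pairs)
regions = venn2_regions(groups["zklend"], groups["other"])
print(regions)
```

As far as I know, a three-element tuple in this order can be passed as the `subsets` argument of `venn2` from `matplotlib_venn`, so the counting and the plotting stay separate.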

Luluameh commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I have experience in Python, data analysis, and blockchain protocols. I’ve worked with datasets in Jupyter notebooks, performing behavior analysis and creating visualizations. My background in DeFi and lending platforms makes me well-suited for this task.

How I plan on tackling this issue

I would first create a flexible data loader to handle both Google Storage and local databases. Then, I’d analyze user behavior by visualizing data across protocols and answering key questions with Venn diagrams and token-specific graphs. I’d ensure the code is well-documented and capable of answering additional hypotheses.

ShantelPeters commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I am a blockchain developer with experience in Cairo, JavaScript, TypeScript, Solidity, CSS, HTML, etc. I am an active contributor here on OnlyDust. This is my first time contributing to this repo. Please assign me, I am ready to work.

How I plan on tackling this issue

I intend to approach the issue as follows:

  1. Load Data: I will write a function to load loan data from Google Storage or a local database.
  2. Visualize Behavior: Show how users interact with one or more lending protocols and tokens.
  3. Venn Diagram: Create a Venn diagram to display users’ protocol participation.
  4. Capital Distribution: Visualize capital distribution across protocols and tokens.
  5. Document: I will ensure the code is clear, functional, and well-documented.
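For the capital-distribution step in the list above, the core operation is a sum grouped by protocol and token. A minimal sketch, assuming each loan has already been flattened to one record per token position; the `Position` layout and the sample values are invented for illustration, and the real Parquet schema may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Position(NamedTuple):
    """Hypothetical flattened record: one token position of one user."""

    user: str
    protocol: str
    token: str
    collateral: float
    debt: float


def capital_by_protocol_and_token(
    positions: Iterable[Position],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Sum collateral and debt for every (protocol, token) pair."""
    totals: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(
        lambda: {"collateral": 0.0, "debt": 0.0}
    )
    for position in positions:
        bucket = totals[(position.protocol, position.token)]
        bucket["collateral"] += position.collateral
        bucket["debt"] += position.debt
    return dict(totals)


# Made-up positions for illustration only.
sample = [
    Position("0xa", "zklend", "ETH", 2.0, 0.0),
    Position("0xb", "zklend", "ETH", 1.0, 0.5),
    Position("0xb", "zklend", "USDC", 0.0, 300.0),
]
totals = capital_by_protocol_and_token(sample)
```

The resulting dictionary can be turned into a table or a stacked bar chart with whichever plotting library the notebook uses.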

vic-Gray commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Background and Leverage: I have experience in building modular and scalable systems that handle data efficiently. I have worked extensively with APIs, databases, and data visualization libraries, allowing me to approach this problem with a solid foundation in both back-end development and data analysis. My background in both front-end and back-end development will enable me to handle the data-loading part flexibly and create meaningful visualizations to answer key questions.

How I plan on tackling this issue

  1. Loading Data from Google Storage or a Local Database

I will design the data loading functionality so that it can be easily switched between loading data from Google Cloud Storage and a local database. This can be done using a modular function that abstracts the data source.

Implementation Plan:

Use Pandas to load the data from Google Storage or the local database (e.g., PostgreSQL). Create a function to switch between the data sources dynamically (Google Storage or local DB). Use Parquet for loading files from Google Cloud Storage (as provided in the example).

    import pandas as pd
    from sqlalchemy import create_engine


    def load_data(source="google", protocol="zklend"):
        """Load the loans table for one protocol from the chosen source."""
        if source == "google":
            url = f"https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"
            return pd.read_parquet(url)
        elif source == "local_db":
            # Placeholder credentials; replace with the real connection string.
            engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
            # `protocol` is interpolated into the SQL, so only pass trusted names.
            query = f"SELECT * FROM {protocol}_loans"
            return pd.read_sql(query, engine)
        raise ValueError(f"Unknown source: {source}")


    # Load zkLend data
    loans_data = load_data(source="google", protocol="zklend")

This structure allows me to switch between loading from Google Storage and a local database with minimal code changes.

Data Aggregation for Users Across Lending Protocols

Once data is loaded, the next step involves:

Aggregating the data based on users, protocols, and their collateral and debt. Ensuring that the token types (ETH, wBTC, USDC, etc.) are properly parsed and aggregated.

Implementation:

    # Aggregate loan data by user and protocol
    def aggregate_user_data(data):
        return data.groupby(["user", "protocol"]).agg({
            "collateral_amount": "sum",
            "debt_amount": "sum"
        }).reset_index()


    # Example: Aggregate zkLend data
    aggregated_data = aggregate_user_data(loans_data)

  2. Visualize the Behavior of Users Across Lending Protocols

A. Number of Protocols Used by Users

To visualize the number of users who use 1 protocol, 2 protocols, or more:

Use the aggregated data to group users by the number of protocols they interact with. This can be visualized with a Venn diagram or bar plot.

Implementation:

    import matplotlib.pyplot as plt
    from matplotlib_venn import venn2, venn3

    # Count users by number of protocols
    user_protocol_count = aggregated_data.groupby('user').protocol.nunique()

    # Visualize the number of protocols used by users
    protocol_count_distribution = user_protocol_count.value_counts()
    protocol_count_distribution.plot(kind="bar", title="Number of Protocols Used by Users")
    plt.show()

B. Venn Diagram for Users Borrowing/Providing Across Protocols

To create a Venn diagram:

Identify users who interact with different protocols (e.g., zkLend, another protocol). Use the matplotlib_venn package to visualize overlaps.

Implementation:

    # For simplicity, assume we have user sets for zkLend and another protocol
    users_zklend = set(aggregated_data[aggregated_data['protocol'] == 'zklend'].user)
    users_other_protocol = set(aggregated_data[aggregated_data['protocol'] == 'other'].user)

    # Create a Venn diagram
    venn2([users_zklend, users_other_protocol], set_labels=("zkLend", "Other Protocol"))
    plt.title("Users Borrowing/Providing Across Protocols")
    plt.show()

  3. Capital Distribution Across Protocols

This analysis looks at the total amount of capital (collateral and debt) distributed across the protocols for each user. The distribution of capital across lending protocols can be visualized in a bar plot or pie chart, restricted to users above a capital threshold.

Implementation:

    # Filter users with at least $10k USD worth of capital (collateral + debt).
    # Note: this assumes the amounts are already denominated in USD.
    total_capital = aggregated_data['collateral_amount'] + aggregated_data['debt_amount']
    high_capital_users = aggregated_data[total_capital >= 10000]

    # Visualize capital distribution
    high_capital_users.groupby('protocol').agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).plot(kind='bar', stacked=True, title="Capital Distribution Across Protocols")
    plt.show()
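The $10k threshold above only makes sense once token amounts are expressed in USD. A sketch of that conversion, assuming a price table is available; the prices below are placeholders rather than market data, and all function and variable names are hypothetical.

```python
from typing import Dict, Mapping

# Placeholder prices for illustration only; real prices must come from a price feed.
EXAMPLE_PRICES_USD: Dict[str, float] = {"ETH": 2000.0, "USDC": 1.0}


def usd_value(amounts: Mapping[str, float], prices: Mapping[str, float]) -> float:
    """Convert token amounts to a single USD total; unknown tokens raise KeyError."""
    return sum(amount * prices[token] for token, amount in amounts.items())


def users_above_threshold(
    holdings: Mapping[str, Mapping[str, float]],
    prices: Mapping[str, float],
    threshold_usd: float = 10_000.0,
) -> Dict[str, float]:
    """Return {user: USD total} for users whose total is at least the threshold."""
    totals = {user: usd_value(amounts, prices) for user, amounts in holdings.items()}
    return {user: total for user, total in totals.items() if total >= threshold_usd}


# Made-up holdings (collateral plus debt per token) for two users.
sample_holdings = {
    "0xa": {"ETH": 6.0},  # 12,000 USD at the placeholder price
    "0xb": {"ETH": 1.0, "USDC": 500.0},  # 2,500 USD at the placeholder prices
}
large_users = users_above_threshold(sample_holdings, EXAMPLE_PRICES_USD)
```

Raising `KeyError` for a token without a price is a deliberate choice in this sketch, so that a missing price is noticed instead of silently counted as zero.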

  4. Token-Specific Analysis

To break down the data on a per-token basis (e.g., ETH, wBTC, USDC):

Group data by both token and protocol. Visualize how the capital is distributed for each token across protocols.

Implementation:

    # Group by token and protocol.
    # Use the raw loans data here: aggregated_data was grouped by user and
    # protocol only, so it has no 'token' column. This assumes loans_data has one.
    token_data = loans_data.groupby(['token', 'protocol']).agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).reset_index()

    # Visualize capital distribution per token across protocols
    token_data.pivot(index='token', columns='protocol', values='collateral_amount').plot(
        kind='bar', stacked=True, title="Capital by Token Across Protocols"
    )
    plt.show()

bruhhgnik commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

I am a Python dev and I am also working on many blockchain projects; in general, I am looking to diversify my portfolio.

How I plan on tackling this issue

Data Loading: Load the Parquet file into a Jupyter notebook using Pandas, ensuring flexibility for local or cloud data sources.

Data Preprocessing: Clean and filter key columns like user ID, protocol, collateral, debt, and tokens.

User Behavior Visualization: Calculate how many users interact with one or multiple protocols. Use bar charts and Venn diagrams to visualize liquidity and borrowing behavior.

Advanced Analysis: Analyze staked/borrowed capital distribution across protocols and visualize it by token type.

Additional Insights: Explore additional metrics like protocol popularity and document findings.
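The "how many users interact with one or multiple protocols" part of the plan above can be sketched without any plotting. This is a minimal illustration, assuming (user, protocol) pairs have already been extracted from the loans data; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def protocol_count_distribution(pairs: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map 'number of distinct protocols used' to 'number of users'."""
    protocols_per_user: Dict[str, Set[str]] = defaultdict(set)
    for user, protocol in pairs:
        protocols_per_user[user].add(protocol)
    counts = Counter(len(protocols) for protocols in protocols_per_user.values())
    return dict(sorted(counts.items()))


# Made-up sample data for illustration only.
sample_pairs = [
    ("0xa", "zklend"),
    ("0xa", "other"),
    ("0xb", "zklend"),
    ("0xb", "zklend"),  # repeated loans on one protocol count once
    ("0xc", "other"),
]
distribution = protocol_count_distribution(sample_pairs)
print(distribution)
```

The keys of the result are natural x-axis values for the bar chart mentioned in the plan.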

vic-Gray commented 2 months ago

I am a Python developer with experience working on blockchain projects, aiming to broaden my portfolio.

Approach to the Issue

Data Loading: Use Pandas to load the Parquet file into Jupyter for local or cloud analysis.

Data Preprocessing: Clean and filter key fields like user ID, protocol, collateral, and tokens.

Visualization: Analyze user interaction with protocols using bar charts and Venn diagrams for liquidity and borrowing behavior.

Advanced Insights: Examine capital distribution across protocols and visualize token types.

Ndifreke000 commented 2 months ago

this looks awesome tbvh

UzNaZ commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I am a backend developer and I'd like to take this task.

AndriiBogomolov commented 2 months ago

I am applying to this issue via OnlyDust platform.

My background and how it can be leveraged

Hi, I'm a developer with experience in Starknet, and I have worked closely with blockchain and web3 technologies.