Open lukaspetrasek opened 6 months ago
Hi, can I jump on this issue?
Hi @lukaspetrasek, can I work on this?
Hi, can you guys please tell me something about you, what skills/experience do you have and how do you plan to tackle this issue? This task is not simple, so I have to learn more information before I assign anyone 🙏🏼
I have worked on something similar to this before; the difference was that the dataset was stored in a CSV file, not on Google Storage.
This project basically involves data visualization for informed decision making.
For this project I will be using Python.
Google Cloud SDK
Okay, assigning you @NueloSE 👍🏼
@NueloSE Let me know if everything is clear. If you have any questions, please ask here. What is your TG handle, please? 🙏🏼
Consider joining our TG group. See also our contributor guidelines.
Hi @NueloSE , I assume the PR is ready for review, right?
@lukaspetrasek, I have implemented all requested changes. It is ready for review: a676ecc
@NueloSE has started working on this, but the task is still not completely finished.
What's been done: https://github.com/CarmineOptions/derisk-research/pull/107
I am applying to this issue via OnlyDust platform.
I am a Python dev who has worked in the field of Data Science and ML. I am a newcomer and I am interested in solving this issue.
I would start by loading the data from Google Storage. I have experience with Google Storage and Jupyter Notebook; I'll load the data into a pandas DataFrame and analyze it as mentioned. Visualizations can be done with matplotlib and seaborn, and with Dash for interactive dashboards. After carefully analyzing, manipulating and visualizing the data, I'll be able to answer the mentioned questions.
I am applying to this issue via OnlyDust platform.
With a background in data analysis using Python, experience with Google Cloud, and proficiency in Jupyter notebooks, I have worked on projects that involve complex data visualization and user behavior analysis. My expertise with tools like Pandas, Matplotlib, and Seaborn allows me to efficiently analyze, manipulate, and visualize large datasets, making me well-suited for this project.
I would start by loading the data from Google Storage, ensuring the code is flexible to switch between cloud and local databases. I’ll perform an initial exploration of the data, using Pandas to handle the loan data and creating visualizations in Jupyter notebooks. For visualizations, I’ll use Venn diagrams to show user engagement across protocols and dive into token-specific behavior. Additional insights like staked/borrowed capital distribution across tokens and protocols will be highlighted, ensuring the analysis is both thorough and meaningful.
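The Venn-diagram idea mentioned above boils down to counting users in each exact combination of protocols. A minimal sketch of that counting step, using only plain Python sets, might look like the following. The user addresses and the protocol names other than zkLend are placeholders for illustration, not values taken from the dataset.

```python
def venn_regions(users_by_protocol):
    """Map each exact combination of protocols to the users found in precisely that combination."""
    all_users = set().union(*users_by_protocol.values())
    regions = {}
    for user in all_users:
        # Sorted tuple of every protocol in which this user appears.
        key = tuple(sorted(p for p, users in users_by_protocol.items() if user in users))
        regions.setdefault(key, set()).add(user)
    return regions


# Hypothetical user sets; real ones would be derived from the loans data.
users_by_protocol = {
    "zklend": {"0xa", "0xb", "0xc"},
    "nostra": {"0xb", "0xc", "0xd"},
    "hashstack": {"0xc", "0xe"},
}

regions = venn_regions(users_by_protocol)
for protocols, users in sorted(regions.items()):
    print(protocols, len(users))
```

Each key corresponds to one region of the diagram, so the region sizes can be handed to whatever plotting library is chosen for the notebook.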
I am applying to this issue via OnlyDust platform.
I have experience in Python, data analysis, and blockchain protocols. I’ve worked with datasets in Jupyter notebooks, performing behavior analysis and creating visualizations. My background in DeFi and lending platforms makes me well-suited for this task.
I would first create a flexible data loader to handle both Google Storage and local databases. Then, I’d analyze user behavior by visualizing data across protocols and answering key questions with Venn diagrams and token-specific graphs. I’d ensure the code is well-documented and capable of answering additional hypotheses.
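One way to picture the "flexible data loader" described above is a small registry that maps a source name to a loading function, so that adding a local database later means registering one more function. This is only a sketch under assumptions: the base URL comes from the zkLend example in the issue, the `{protocol}_data/loans.parquet` pattern is assumed to hold for other protocols, and the names `gcs_loans_url`, `load_from_gcs`, `LOADERS` and `load_loans` are hypothetical.

```python
from typing import Callable, Dict

# Base URL taken from the issue's zkLend example. That other protocols follow
# the same "{protocol}_data/loans.parquet" layout is an assumption.
GCS_BASE = "https://storage.googleapis.com/derisk-persistent-state"


def gcs_loans_url(protocol: str) -> str:
    """Build the (assumed) Google Storage URL for a protocol's loans file."""
    return f"{GCS_BASE}/{protocol}_data/loans.parquet"


def load_from_gcs(protocol: str):
    """Read the loans Parquet file for one protocol into a DataFrame."""
    import pandas as pd  # imported lazily so the dispatch logic works without pandas

    return pd.read_parquet(gcs_loans_url(protocol))


# Registry of available sources; a local-database loader could be added here.
LOADERS: Dict[str, Callable[[str], object]] = {"gcs": load_from_gcs}


def load_loans(protocol: str, source: str = "gcs", loaders=LOADERS):
    """Dispatch to the loader registered under `source`."""
    try:
        loader = loaders[source]
    except KeyError:
        raise ValueError(
            f"unknown source {source!r}; expected one of {sorted(loaders)}"
        ) from None
    return loader(protocol)
```

Switching sources then becomes a one-argument change, for example `load_loans("zklend", source="local_db")` once a function has been registered under that (hypothetical) key.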
I am applying to this issue via OnlyDust platform.
Hi, I am a blockchain developer with experience in Cairo, JavaScript, TypeScript, Solidity, CSS, HTML, etc. I am an active contributor here on OnlyDust. This is my first time contributing to this repo. Please assign me; I am ready to work.
I intend to approach the issue by carrying out the following: 1. Load Data: I will write a function to load loan data from Google Storage or a local database.
I am applying to this issue via OnlyDust platform.
Background and Leverage: I have experience in building modular and scalable systems that handle data efficiently. I have worked extensively with APIs, databases, and data visualization libraries, allowing me to approach this problem with a solid foundation in both back-end development and data analysis. My background in both front-end and back-end development will enable me to handle the data-loading part flexibly and create meaningful visualizations to answer key questions.
Implementation Plan:
Use Pandas to load the data from Google Storage or the local database (e.g., PostgreSQL). Create a function to switch between the data sources dynamically (Google Storage or local DB). Use Parquet for loading files from Google Cloud Storage (as provided in the example).

    import pandas as pd
    from sqlalchemy import create_engine

    def load_data(source="google", protocol="zklend"):
        if source == "google":
            # Public Parquet file, following the URL pattern from the issue's example
            url = f"https://storage.googleapis.com/derisk-persistent-state/{protocol}_data/loans.parquet"
            return pd.read_parquet(url)
        elif source == "local_db":
            # Placeholder credentials and table name
            engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
            query = f"SELECT * FROM {protocol}_loans"
            return pd.read_sql(query, engine)
        # Fail loudly instead of silently returning None
        raise ValueError(f"Unknown source: {source}")

    loans_data = load_data(source="google", protocol="zklend")

This structure allows me to switch between loading from Google Storage and a local database with minimal code changes.

Data Aggregation for Users Across Lending Protocols

Once data is loaded, the next step involves:

Aggregating the data based on users, protocols, and their collateral and debt.
Ensuring that the token types (ETH, wBTC, USDC, etc.) are properly parsed and aggregated.

Implementation:

    # Aggregate loan data by user and protocol
    def aggregate_user_data(data):
        return data.groupby(["user", "protocol"]).agg({
            "collateral_amount": "sum",
            "debt_amount": "sum"
        }).reset_index()

    aggregated_data = aggregate_user_data(loans_data)
Use the aggregated data to group users by the number of protocols they interact with. This can be visualized with a Venn diagram or bar plot.

Implementation:

    import matplotlib.pyplot as plt
    from matplotlib_venn import venn2, venn3

    # Number of distinct protocols each user interacts with
    user_protocol_count = aggregated_data.groupby('user').protocol.nunique()

    # How many users use 1, 2, ... protocols
    protocol_count_distribution = user_protocol_count.value_counts()
    protocol_count_distribution.plot(kind="bar", title="Number of Protocols Used by Users")
    plt.show()

B. Venn Diagram for Users Borrowing/Providing Across Protocols

To create a Venn diagram:

Identify users who interact with different protocols (e.g., zkLend, another protocol).
Use the matplotlib_venn package to visualize overlaps.

Implementation:

    users_zklend = set(aggregated_data[aggregated_data['protocol'] == 'zklend'].user)
    users_other_protocol = set(aggregated_data[aggregated_data['protocol'] == 'other'].user)

    venn2([users_zklend, users_other_protocol], set_labels=("zkLend", "Other Protocol"))
    plt.title("Users Borrowing/Providing Across Protocols")
    plt.show()
    # Keep users whose combined collateral and debt reaches the threshold
    high_capital_users = aggregated_data[
        (aggregated_data['collateral_amount'] + aggregated_data['debt_amount']) >= 10000
    ]

    high_capital_users.groupby('protocol').agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).plot(kind='bar', stacked=True, title="Capital Distribution Across Protocols")
    plt.show()

Group data by both token and protocol. Visualize how the capital is distributed for each token across protocols.

Implementation:

    # aggregated_data was grouped by user and protocol only, so it has no 'token'
    # column; group the raw loans instead (assumes loans_data has a 'token' column)
    token_data = loans_data.groupby(['token', 'protocol']).agg({
        'collateral_amount': 'sum',
        'debt_amount': 'sum'
    }).reset_index()

    token_data.pivot(index='token', columns='protocol', values='collateral_amount').plot(
        kind='bar', stacked=True, title="Capital by Token Across Protocols"
    )
    plt.show()
I am applying to this issue via OnlyDust platform.
I am a Python dev. I am also working on many blockchain projects, and in general I am looking to diversify my portfolio.
Data Loading: Load the Parquet file into a Jupyter notebook using Pandas, ensuring flexibility for local or cloud data sources.
Data Preprocessing: Clean and filter key columns like user ID, protocol, collateral, debt, and tokens.
User Behavior Visualization: Calculate how many users interact with one or multiple protocols. Use bar charts and Venn diagrams to visualize liquidity and borrowing behavior.
Advanced Analysis: Analyze staked/borrowed capital distribution across protocols and visualize it by token type.
Additional Insights: Explore additional metrics like protocol popularity and document findings.
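The "Advanced Analysis" step in the plan above, capital distribution across protocols by token, can be sketched with the standard library alone. The record layout here is an assumption: the issue only says the file holds the user, the protocol, and the collateral and debt as tokens and amounts, so the field names, the dict-of-token-amounts shape, and the sample values are all hypothetical.

```python
from collections import defaultdict


def capital_by_token(loans, field):
    """Sum the amounts stored under `field` per protocol and token."""
    totals = defaultdict(lambda: defaultdict(float))
    for loan in loans:
        for token, amount in loan[field].items():
            totals[loan["protocol"]][token] += amount
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {protocol: dict(tokens) for protocol, tokens in totals.items()}


# Hypothetical records standing in for rows of the loans data.
loans = [
    {"user": "0xa", "protocol": "zklend", "collateral": {"ETH": 2.0, "USDC": 500.0}, "debt": {"USDC": 300.0}},
    {"user": "0xb", "protocol": "zklend", "collateral": {"ETH": 1.0}, "debt": {"DAI": 100.0}},
    {"user": "0xa", "protocol": "nostra", "collateral": {"wBTC": 0.5}, "debt": {"USDC": 50.0}},
]

collateral = capital_by_token(loans, "collateral")
debt = capital_by_token(loans, "debt")
print(collateral)
print(debt)
```

The totals are in token units, so comparing tokens against each other in one chart would additionally require prices, which this sketch does not cover.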
I am a Python developer with experience working on blockchain projects, aiming to broaden my portfolio.
Approach to the Issue
Data Loading: Use Pandas to load the Parquet file into Jupyter for local or cloud analysis.
Data Preprocessing: Clean and filter key fields like user ID, protocol, collateral, and tokens.
Visualization: Analyze user interaction with protocols using bar charts and Venn diagrams for liquidity and borrowing behavior.
Advanced Insights: Examine capital distribution across protocols and visualize token types.
this looks awesome tbvh
I am applying to this issue via OnlyDust platform.
Hi, I am a backend developer and I'd like to take this task.
I am applying to this issue via OnlyDust platform.
Hi, I'm a developer with experience in Starknet. I have worked closely with blockchain and web3 technologies.
Analyze user behavior across different lending protocols.
Steps:
1) Load the data on loans for all lending protocols from the Google Storage. For example, https://storage.googleapis.com/derisk-persistent-state/zklend_data/loans.parquet is the file with loans for zkLend. It contains the information about the user, protocol, the user's collateral and debt (tokens and amounts). Write the loading part in a way that the source can be easily changed from the Google Storage to a local database.
2) Visualize the behavior of a single user across the lending protocols in a Jupyter notebook. Use the following tokens: "ETH", "wBTC", "USDC", "DAI", "USDT", "wstETH", "LORDS", "STRK", "UNO" and "ZEND". You should be able to use the visualizations to answer the following questions:
Definition of Done: The code functions well and is documented, the analysis provides meaningful outputs and answers the questions from the setup.
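Step 2 of the task asks about a single user's behavior across protocols. Before any plotting, the underlying lookups could be sketched as below. This assumes one record per user per protocol with `user`, `protocol`, `collateral` and `debt` fields; those names and the sample data are hypothetical, not the confirmed schema of the Parquet files.

```python
from collections import Counter


def user_profile(loans, user):
    """Collect one user's collateral and debt per protocol.

    Assumes at most one record per user and protocol; a later record
    for the same pair would overwrite an earlier one.
    """
    return {
        loan["protocol"]: {"collateral": loan["collateral"], "debt": loan["debt"]}
        for loan in loans
        if loan["user"] == user
    }


def protocols_per_user(loans):
    """Count how many users appear in exactly 1, 2, ... protocols."""
    seen = {}
    for loan in loans:
        seen.setdefault(loan["user"], set()).add(loan["protocol"])
    return Counter(len(protocols) for protocols in seen.values())


# Hypothetical records standing in for rows of the loans data.
sample_loans = [
    {"user": "0xa", "protocol": "zklend", "collateral": {"ETH": 2.0}, "debt": {"USDC": 300.0}},
    {"user": "0xa", "protocol": "nostra", "collateral": {"wBTC": 0.5}, "debt": {"USDC": 50.0}},
    {"user": "0xb", "protocol": "zklend", "collateral": {"ETH": 1.0}, "debt": {"DAI": 100.0}},
]

profile = user_profile(sample_loans, "0xa")
distribution = protocols_per_user(sample_loans)
print(profile)
print(distribution)
```

The per-user profile is the natural input for a per-protocol bar chart of one user, and the distribution feeds a "number of protocols used" chart.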