dataalways / mevboost-data

Public domain Ethereum MEV-Boost winning bid data.
Creative Commons Zero v1.0 Universal

MEV-Boost Winning Bid Data

This repository is a collection of public domain Ethereum MEV-Boost winning bid data.

Coverage

Block data is extracted with cryo from an Infura RPC endpoint and merged with data from the following relays:

Data coverage begins on October 10, 2023 at block 18,320,000. We may backfill more data in time.
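Once data is loaded, it can be useful to check that no blocks are missing between the coverage start and the newest block you have. The sketch below assumes that every block in the covered range has at least one row (the loading example keeps non-MEV blocks, which suggests this, but it is not guaranteed). `find_missing_blocks` is a hypothetical helper, not part of this repository.

```python
from collections.abc import Iterable

FIRST_BLOCK = 18_320_000  # coverage start stated in this README


def find_missing_blocks(block_numbers: Iterable[int], start: int = FIRST_BLOCK) -> list[int]:
    """Return block numbers from `start` up to the highest block seen that have no row.

    Assumes every block in the covered range should appear at least once.
    """
    present = {int(b) for b in block_numbers}  # int() also converts numpy integers
    if not present:
        return []
    return [b for b in range(start, max(present) + 1) if b not in present]
```

With the DataFrame from the loading example, `find_missing_blocks(df['block_number'])` returns an empty list when coverage is contiguous.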

The data is delivered in Parquet chunks of 10,000 blocks, so users who keep their datasets updated only need to download new chunks rather than the entire dataset.
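The chunk arithmetic behind incremental updates can be sketched as follows. This assumes chunks are aligned to multiples of 10,000 blocks (consistent with the 18,320,000 starting block) and that a chunk is only published once all of its blocks exist; both are assumptions, and `chunk_start` and `missing_chunks` are hypothetical helpers.

```python
CHUNK_SIZE = 10_000       # blocks per Parquet chunk, as stated in this README
FIRST_BLOCK = 18_320_000  # first block covered by the dataset


def chunk_start(block_number: int) -> int:
    """First block of the chunk containing block_number (assumes aligned chunks)."""
    return block_number - block_number % CHUNK_SIZE


def missing_chunks(have: set[int], latest_block: int) -> list[tuple[int, int]]:
    """Return (first_block, end_block_exclusive) for each complete chunk up to
    latest_block whose first block is not already in `have`."""
    # A chunk is complete once its last block (start + CHUNK_SIZE - 1) exists,
    # so the last complete chunk starts one chunk before chunk_start(latest_block + 1).
    last_complete_start = chunk_start(latest_block + 1) - CHUNK_SIZE
    return [
        (start, start + CHUNK_SIZE)
        for start in range(FIRST_BLOCK, last_complete_start + 1, CHUNK_SIZE)
        if start not in have
    ]
```

Mapping each range to a file name depends on the repository's naming scheme, which this sketch does not assume.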

Pandas import example

# Validated with python 3.11.6

import os
import pandas as pd  # pandas==2.1.2 (read_parquet also needs pyarrow or fastparquet)

# this assumes that the data directory is in the working directory
base_path = './data/'

# keep only top-level parquet chunks; this skips the minority-relay-backfill/
# subdirectory (loaded separately below) and any non-parquet files
file_paths = sorted(
    f for f in os.listdir(base_path)
    if f.endswith('.parquet') and os.path.isfile(os.path.join(base_path, f))
)

dfs = []
for file in file_paths:
    df_tmp = pd.read_parquet(os.path.join(base_path, file))
    dfs.append(df_tmp)

# add in the backfill data from other relays (read the file, not just its path)
dfs.append(pd.read_parquet(os.path.join(base_path, 'minority-relay-backfill/backfill__minority__relays__blocks__18320000_to_19530000.parquet')))

df = pd.concat(dfs)

# drop undelivered payloads
df = df[df['payload_delivered'] == True]

# sorting by block_number first and bid_timestamp_ms second keeps rows in
# block order, including non-MEV blocks with no bid_timestamp_ms
# (missing values are placed last within each block by default)
df = df.sort_values(by=['block_number', 'bid_timestamp_ms'], ascending=True)

df = df.reset_index(drop=True)
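To check the filtering and sorting steps without downloading any chunks, the same operations can be run on two small in-memory frames. This is a sketch: `tidy` is a hypothetical helper, and the toy frames use made-up values in only the three columns the loading example relies on (`block_number`, `bid_timestamp_ms`, `payload_delivered`).

```python
import pandas as pd


def tidy(dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate chunks, keep delivered payloads, and order rows by block."""
    df = pd.concat(dfs)
    df = df[df['payload_delivered'] == True]  # drop undelivered payloads
    df = df.sort_values(by=['block_number', 'bid_timestamp_ms'], ascending=True)
    return df.reset_index(drop=True)


# toy chunks: block 2 has no bid timestamp, and one block-1 row was not delivered
chunk_a = pd.DataFrame({
    'block_number': [2, 1],
    'bid_timestamp_ms': [float('nan'), 1_000.0],
    'payload_delivered': [True, True],
})
chunk_b = pd.DataFrame({
    'block_number': [1, 3],
    'bid_timestamp_ms': [900.0, 1_200.0],
    'payload_delivered': [False, True],
})

result = tidy([chunk_a, chunk_b])
print(result)
```

Because `sort_values` places missing values last by default (`na_position='last'`), block 2 still lands between blocks 1 and 3 even though its `bid_timestamp_ms` is missing.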

Data Schema