davidgasquez / filecoin-data-portal

🧮 Open and local-first data hub for Filecoin!
https://filecoindataportal.xyz
MIT License
26 stars 8 forks source link

Add sector level metrics #73

Closed davidgasquez closed 3 weeks ago

davidgasquez commented 3 months ago

The goal metric is to display the cumulative or daily number of sectors that are CC only (never snapped) over time.

Let's start with sectors added, sectors snapped, ...

davidgasquez commented 2 months ago

Another one from Adrian Lanzafame: Display the number of expiring sectors per day per type (Commited Capacity with no deals, non verified deals, Verified deals)

davidgasquez commented 2 months ago

These requests are blocked by a data issues I found on Lily's datasets.

Queries

Unique sectors in the network.

with all_sectors as (
  select height, activation_epoch, sealed_cid, verified_deal_weight, deal_weight, concat(sector_id, miner_id) as ms from `lily-data.lily.miner_sector_infos_v7`
  union all
  select height, activation_epoch, sealed_cid, verified_deal_weight, deal_weight, concat(sector_id, miner_id) as ms from `lily-data.lily.miner_sector_infos`
)

select
  approx_count_distinct(ms)
from all_sectors

Number of sector snaps.

select
  approx_count_distinct(concat(sector_id, miner_id)) as unique_sectors_snapped
from `lily-data.lily.miner_sector_events`
where event = 'SECTOR_SNAPPED'
davidgasquez commented 4 weeks ago

At the Storage Provider daily level, we should surface the number of sectors onboarded, offboarded and the related power.

Also, a great base table to get the latest state for filecoin_sectors is this:

SELECT
  height,
  miner_id,
  sector_id,
  sealed_cid,
  activation_epoch,
  expiration_epoch,
  deal_weight,
  verified_deal_weight
FROM
  `lily-data.lily.miner_sector_infos_v7` -- join with the older one

We'll need to also join to the latest event from miner_sector_events to know the real state. I.e: the latest event could be SECTOR_FAULTED. Perhaps we need a few extra columns; latest_fault_at, latest_recovery_at...

davidgasquez commented 4 weeks ago

There are more sectors in miner_sector_events than sectors in miner_infos.

with all_sectors as (
  select height, activation_epoch, sealed_cid, verified_deal_weight, deal_weight, concat(sector_id, miner_id) as ms from `lily-data.lily.miner_sector_infos_v7`
  union all
  select height, activation_epoch, sealed_cid, verified_deal_weight, deal_weight, concat(sector_id, miner_id) as ms from `lily-data.lily.miner_sector_infos`
)

select
  approx_count_distinct(ms)
from all_sectors

Returns 637530879.

select
  approx_count_distinct(concat(sector_id, miner_id)) as ms
from `lily-data.lily.miner_sector_events`

Returns 643394519.

davidgasquez commented 4 weeks ago

Also, for this bigger table, it might be interesting to replicate it incrementally and reduce the overall size of the table applying a few tricks.

select
  height,
  cast(REGEXP_REPLACE(miner_id, 'f0', '') as int) as provider_id,
  cast(sector_id as int) as sector_id,
  case event
    when 'SECTOR_ADDED' then 1
    when 'SECTOR_FAULTED' then 2
    when ...
  end as event
from
  `lily-data.lily.miner_sector_events`
order by height desc, provider_id desc, event desc