filecoin-project / devgrants


Open Grant Proposal: `Data Onboarding Metrics` #858

Closed Fatman13 closed 1 year ago

Fatman13 commented 2 years ago

Open Grant Proposal: Data Onboarding Metrics

Name of Project: Data Onboarding Metrics - Venus
Proposal Category: Choose one of core-dev, devtools-libraries
Proposer: ipfs-force-community
(Optional) Technical Sponsor:
Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes

Project Description

One of the issues that new SPs, and even many veteran SPs, face every day when they onboard large numbers of sectors is getting a clear picture of the health of their storage system so they can diagnose whatever has gone wrong in their pipeline. Many things can go wrong when moving sectors through an SP’s storage system: the chain head falling out of sync, messages stuck in the mpool, missed block-producing rounds, high API latency, and so on. SPs have to navigate these anomalies all the time and respond to them quickly.
This is where Data Onboarding Metrics for Venus Filecoin comes into play. We propose to build a set of critical metrics for each component of Venus Filecoin that reflect the live health of a storage system, so that operators know what is going on with their systems and can react to different situations instead of relying on guesswork, digging through tons of logs, or extensive dev-ops experience.


Value


We see many benefits that Data Onboarding Metrics could bring to SPs, letting them take back control of their storage systems instead of spending a lot of time troubleshooting a black box. We believe metrics give SPs the toolbox to minimize the impact of operational errors, to see whether WindowPoSt messages are sent out on time, to monitor the time and latency of PoSt computation, and much more, so that SPs are not penalized by the protocol unintentionally.


Design

This phase includes milestones A and B in the above deliverable table. The team will collect ideas from the community, conceive the first design of the metrics system, and then build a POC/MVP for the miner component. An embedded exporter that allows custom configuration will be included for easier integration with third-party tools. A metrics module will be added to the miner project; it may contain the parameters below for SPs to monitor their storage pipeline.

// latency for GetBaseInfo API
GetBaseInfoDuration   (Milliseconds)
// latency for ComputeTicket API
ComputeTicketDuration (Milliseconds)
// latency for IsRoundWinner API
IsRoundWinnerDuration (Milliseconds)
// latency for ComputeProof API
ComputeProofDuration (Seconds)

// number of blocks produced
NumberOfBlock (Dimensionless)
// number of rounds in which miner_id is the winner
NumberOfIsRoundWinner (Dimensionless)
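
As a rough illustration of how a latency metric such as GetBaseInfoDuration could be fed, the sketch below wraps an API call with a timer and records the elapsed time in milliseconds. It uses only the Go standard library; `LatencyMetric` and `Timed` are hypothetical names introduced here and are not part of Venus or of any metrics library the team may choose.

```go
package main

import (
	"sync"
	"time"
)

// LatencyMetric is a hypothetical in-memory stand-in for a metric such as
// GetBaseInfoDuration. It tracks the number of samples and their sum.
type LatencyMetric struct {
	mu    sync.Mutex
	count int64
	sumMs float64
}

// ObserveMillis records one latency sample expressed in milliseconds.
func (l *LatencyMetric) ObserveMillis(ms float64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.count++
	l.sumMs += ms
}

// Count returns the number of recorded samples.
func (l *LatencyMetric) Count() int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.count
}

// MeanMillis returns the average latency, or 0 when nothing was recorded.
func (l *LatencyMetric) MeanMillis() float64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.count == 0 {
		return 0
	}
	return l.sumMs / float64(l.count)
}

// Timed runs call, records its wall-clock latency in l, and returns the
// error produced by call so the caller's control flow is unchanged.
func Timed(l *LatencyMetric, call func() error) error {
	start := time.Now()
	err := call()
	l.ObserveMillis(float64(time.Since(start)) / float64(time.Millisecond))
	return err
}
```

A caller would wrap each API request, for example `Timed(getBaseInfoDuration, func() error { return doRequest() })`, where `doRequest` stands for whatever function performs the actual call. A real implementation would more likely record into a histogram than keep a running mean.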

Implementation

This phase includes milestones C to E in the above deliverable table. The team will continue to collect ideas from the community while implementing the metrics system for the rest of the Venus components. The parameters that the metrics module will adopt are listed below.

messager


// The metrics below are updated per wallet address
WalletBalance (Dimensionless)
WalletDBNonce (Dimensionless)
WalletChainNonce (Dimensionless)

// Current number of messages waiting for venus-messager to fill in parameters such as signature, gas usage, and nonce.
// This metric is updated per wallet address
NumOfUnFillMsg (Dimensionless)
// Current number of messages for which venus-messager has filled in parameters such as signature, gas usage, and nonce.
// This metric is updated per wallet address
NumOfFillMsg (Dimensionless)
// Current number of messages for which venus-messager has failed to fill in parameters such as signature, gas usage, and nonce.
// This metric is updated per wallet address
NumOfFailedMsg (Dimensionless)

// Current number of messages that have not landed on-chain for more than 3 minutes
NumOfMsgBlockedThreeMinutes (Dimensionless)
// Current number of messages that have not landed on-chain for more than 5 minutes
NumOfMsgBlockedFiveMinutes (Dimensionless)

// Number of messages selected by venus-messager during the last round of message pushing
SelectedMsgNumOfLastRound (Dimensionless)
// Number of messages pushed by venus-messager during the last round of message pushing
ToPushMsgNumOfLastRound (Dimensionless)
// Number of messages expired by venus-messager during the last round of message pushing
ExpiredMsgNumOfLastRound (Dimensionless)
// Number of messages that encountered errors during the last round of message pushing
ErrMsgNumOfLastRound (Dimensionless)

// Current difference between the chain head timestamp and the system time of the venus-messager machine
ChainHeadStableDelay (Seconds)
// Histogram of the difference between the chain head timestamp and the system time of the venus-messager machine
ChainHeadStableDuration (Seconds)
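
To make the blocked-message and chain-head metrics more concrete, here is a minimal sketch of how their values could be derived. `PendingMsg`, `CountBlocked`, and `ChainHeadDelaySeconds` are hypothetical names for illustration only, timestamps are assumed to be Unix seconds, and "more than N minutes" is read as a strict comparison.

```go
package main

// PendingMsg is a hypothetical view of a message that has been pushed
// but has not yet landed on-chain.
type PendingMsg struct {
	ID           string
	PushedAtUnix int64 // Unix seconds at which the message was pushed
}

// CountBlocked returns how many messages have been pending for strictly
// more than thresholdSec seconds as of nowUnix. Calling it with 180 and
// 300 would yield values for NumOfMsgBlockedThreeMinutes and
// NumOfMsgBlockedFiveMinutes respectively.
func CountBlocked(msgs []PendingMsg, nowUnix, thresholdSec int64) int {
	blocked := 0
	for _, msg := range msgs {
		if nowUnix-msg.PushedAtUnix > thresholdSec {
			blocked++
		}
	}
	return blocked
}

// ChainHeadDelaySeconds returns the gap between the local system time and
// the chain head timestamp, which is what ChainHeadStableDelay describes.
// A negative result means the local clock is behind the chain head.
func ChainHeadDelaySeconds(headUnix, nowUnix int64) int64 {
	return nowUnix - headUnix
}
```

Note that a message blocked for more than five minutes is also counted in the three-minute metric under this reading; whether the two buckets should overlap is a design choice the proposal leaves open.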

gateway


// Number of wallets connected to the gateway
WalletCount
// Number of wallet addresses connected to the gateway
WalletAddressCount
// IP of each remote wallet connected to the gateway
WalletIPAddress

// Number of SPs connected to the gateway
SPCount
// Number of SP addresses connected to the gateway
SPAddressCount
// IP of each remote SP connected to the gateway
SPIPAddress

// Number of signatures initiated by the gateway
SignCount
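
The proposal does not say which format the embedded exporter will emit. Assuming a Prometheus-compatible text format, the gateway metrics above could be rendered roughly as follows, with values such as IP addresses carried as labels rather than as metric values. `RenderGauge` and the label names `ip` and `wallet` are hypothetical.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes the characters that must be escaped inside label
// values in the Prometheus text exposition format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// RenderGauge formats one sample line, for example "SPCount 2".
// Labels are written in sorted key order so the output is deterministic.
func RenderGauge(name string, labels map[string]string, value float64) string {
	var sb strings.Builder
	sb.WriteString(name)
	if len(labels) > 0 {
		keys := make([]string, 0, len(labels))
		for k := range labels {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		sb.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				sb.WriteByte(',')
			}
			sb.WriteString(k)
			sb.WriteString(`="`)
			sb.WriteString(labelEscaper.Replace(labels[k]))
			sb.WriteByte('"')
		}
		sb.WriteByte('}')
	}
	sb.WriteByte(' ')
	sb.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	sb.WriteByte('\n')
	return sb.String()
}
```

For instance, `RenderGauge("WalletIPAddress", map[string]string{"ip": "10.0.0.1"}, 1)` would produce one line per connected wallet, which a dashboard can then count or filter. In practice an existing client library would normally handle this formatting.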

market


// Count of storage deals accepted
StorageDealAccepted
// Number of active data transfers
NumberOfActiveTransfer
// Speed of data transfer, per transfer, unit = Mbps
DataTransferSpeed
// Rate of successful data transfers
SucessTransferRate
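
Since DataTransferSpeed is specified in Mbps, the conversion from a byte count is worth spelling out. The sketch below assumes 1 Mbps means 1,000,000 bits per second; `Mbps` is a hypothetical helper, not an existing Venus function.

```go
package main

// Mbps converts a number of bytes moved over elapsedSeconds into megabits
// per second, taking 1 Mbps as 1,000,000 bits per second. It returns 0 for
// a non-positive elapsed time to avoid dividing by zero.
func Mbps(bytes int64, elapsedSeconds float64) float64 {
	if elapsedSeconds <= 0 {
		return 0
	}
	return float64(bytes) * 8 / 1e6 / elapsedSeconds
}
```

For example, 125,000,000 bytes transferred in 10 seconds corresponds to 100 Mbps.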

daemon

TBD

cluster


// Count of new sectors, per miner_id
SectorManagerNewSector

// Count of preCommit, per miner_id
SectorManagerPreCommitSector

// Count of commit, per miner_id
SectorManagerCommitSector

// Time spent computing winningPost, per miner_id, unit = Seconds
ProverWinningPostDuration

// Time spent computing windowPost, per miner_id, unit = Minutes
ProverWindowPostDuration

// Completion rate for partitions that have passed windowPost, per miner_id
// E.g. ProverWindowPostCompleteRate=0.9 when 9 out of 10 partitions complete windowPost submission
ProverWindowPostCompleteRate

// Latency of sector manager API calls, unit = ms
APIRequestDuration
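
The completion-rate example given for ProverWindowPostCompleteRate can be sketched as a small per-miner tracker. `PostTracker` and its methods are hypothetical names, and the miner ID used in the usage note is a placeholder.

```go
package main

// PostTracker is a hypothetical per-miner tally of partitions that were
// due for windowPost and of those whose submission completed.
type PostTracker struct {
	total     map[string]int
	completed map[string]int
}

// NewPostTracker returns an empty tracker.
func NewPostTracker() *PostTracker {
	return &PostTracker{
		total:     make(map[string]int),
		completed: make(map[string]int),
	}
}

// Record notes the outcome of one partition for the given miner_id.
func (p *PostTracker) Record(minerID string, submitted bool) {
	p.total[minerID]++
	if submitted {
		p.completed[minerID]++
	}
}

// CompleteRate returns completed/total for the miner, or 0 when no
// partition has been recorded for it yet.
func (p *PostTracker) CompleteRate(minerID string) float64 {
	n := p.total[minerID]
	if n == 0 {
		return 0
	}
	return float64(p.completed[minerID]) / float64(n)
}
```

Recording ten partitions for a miner, nine of them submitted, gives a rate of 0.9, matching the example in the metric description. A production version would also need to reset or window the tallies per proving deadline.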

Note that none of these metrics are final; more parameters may be added as the community sees fit.

Maintenance

This phase includes milestones F to H in the above deliverable table. The team will continue to collect ideas and feedback from the community while iterating on the metrics system for all Venus components. Documentation and easy-to-follow tutorials will be produced to help the metrics system reach a broader set of community members. We hope that by the end of this phase SPs will have the tools they need to remove obstacles when onboarding large numbers of sectors.

Total Budget Requested

The total budget requested is $48,000. The budget breakdown is tied to the deliverables of each milestone, defined above.


Maintenance and Upgrade Plans

The goal of the team is to support the metrics system long term, which includes continuously adding more critical parameters that the community deems worth monitoring, thereby easing the process of onboarding large amounts of data to the network.


Team

Team Members

Force community engineering team

Team Member LinkedIn Profiles

Team Website

https://forcecommunity.io/

Relevant Experience

Force community has been an active contributor to the Web3 ecosystem and to the Filecoin ecosystem in particular. The engineering team from Force community has a track record of contributing code to Lotus as far back as Testnet and Space Race.


Team code repositories

https://github.com/ipfs-force-community

Additional information

Force community is committed to becoming a major contributor to Web3 infrastructure, and we see Filecoin at the core of the big Web3 migration. We hope to accelerate Web3 adoption by contributing our software development capacity to the cause and by working hand in hand with other ecosystem developers around the globe on this historic journey!

ErinOCon commented 2 years ago

Hi @Fatman13, thank you for your proposal! We are currently reviewing this grant and expect to have more information available next week.

ErinOCon commented 1 year ago

Hi @Fatman13, thank you for your ongoing patience! This grant is still under our review. We will be in touch as soon as we have completed connecting with our ecosystem experts.

ErinOCon commented 1 year ago

Hi @Fatman13, can you confirm if the metrics are venus-specific or network-wide? Many thanks!

Fatman13 commented 1 year ago

> Hi @Fatman13, can you confirm if the metrics are venus-specific or network-wide? Many thanks!

The metrics are specific to Venus and are being built into Venus, with an embedded exporter for a front end to consume.

ErinOCon commented 1 year ago

Thanks, @Fatman13! This grant has been approved. Would you like us to use the contact information on file for this grant?

Fatman13 commented 1 year ago

Great to hear that! The email will be venus@ipfsforce.com. Thank you so much!

ErinOCon commented 1 year ago

Thanks, @Fatman13!