filecoin-project / FIPs

The Filecoin Improvement Proposal repository
313 stars 165 forks source link

FIP-0083: Add additional event properties to ease DDO transition #964

Closed rvagg closed 7 months ago

rvagg commented 7 months ago

The current proposal here, after discussion, negotiation and splitting out the slightly more difficult pieces comes down to the following, which is all focused on augmenting existing verified registry events with easily accessible information that can be used to match up to sector activations and/or market state data.

Click to expand original post This is a revisit of #955, a proposal to add additional information to built-in actor events, but adds more information across more events than #955 did. That PR was discussed further in this thread: https://github.com/filecoin-project/FIPs/discussions/754#discussioncomment-8698293, where it was pointed out regarding the costs of these events: > On thing that is new since the beginning of this discussion is that we have quantified the gas cost. Built-in actor methods tend to be doing a lot, so the cost of events is very small as a proportion. Perhaps the minimalist approach isn't right here, and we should just include all the near-to-hand information with all events? After receiving additional feedback on the information currently available in events, including from a raised awareness of the implications of DDO in network version 22 on observability concerns, we're coming back with **this new proposal that we're hoping to get approved and shipped _with_ network version 22**. Given the tight time constraints and concerns about stability, we've attempted to craft a line between "near-to-hand" information (i.e. low-risk, minimal code changes) and "maximally useful" when considering the requirements we are hearing from parties concerned about a DDO world. A lot of the feedback we have received is related to the verified registry and the difficulties in tracking claims, pieces, SPs and clients and their relationships. The events, as they are currently structured, require reactive state inspection to provide enough context to make sense of them, such as the example of claims outlined #955. The diff here also includes some formatting changes, so it's a bit noisier than just the pure event changes, sorry. The changes proposed here are summarised as follows: **Verified registry events** * `verifier-balance` * ~Add `"prev-balance"` (zero for `AddVerifier`, whereas `"balance"` is zero for `RemoveVerifier`)~ * Add `"client"` (ID) as optional (only used for `AddVerifiedClient`) * `allocation` * Add `"piece-cid"` and `"piece-size"` * Add `"term-max"` and `"term-min"` * Add `"expiration"` * `allocation-removed` * Add `"piece-cid"` and `"piece-size"` * Add `"term-max"` and `"term-min"` * Add `"expiration"` * `claim` * Add `"piece-cid"` and `"piece-size"` * Add `"sector"` (ID) * Add `"term-max"` and `"term-min"` * ~Add `"expiration"`~ * `claim-updated` * Add `"piece-cid"` and `"piece-size"` * Add `"sector"` (ID) * Add `"term-max"` and `"term-min"` * ~Add `"prev-term-max"`~ * ~Add `"expiration"`~ * `claim-removed` * Add `"piece-cid"` and `"piece-size"` * Add `"sector"` (ID) * Add `"term-max"` and `"term-min"` * ~Add `"expiration"`~ **Market actor events** No changes, these are complicated because there's so much information possible to include (e.g. all `DealProposal` could be included on all of these events) and can be revisited in a future FIP according to feedback. Hopefully many existing workflows will be sufficient where the market actor is involved. **Miner actor events** * `sector-activated` * ~Add `"expiration-epoch"`~ * ~Add `"claim"` (ID) to couple with each `"piece-cid"` & `"piece-size"` pair where a claim is made~ * `sector-updated` * ~Add `"expiration-epoch"`~ * ~Add `"claim"` (ID) to couple with each `"piece-cid"` & `"piece-size"` pair where a claim is made~

Discussion: https://github.com/filecoin-project/FIPs/discussions/754#discussioncomment-8823782

jennijuju commented 7 months ago

Peer: other than hearing the feedback from various ecosystem partners, i also felt the pain points myself when working on integration docs, for getting simple data 📊 like “how much data is stored in filecoin”

I believe these additional events fields are necessary for toolings like starboard dashboards, http://filplus.d.interplanetary.one/, filecoin.tool, https://filecoin-tracker.com/ and so on to build systems that can continue providing CORRECT and more real-time visibility of the network: how much data are onboarded and currently stored; what data is stored; how long the data is stored.

To only make the minimal change here is needed to not delay the network upgrade, whether built in actors should include more info or not and a more complete tradeoff analysis via another FIP.

One more piece I think is missing, is a visible event that can make f05 deal onboarded via ProveCommitSectors observable. Today the event is triggered in cron which is not observable from the clients atm, (1) message receipts don't include implicit messages (2) internal traces only has call traces, and dont Enable events viability in cron is a much bigger change that shouldn’t happen in this late of development phase. Without this information, tooling developers will need to add event indexing while keeping the current heavy state inspection, and dedup entries that are captured twice. I wonder, is there anywhere else, outside of the cron we can safely trigger this event? 

aarshkshah1992 commented 7 months ago

@rvagg There is some data wrangling we need to do to get the claim info available in all the Sector events. I'd like to avoid that.

The Claim event in the VerifReg Actor now already has the (miner ID, sector number, piece cid, piece size, claim ID) info on it and that will be sufficient for clients to track allocation claims by SPs and also correlate them with the sectors/pieces being claimed.

aarshkshah1992 commented 7 months ago

Actually, we already have the provider in the claim event. Given that we now have (provider ID, sector number, piece cid and piece size) in the Claim event, I don't think we need to add the claim ID to any of the Miner Actor events.

aarshkshah1992 commented 7 months ago

@jennijuju Unfortunately, that is tricky because the proofs submitted via 'ProveCommitSectors' are validated in a cron job and unless those proofs are validated, we don't know if the deals have been successfully "onboarded"/"activated". So we can't just emit this info/event when the SP calls 'ProveCommitSectors'. We can only emit this after those proofs have been validated but the proofs are validated in a cron job.

jennijuju commented 7 months ago

Actually, we already have the provider in the claim event. Given that we now have (provider ID, sector number, piece cid and piece size) in the Claim event, I don't think we need to add the claim ID to any of the Miner Actor events.

Hrm you get the claim ID right? I think the expiration here meant sector expiration not claim expiration, and sector expiration is already in the sector info and shouldn’t be hard to get.

jennijuju commented 7 months ago

@jennijuju Unfortunately, that is tricky because the proofs submitted via 'ProveCommitSectors' are validated in a cron job and unless those proofs are validated, we don't know if the deals have been successfully "onboarded"/"activated". So we can't just emit this info/event when the SP calls 'ProveCommitSectors'. We can only emit this after those proofs have been validated but the proofs are validated in a cron job.

If we add notification actor ID to sector activated event that may solve the problem as well.

aarshkshah1992 commented 7 months ago

@jennijuju @rvagg Just to be clear

  1. The allocation events have the expiry of the allocation
  2. The claim events have the expiry of the claim i..e.
impl Expires for Claim {
    fn expiration(&self) -> ChainEpoch {
        self.term_start + self.term_max
    }
}

This is the epoch after which the SP stops getting QA power for the claimed sector

  1. The sector activation and sector update events in the Miner Actor have the expiry of the sector
Stebalien commented 7 months ago

If we add notification actor ID to sector activated event that may solve the problem as well.

Summarizing my objections from the call:

I'm going to object to this pretty strongly. We're asking the user to infer that, because the market actor was notified, the deal necessarily "belongs" to the market actor. The SP can choose to notify any actor and and any number of actors and who they notify won't necessarily be the "storage market" that owns the deal (and I really don't want users to start assuming this). E.g., the SP could choose to notify f05 even if the deal isn't related to the market.

Proposal from the call:

  1. Add the piece ID, size, and sector ID to the deal activation event.
  2. Each time a piece CID + size tuple is mentioned in a deal activation event, ignore one copy of the same piece CID + size from the equivalent sector activation event. These should always happen in the same message, so deduplicating these shouldn't be too difficult.

NOTE: It doesn't matter which piece we ignore from the sector activation event, we don't have IDs anyways.

Stebalien commented 7 months ago

Proposal from a followup conversation with @aarshkshah1992:

We don't actually need to add anything to the deal activation event. Instead, we can:

  1. Record the activated pieces.
  2. Record the activated deals (we don't need to associate them with the pieces).
  3. Then, when we diff the market state, we can check the database and see that we've already accounted for some of the deals (step 2) and avoid double-counting those pieces.

This will let us get away with adding nothing to our events.

jennijuju commented 7 months ago

@Stebalien, like I said, I won't insist on this if there is a solution for the builder that's simple enough to build. BUT last bikeshedding.

The original comment called it the notification actor ID, not the storage market ID, for a reason. So, I am not convinced by your "I read builders mind" statement here. A more productive thing would be defining and aligning standards with the builders/ecosystem.

I am also unconvinced that "the SP could choose to notify f05 even if the deal isn't related to the market." What is the incentive/use case for any client/SP to ever DO that? Who would anyone send any notification if they don't want to have their data deal to participate in some storage market/marketplace or leverage some on-chain payment mechanism? It would be great if you could follow up on #fip-direct-data-onboarding cuz I'm now concerned about why we are not aligned on why we should have this notification design in the protocol and why it is useful.

jennijuju commented 7 months ago

Then, when we diff the market state, we can check the database and see that we've already accounted for some of the deals (step 2) and avoid double-counting those pieces.

@aarshkshah1992 @Stebalien, what's your plan to check this? Where do you get the pieces from and what information other than piece CID will you be using, and the key pairs for checking in the deal DB?

anorth commented 7 months ago

A reminder to you all that most of this discussion belongs in the discussion thread https://github.com/filecoin-project/FIPs/discussions/754, where it has the context of prior discussion, can support a level of threading, and will persist and be discoverable after this PR is closed.

aarshkshah1992 commented 7 months ago

If we add notification actor ID to sector activated event that may solve the problem as well.

Summarizing my objections from the call:

I'm going to object to this pretty strongly. We're asking the user to infer that, because the market actor was notified, the deal necessarily "belongs" to the market actor. The SP can choose to notify any actor and and any number of actors and who they notify won't necessarily be the "storage market" that owns the deal (and I really don't want users to start assuming this). E.g., the SP could choose to notify f05 even if the deal isn't related to the market.

Proposal from the call:

  1. Add the piece ID, size, and sector ID to the deal activation event.
  2. Each time a piece CID + size tuple is mentioned in a deal activation event, ignore one copy of the same piece CID + size from the equivalent sector activation event. These should always happen in the same message, so deduplicating these shouldn't be too difficult.

NOTE: It doesn't matter which piece we ignore from the sector activation event, we don't have IDs anyways.

@Stebalien @anorth @rvagg @jennijuju This sounds like the most promising solution. The only problem here is that we still miss out on data in sectors onboarded via the legacy ProveCommitSectors method as the corresponding deal activation events will be emitted in cron and will not be available to clients. If we're okay with losing just that bit of de-duping, this solution works.

rvagg commented 7 months ago

btw I pushed this change:

rvagg commented 7 months ago
anorth commented 7 months ago

This looks good to me at the moment with the exception of the verifier-balance event discussed here.

Note I don't object to adding more simple fields (piece id, size, sector) to market events. I don't think we should attempt to do anything special for the unobservable events from legacy prove commit sector though.

Stebalien commented 7 months ago

@jennijuju

The original comment called it the notification actor ID, not the storage market ID, for a reason. So, I am not convinced by your "I read builders mind" statement here. A more productive thing would be defining and aligning standards with the builders/ecosystem.

...

I am also unconvinced that "the SP could choose to notify f05 even if the deal isn't related to the market." What is the incentive/use case for any client/SP to ever DO that?

Predicting how an API might be misused is an important part of API design (and UX, in general). In this case, my concern is that users may try to use this information the same way you are, but with more severe consequences if something goes wrong (and therefore more incentive for SPs to send bad notifications).

@aarshkshah1992 @Stebalien, what's your plan to check this? Where do you get the pieces from and what information other than piece CID will you be using, and the key pairs for checking in the deal DB?

We don't need the piece CIDs for deduplication, just the deal IDs from the deal activation events (and later, the deal IDs in the market state).

The key difference between our two proposals is which information we keep:

This actually means less work in the future. When we drop support for the old ProveCommit method, indexers can simply delete the market diffing code because the event processing code will have all the information.

(and yeah, we should have had this discussion in a threaded forum)

Stebalien commented 7 months ago

@aarshkshah1992

Every solution we've discussed involves diffing the market state to detect deals from ProveCommit. The goal here is to avoid double counting.

I'd actually prefer the last solution I proposed for the reason I stated in my previous comment.

rvagg commented 7 months ago

Removed "client" from verifier-balance, will move this to a separate PR because the discussion is of a different nature and I don't want that to block these less controversial changes.

Updated original post to detail just the changes here now.

jennijuju commented 7 months ago

I don't think we should attempt to do anything special for the unobservable events from legacy prove commit sector though.

@anorth @Stebalien 👍 I hear you and I agree we have other alternatives thats better.

jennijuju commented 7 months ago

Editor: thanks all for the user feedback and peer reviews! These changes LGTM, other than the verifreg expiration fields which I believe was suggested to be replaced. If that's changed, @rvagg @aarshkshah1992 can you link us the latest thread where alignment was made?

rvagg commented 7 months ago

other than the verifreg expiration fields which I believe was suggested to be replaced

Discussion inline at: https://github.com/filecoin-project/FIPs/pull/964#discussion_r1529337442

"expiration" is left on the allocation* events because an allocation has an explicit expiration. The expiration for a claim is term_start + term_max where term_start is the height at which the event is emitted so it seems like a duplicate piece of information. So there is no "expiration" on the claim* events. Happy to add term_start if someone thinks that's a value-add though, it's simple enough but feels duplicative to me.

anorth commented 7 months ago

I'm happy to add term_start. It's not duplicative for an update event (and would require maintaining history to calculate)

rvagg commented 7 months ago

Added "term-start" to all claim* events