ChainSafe / forest

🌲 Rust Filecoin Node Implementation
https://forest.chainsafe.io
Apache License 2.0

Spike: differentiate snapshot downloads between Forest DEV and others #3399

Closed: LesnyRumcajs closed this issue 1 month ago

LesnyRumcajs commented 1 year ago

Issue summary

In order to provide meaningful metrics on Forest snapshot usage, we need metrics that are free of the noise we generate ourselves (i.e., the hundreds of downloads that are just part of our CI or of snapshot generation itself).

For now, we have two options: either use different endpoints for "devs" and "the rest of the world", or use a special User-Agent header for "devs". The latter, suggested by @aatifsyed, is a nice solution, but we need to ensure we can actually read the User-Agent from the R2 logs.
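A minimal sketch of the User-Agent option, assuming a hypothetical marker string `forest-dev` that dev tooling (CI, snapshot generation) would append to its User-Agent. The marker, the `forest/<version>` format, and both function names are illustrative assumptions, not existing Forest conventions. One side builds the header; the other classifies a User-Agent read back from the logs.

```rust
/// Hypothetical marker appended by Forest DEV tooling (CI, snapshot
/// generation). This is an assumption, not an existing Forest convention.
const DEV_UA_MARKER: &str = "forest-dev";

/// Builds the User-Agent a download client would send.
/// `version` is the client version; `dev` marks internal traffic.
fn snapshot_user_agent(version: &str, dev: bool) -> String {
    if dev {
        format!("forest/{} ({})", version, DEV_UA_MARKER)
    } else {
        format!("forest/{}", version)
    }
}

/// Classifies a User-Agent read back from the access logs:
/// a case-insensitive substring check for the dev marker.
fn is_dev_download(user_agent: &str) -> bool {
    user_agent.to_ascii_lowercase().contains(DEV_UA_MARKER)
}
```

Note that a User-Agent is trivially spoofable, so this only separates traffic that cooperates; that is acceptable when the goal is to exclude our own CI rather than to authenticate anyone.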

Investigate whether it's feasible to enable logging on R2. If it is, build a small PoC to ensure we can actually do something with those logs, and outline the steps necessary to achieve segregated metrics for snapshot downloads. If not, outline the steps necessary for the alternative, i.e., separate endpoints.
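The log-processing half of such a PoC could look roughly like this. It assumes a hypothetical tab-separated export of `timestamp<TAB>path<TAB>user_agent` per line and the same hypothetical `forest-dev` User-Agent marker; the real R2 log format would need its own parser, and the `aggregate` and `Counts` names are illustrative only.

```rust
use std::collections::HashMap;

/// Hypothetical dev marker in the User-Agent (an assumption).
const DEV_UA_MARKER: &str = "forest-dev";

/// Download counts for one snapshot path, split by audience.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    dev: u64,
    public: u64,
}

/// Aggregates hypothetical log lines of the form
/// `timestamp<TAB>path<TAB>user_agent`. Malformed lines are skipped.
fn aggregate(log: &str) -> HashMap<String, Counts> {
    let mut stats: HashMap<String, Counts> = HashMap::new();
    for line in log.lines() {
        let mut fields = line.splitn(3, '\t');
        let (Some(_ts), Some(path), Some(ua)) = (fields.next(), fields.next(), fields.next()) else {
            continue; // fewer than three fields: not a usable record
        };
        let entry = stats.entry(path.to_string()).or_default();
        if ua.to_ascii_lowercase().contains(DEV_UA_MARKER) {
            entry.dev += 1;
        } else {
            entry.public += 1;
        }
    }
    stats
}
```

If this kind of aggregation works on real exported logs, the remaining steps are mostly operational: enabling the export, scheduling the job, and publishing only the `public` counts.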

Other information and links

ruseinov commented 1 year ago

Just my 2 cents: I'd consider downloading from a different endpoint rather than dealing with the User-Agent. Adding complexity to the snapshot download process is not the best solution, for two reasons:

  1. Humans are lazy; the more burden each step involves, the worse the performance.
  2. It's easy to forget the UA and end up skewing the stats.

That said, reason number 2 also applies to an alternative link.

I might be missing the point here, though: if we are mostly talking about CI and other automated tasks, the UA solution is great. Still, if we introduced a proxy endpoint for "devs", we'd be able to use a database to gather stats instead of parsing and aggregating logs.
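A rough sketch of the proxy-endpoint idea: both the dev and the public endpoint hit the same handler, which records the download and then redirects to the CDN object. The `DownloadStore` trait stands in for whatever database would be used; the in-memory implementation, the `handle_download` name, and the URL layout are all hypothetical, chosen only to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Which audience a download is attributed to (decided by the endpoint hit).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Audience {
    Dev,
    Public,
}

/// Stand-in for the database the proxy would write download events to.
trait DownloadStore {
    fn record(&mut self, audience: Audience, snapshot: &str);
}

/// In-memory store; a real deployment would use an actual database.
#[derive(Default)]
struct MemoryStore {
    counts: HashMap<(Audience, String), u64>,
}

impl DownloadStore for MemoryStore {
    fn record(&mut self, audience: Audience, snapshot: &str) {
        *self.counts.entry((audience, snapshot.to_string())).or_insert(0) += 1;
    }
}

/// Hypothetical proxy handler: records the download, then returns the
/// redirect location on the CDN (the same object for both audiences).
fn handle_download(
    store: &mut impl DownloadStore,
    audience: Audience,
    snapshot: &str,
    cdn_base: &str,
) -> String {
    store.record(audience, snapshot);
    format!("{}/{}", cdn_base.trim_end_matches('/'), snapshot)
}
```

Counting at the proxy means the stats never depend on log export, at the cost of running and maintaining one more service.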

LesnyRumcajs commented 1 year ago

All in all, I'd expect whichever backend we choose to be semi-transparent to use, i.e., a single environment variable that, if set, would force either going through a different endpoint or using a special User-Agent. Forest devs would be encouraged to set it, though simply excluding our CI from the stats would already make them more realistic.
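A minimal sketch of that single switch, assuming a hypothetical variable name `FOREST_SNAPSHOT_DEV` (Forest does not define it) and placeholder values for the dev endpoint and User-Agent marker. The decision logic takes the variable's value as a parameter so it can be tested without touching the process environment; a thin wrapper reads the real environment via `std::env::var`.

```rust
use std::env;

/// Hypothetical variable name; not an existing Forest setting.
const DEV_ENV_VAR: &str = "FOREST_SNAPSHOT_DEV";

/// How a dev download gets marked, depending on the chosen backend.
#[derive(Debug, PartialEq)]
enum Marking {
    /// Route the download through a separate (placeholder) endpoint.
    Endpoint(String),
    /// Keep the public endpoint but send a special User-Agent marker.
    UserAgent(String),
}

/// Decides from the variable's value (`None` = unset). Any non-empty value
/// counts as "set", so `FOREST_SNAPSHOT_DEV=1` is enough.
fn dev_marking(value: Option<&str>, use_endpoint: bool) -> Option<Marking> {
    match value {
        Some(v) if !v.is_empty() => Some(if use_endpoint {
            Marking::Endpoint("https://dev-snapshots.example.com".to_string())
        } else {
            Marking::UserAgent("forest-dev".to_string())
        }),
        _ => None,
    }
}

/// Reads the real environment; kept separate so `dev_marking` stays testable.
fn dev_marking_from_env(use_endpoint: bool) -> Option<Marking> {
    let value = env::var(DEV_ENV_VAR).ok();
    dev_marking(value.as_deref(), use_endpoint)
}
```

With this shape, CI only needs the variable exported once in its environment, and the choice between endpoint and User-Agent stays an internal detail.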

hanabi1224 commented 1 year ago

Which snapshot URL do we want to track: the raw CDN link, or the DO function that does the redirection? If the latter, we could add telemetry and log the user's IP, User-Agent, URL parameters, and so on (I guess for free, at least with the telemetry services I've used before), then do the filtering and maybe create a dashboard later.
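The "filtering" step on such telemetry could be sketched as follows. It assumes a hypothetical event shape (client IP, User-Agent, and a `network` URL parameter), a known set of internal IPs such as CI runners, and the same hypothetical `forest-dev` User-Agent marker; it drops internal traffic and counts unique client IPs per network, one plausible number for a dashboard.

```rust
use std::collections::{HashMap, HashSet};

/// One telemetry event logged by the redirect function (hypothetical shape).
struct RedirectEvent<'a> {
    ip: &'a str,
    user_agent: &'a str,
    /// URL parameter selecting the snapshot, e.g. the network name.
    network: &'a str,
}

/// Counts unique client IPs per network after dropping internal traffic:
/// events from `internal_ips` (e.g. CI runners) or carrying the
/// hypothetical `forest-dev` User-Agent marker.
fn unique_public_clients(
    events: &[RedirectEvent<'_>],
    internal_ips: &HashSet<&str>,
) -> HashMap<String, usize> {
    let mut seen: HashMap<String, HashSet<&str>> = HashMap::new();
    for e in events {
        let is_dev_ua = e.user_agent.to_ascii_lowercase().contains("forest-dev");
        if internal_ips.contains(e.ip) || is_dev_ua {
            continue;
        }
        seen.entry(e.network.to_string()).or_default().insert(e.ip);
    }
    seen.into_iter().map(|(net, ips)| (net, ips.len())).collect()
}
```

Filtering on both IP and User-Agent covers CI jobs that forget to set the marker, as long as their egress IPs are known.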

lemmih commented 1 month ago

Infra team will handle this one. Closing.