Joystream / orion

Atlas backend
GNU General Public License v3.0

Idea: Smart DP selection #106

Open traumschule opened 1 year ago

traumschule commented 1 year ago

Rationale

This is to enhance work on provider selection done in Joystream/atlas#2748 and Joystream/atlas#2909.

Atlas tries to determine the user's location and knows where each provider is located. However, generateAssetUrl doesn't seem to take either into account:

  const assetIdBn = new BN(asset.id)
  const endpointsCountBn = new BN(distributorsEndpoints.length)
  const distributorIndex = assetIdBn.mod(endpointsCountBn).toNumber()
  const endpoint = distributorsEndpoints[distributorIndex]
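For contrast with the modulo-based pick above, here is a minimal sketch of what a location-aware ordering could look like. It assumes both the user and each distributor come with latitude/longitude coordinates. The types and function names (`Coordinates`, `Distributor`, `distanceKm`, `sortByDistance`) are made up for illustration and are not part of Atlas.

```typescript
// All names here are hypothetical; they are not part of the Atlas codebase.
type Coordinates = { latitude: number; longitude: number }
type Distributor = { endpoint: string; location: Coordinates }

// Great-circle distance in kilometres (haversine formula).
function distanceKm(a: Coordinates, b: Coordinates): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180
  const earthRadiusKm = 6371
  const dLat = toRad(b.latitude - a.latitude)
  const dLon = toRad(b.longitude - a.longitude)
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) ** 2
  // Clamp to 1 to guard against floating point overshoot before asin.
  return 2 * earthRadiusKm * Math.asin(Math.min(1, Math.sqrt(h)))
}

// Returns a new array sorted nearest first; the input is left untouched.
function sortByDistance(user: Coordinates, distributors: Distributor[]): Distributor[] {
  return [...distributors].sort(
    (x, y) => distanceKm(user, x.location) - distanceKm(user, y.location)
  )
}
```

The first entry of the sorted list would then replace the `distributorIndex` lookup. Geographic distance is only a proxy for network latency, so this is probably best used as an initial ordering rather than a final decision.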

Because Atlas makes many queries to fetch dozens of smaller assets, there is plenty of opportunity to learn which providers respond fastest and which do not respond at all, so that the latter can be excluded and logged (#3241). This knowledge can then inform decisions about where to retrieve bigger assets.

It also enables the DAO to learn latency and bandwidth per provider based on user location.

Scope

  1. Add global state (a client context) to learn about network performance while fetching assets.
  2. Every request to a DP feeds a per-provider latency array.
  3. At first (without performance data), distribute load across all providers to learn how they are doing. Later, round-robin smaller requests among the best five. If any of them slows down, its rank sinks and another provider is immediately picked for further queries.
  4. When fetching costly assets (file size > 1 MB), try to fetch from the three closest providers in parallel and pick the first responder (pseudo-anycast).
  5. Advanced: Video assets can be requested from multiple providers at different offsets to make better use of local bandwidth and to work around spontaneous routing issues (when the current best pick gets overloaded).
  6. Knowledge from one session can be cached in the store for an optimized experience on the next visit.
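Items 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not a proposal for the actual implementation: the `ProviderStats` class, its method names, the window size of 20 samples and the 10 s failure penalty are all assumptions chosen for the example.

```typescript
// Hypothetical client-side tracker; none of these names exist in Atlas or Orion.
type Sample = { latencyMs: number; ok: boolean }

class ProviderStats {
  private samples = new Map<string, Sample[]>()
  private cursor = 0

  constructor(private endpoints: string[], private windowSize = 20) {
    if (endpoints.length === 0) throw new Error('at least one endpoint is required')
  }

  // Record the outcome of one request; only the latest `windowSize` samples are kept.
  record(endpoint: string, latencyMs: number, ok = true): void {
    const list = this.samples.get(endpoint) ?? []
    list.push({ latencyMs, ok })
    if (list.length > this.windowSize) list.shift()
    this.samples.set(endpoint, list)
  }

  // Mean latency over the window; a failed request counts as `penaltyMs`.
  score(endpoint: string, penaltyMs = 10_000): number | undefined {
    const list = this.samples.get(endpoint)
    if (!list || list.length === 0) return undefined
    const total = list.reduce((sum, s) => sum + (s.ok ? s.latencyMs : penaltyMs), 0)
    return total / list.length
  }

  // Endpoints that have data, fastest first.
  ranked(): string[] {
    return this.endpoints
      .filter((e) => this.score(e) !== undefined)
      .sort((a, b) => (this.score(a) as number) - (this.score(b) as number))
  }

  // Round robin over untested endpoints while any remain, then over the best `topN`.
  pick(topN = 5): string {
    const untested = this.endpoints.filter((e) => this.score(e) === undefined)
    const pool = untested.length > 0 ? untested : this.ranked().slice(0, topN)
    const endpoint = pool[this.cursor % pool.length]
    this.cursor += 1
    return endpoint
  }
}
```

A caller would wrap each asset request with a timer, call `record` with the measured latency (or `ok = false` on error), and use `pick()` for the next small asset. Because ranking is recomputed from the sliding window on every call, a provider that slows down drops out of the top group as soon as its mean latency exceeds that of the next candidate. Persisting the `samples` map between sessions would cover item 6.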

Note: The above model is purely experience-based, so some errors from distant providers are expected before those providers are excluded from further requests. Distance-based selection via coordinates ought to be the best choice and should be taken into account for the initial sort, but DNS sometimes has surprises.
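The pseudo-anycast idea from item 4 of the scope could look something like the sketch below. The helper names `firstSuccessful` and `fetchFromFastest` are hypothetical, and the endpoints are assumed to be base URLs against which an asset path can be resolved. The standard `fetch` and `AbortController` APIs are used so that the slower requests can be cancelled once one provider has answered.

```typescript
// Hypothetical helper: resolves with the first task that succeeds and rejects
// only if every task fails. Behaves much like Promise.any, written out so it
// does not depend on an ES2021 runtime.
function firstSuccessful<T>(tasks: Array<() => Promise<T>>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (tasks.length === 0) {
      reject(new Error('no tasks given'))
      return
    }
    let failures = 0
    for (const task of tasks) {
      task().then(resolve, () => {
        failures += 1
        if (failures === tasks.length) reject(new Error('all providers failed'))
      })
    }
  })
}

// Request the same asset path from several distributor endpoints at once and
// keep whichever responds first with a successful status.
async function fetchFromFastest(endpoints: string[], path: string): Promise<Response> {
  const controllers = endpoints.map(() => new AbortController())
  const tasks = endpoints.map((endpoint, i) => async () => {
    const url = new URL(path, endpoint).toString()
    const response = await fetch(url, { signal: controllers[i].signal })
    if (!response.ok) throw new Error(`${endpoint} answered ${response.status}`)
    return { response, winner: i }
  })
  const { response, winner } = await firstSuccessful(tasks)
  // Abort the losers so they stop consuming bandwidth.
  controllers.forEach((c, i) => {
    if (i !== winner) c.abort()
  })
  return response
}
```

The trade-off is that every raced request puts load on a distributor even if its response is discarded, which is presumably why the proposal limits this to large assets and to three candidates.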

Relevant code

bedeho commented 1 year ago

Keep in mind that as of Atlas v2, I believe it no longer does validator selection itself; Orion does it for it. Orion is in a good position to add much more intelligence to this process, as it has more information across users and runs as long-running infra.

I am sure this selection logic in Orion can be made more elaborate, but at this point I think it would be a distraction to focus on this rather than on core issues in the content delivery infra. Trying to have elaborate logic at the application level to evade failing nodes is not the most direct way of dealing with this, and it means that lots of other apps and tools which don't have this logic would be left performing very poorly.