Joystream / atlas

Whitelabel consumer and publisher experience for Joystream
https://www.joystream.org
GNU General Public License v3.0

Deep Problem: Query fragmentation across Orion and Query Node #132

Open bedeho opened 3 years ago

bedeho commented 3 years ago

Background

We currently have two separate sources of truth: Orion for social information (view counts, follower counts, follower relationships) and the Hydra query node for content information.

Problem

Because view/follower counts and content data are served by separate servers, we cannot run integrated queries that depend on both kinds of information. For example, one cannot ask for all videos with more than X views in a given content category; this processing has to be done client side. Depending on how bad this constraint becomes, a deeper fix will be considered in the future.
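To make the constraint concrete, here is a minimal sketch of the client-side join this forces today. The `Video` and `ViewCount` shapes and the function name are hypothetical, chosen only for illustration; the real query node and Orion schemas may differ.

```typescript
// Hypothetical shapes: `Video` stands in for query node data,
// `ViewCount` for Orion data. Neither is the real schema.
interface Video {
  id: string
  title: string
  categoryId: string
}

interface ViewCount {
  videoId: string
  views: number
}

// Client-side join: the category filter uses query node data,
// the view threshold uses Orion data.
function videosWithMoreViewsThan(
  videos: Video[],
  viewCounts: ViewCount[],
  categoryId: string,
  minViews: number
): Video[] {
  const viewsById = new Map<string, number>()
  for (const vc of viewCounts) {
    viewsById.set(vc.videoId, vc.views)
  }
  // Videos with no Orion record are treated as having zero views.
  return videos.filter(
    (v) => v.categoryId === categoryId && (viewsById.get(v.id) ?? 0) > minViews
  )
}
```

Note that the client has to fetch every candidate video and every matching count before it can filter, which is the cost an integrated query would avoid.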

Solution

This is at least a medium-term problem. By the time it needs solving we will know the full e2e architecture, which should let us find a better solution than we can now.

kdembler commented 3 years ago

Seems to me we may need to solve this (or start solving it) rather soon. For the upcoming viewer experience refresh, the designs make heavy use of "popular" videos and channels.

What makes most sense to me is to connect query node data with Orion data at some level outside of the client. That would both enable features like "popular" and take the burden of consuming multiple data sources away from Atlas, potentially simplifying it (we're currently doing some client-side GraphQL stuff that I'm not comfortable with in the long term).

The first idea would be to enable Hydra to consume an additional data source outside of the blockchain (Orion). That would let us query a views field directly from video-related queries. However, I'm not sure that would completely enable the desired use case. To enable full "popular" functionality (for example, finding videos that gained a lot of views recently) we would need not only counts but also all the events from Orion, otherwise we don't know when the views happened. That in turn would require some custom logic for popular videos inside the query node, which doesn't seem very feasible?
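A small sketch of why counts alone are not enough for "popular": ranking by recent views needs a timestamp per view. The `ViewEvent` shape and `topVideosSince` are hypothetical names for illustration, not Orion's actual data model.

```typescript
// Hypothetical event record; assumes each view is stored with the time it happened.
interface ViewEvent {
  videoId: string
  timestamp: number // milliseconds since epoch
}

// Rank videos by the number of views at or after `since`.
// A plain total per video cannot answer this, because it carries no timing.
function topVideosSince(events: ViewEvent[], since: number, limit: number): string[] {
  const recentViews = new Map<string, number>()
  for (const e of events) {
    if (e.timestamp >= since) {
      recentViews.set(e.videoId, (recentViews.get(e.videoId) ?? 0) + 1)
    }
  }
  return Array.from(recentViews.entries())
    .sort((a, b) => b[1] - a[1]) // most recent views first
    .slice(0, limit)
    .map((entry) => entry[0])
}
```

A video with many old views and few recent ones can have the highest total and still rank below a newer video here, which is the distinction a views field on the video entity would lose.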

The second thing I think we could do is create some kind of "gateway node". At its most basic, that node could act as a query node mirror: all basic requests to it would be forwarded directly to the query node and the responses returned to the client. But the gateway could also stitch data from different sources and provide a uniform schema to the client. It could also enable the "popular" functionality. An example of how this could work:

  1. Client asks gateway for popular videos
  2. Gateway asks Orion for videos matching the "popular" condition, whatever it may be, for example the 10 videos with the most views in the last 24h
  3. Orion responds with the list of IDs of popular videos (it doesn't know anything other than their IDs)
  4. Gateway, given the list of popular videos, asks query node about details for those videos
  5. Gateway returns full data about popular videos to the client
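
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: `OrionClient`, `QueryNodeClient`, their methods and `getPopularVideos` are hypothetical names, not existing Orion or query node APIs.

```typescript
interface VideoDetails {
  id: string
  title: string
}

// Hypothetical client interfaces; the real services expose GraphQL APIs
// whose exact shape is not assumed here.
interface OrionClient {
  popularVideoIds(limit: number): Promise<string[]>
}

interface QueryNodeClient {
  videosByIds(ids: string[]): Promise<VideoDetails[]>
}

// Step 1: the client calls this through the gateway.
async function getPopularVideos(
  orion: OrionClient,
  queryNode: QueryNodeClient,
  limit: number
): Promise<VideoDetails[]> {
  // Steps 2 and 3: Orion returns only the IDs, ordered by popularity.
  const ids = await orion.popularVideoIds(limit)
  // Step 4: the query node supplies details for those IDs.
  const details = await queryNode.videosByIds(ids)
  // Step 5: return full data, keeping Orion's ordering and skipping
  // IDs the query node does not know about.
  const byId = new Map<string, VideoDetails>()
  for (const v of details) {
    byId.set(v.id, v)
  }
  return ids
    .map((id) => byId.get(id))
    .filter((v): v is VideoDetails => v !== undefined)
}
```

Keeping both sources behind interfaces like this would also let the same gateway stitch in further sources later without changing what the client sees.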

The second approach seems more promising to me. The same gateway node could be used in the future to consume data from even more sources, if needed; I imagine likes/comments functionality could plug into that same architecture very well. One possible drawback is having to always keep the client, gateway and query node in sync with regard to the schema used, but I don't think we will be able to avoid that.