Closed zeeshanakram3 closed 10 months ago
Can we call /state/data-objects once per sync timer and cache it? That would reduce the overhead and could be more efficient than a HEAD request per object.
> Can we call /state/data-objects once per sync timer and cache it? That would reduce the overhead and could be more efficient than a HEAD request per object.

Yeah, I guess that should work too, but instead of getting all the data objects on every sync (`/state/data-objects`), we should only get the data objects of the bags that we need to sync (`/state/bags/{bagId}/data-objects`) and then cache them.
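A minimal sketch of the per-bag idea, assuming the node already has a list of objects it still needs to sync, each tagged with its bag. The `PendingObject` shape, the `bagEndpoints` helper and the `bag-a`/`bag-b` IDs are hypothetical and only serve the illustration; the endpoint path is the one mentioned above.

```typescript
// Hypothetical shape of an object that still has to be synced.
interface PendingObject {
  id: string
  bagId: string
}

// Returns one per-bag endpoint path for every distinct bag in `pending`,
// so each bag is queried once no matter how many of its objects are missing.
function bagEndpoints(pending: PendingObject[]): string[] {
  const bagIds = new Set(pending.map((o) => o.bagId))
  return Array.from(bagIds, (bagId) => `/state/bags/${encodeURIComponent(bagId)}/data-objects`)
}

const examplePaths = bagEndpoints([
  { id: '1', bagId: 'bag-a' },
  { id: '2', bagId: 'bag-a' },
  { id: '3', bagId: 'bag-b' },
])
console.log(examplePaths) // two paths: one for bag-a, one for bag-b
```

With three pending objects spread over two bags, only two requests per peer would be needed, and each response is limited to the objects of a single bag.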
> Can we call /state/data-objects once per sync timer and cache it? That would reduce the overhead and could be more efficient than a HEAD request per object.

It does currently cache the result, but only for a short period (3 min). If the sync interval is longer than this caching period (I think operators set it to at least 10 min), the data is always fetched again.
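To make the interaction between the cache lifetime and the sync interval concrete, here is a small simulation. It is only a sketch: `TtlCache` and `cacheHits` are hypothetical names, not the storage node's actual cache, and the clock is passed in explicitly so no real timers are involved.

```typescript
// A tiny time-to-live cache; the current time is passed in by the caller.
class TtlCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>()
  private ttlMs: number

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs
  }

  set(key: string, value: V, now: number): void {
    this.entries.set(key, { value, storedAt: now })
  }

  // Returns the cached value, or undefined when it is missing or older than the TTL.
  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key)
    if (!entry || now - entry.storedAt >= this.ttlMs) return undefined
    return entry.value
  }
}

const MINUTE = 60 * 1000
const SYNC_INTERVAL_MS = 10 * MINUTE // the 10 min interval mentioned above

// Counts how many of `runs` consecutive sync runs find a fresh entry for one bag.
function cacheHits(ttlMs: number, runs: number): number {
  const cache = new TtlCache<string[]>(ttlMs)
  let hits = 0
  for (let run = 0; run < runs; run++) {
    const now = run * SYNC_INTERVAL_MS
    if (cache.get('bag-a', now) !== undefined) hits++
    else cache.set('bag-a', ['1', '2'], now) // stands in for a refetch from the peer
  }
  return hits
}

console.log(cacheHits(3 * MINUTE, 5)) // 3 min TTL: every run refetches
console.log(cacheHits(30 * MINUTE, 5)) // TTL longer than the interval: most runs hit
```

With a 3 min TTL and a 10 min interval, the entry has always expired by the next run, so the cache never helps across syncs. The TTL has to be at least as long as the sync interval for the cache to save any requests.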
Suggested solution from the Zoom call:
There is no need to determine in advance whether an operator has an object before attempting to fetch it. Just make a best-effort attempt to fetch it from the other operators that should be storing the bag the object belongs to.
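The best-effort approach could look roughly like the sketch below. The `Download` type and `fetchBestEffort` function are hypothetical; the actual download call is injected so the example does not assume anything about the storage node's HTTP client.

```typescript
// Hypothetical downloader signature: resolves with the object bytes, or rejects on
// any failure (object missing on that operator, timeout, network error, ...).
type Download = (operatorUrl: string, objectId: string) => Promise<Uint8Array>

// Tries each operator that should store the bag, in order, and returns the
// first successful download. No availability check is made beforehand.
async function fetchBestEffort(
  operatorUrls: string[],
  objectId: string,
  download: Download
): Promise<Uint8Array> {
  const failures: string[] = []
  for (const url of operatorUrls) {
    try {
      return await download(url, objectId)
    } catch (err) {
      // Remember the failure and move on to the next operator.
      failures.push(`${url}: ${String(err)}`)
    }
  }
  throw new Error(`Object ${objectId} not available from any operator (${failures.join('; ')})`)
}
```

A failed attempt costs one small error response, whereas the pre-check costs a full `/state/data-objects` listing per peer, so skipping the check should be cheaper whenever most operators of a bag actually hold its objects.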
@mnaamani @zeeshanakram3 can we close this after sync rework?
Problem
While syncing data objects, the storage node needs to know which peer nodes hold the required objects and then pick a URL to download each object from. However, the problem is that for each asset the node needs to sync, it calls `api/v1/state/data-objects` on all the peer nodes until it picks a URL to download the asset from.
https://github.com/Joystream/joystream/blob/46e75506e9639dae4bf67a8ff7e322166ad522ee/storage-node/src/services/sync/tasks.ts#L220-L225
`/state/data-objects` does not return a constant-size response, so response size and latency grow linearly with the number of data objects a node stores. For reference, some nodes currently return a data-objects response of over 5 MB.
Solution
Use `HEAD /files/{Id}` to check the availability of assets on a given node, OR