Joystream / joystream

Joystream Monorepo
http://www.joystream.org
GNU General Public License v3.0

Is DataObject Accepted status necessary? #5058

Open mnaamani opened 5 months ago

mnaamani commented 5 months ago

In the runtime a data object is represented as:

pub struct DataObject<RepayableBloatBond> {
    /// Defines whether the data object was accepted by a liaison.
    pub accepted: bool,

    /// Bloat bond for storing the data object in the runtime state.
    pub state_bloat_bond: RepayableBloatBond,

    /// Object size in bytes.
    pub size: u64,

    /// Content identifier presented as base-58 encoded multihash.
    pub ipfs_content_id: Base58Multihash,
}

When a data object is created in the runtime, the accepted value is false. It can be flipped to true by any worker operating a bucket that holds a bag containing the object, via the dispatch storage::accept_pending_data_objects(). The storage node does this once it has processed an upload request for the object. The dispatch emits the event:

PendingDataObjectsAccepted(StorageBucketId, WorkerId, BagId, BTreeSet<DataObjectId>),

This event is processed by the Query Node and Orion, although they only seem to store the ID of the last worker that made the dispatch call.
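To make the flow concrete, here is a minimal in-memory sketch of the acceptance step. It is not the pallet code: the Storage and Bag structs, the Error variants, and the u64 ID aliases are assumptions made up for illustration, and only the dispatch name, the accepted flag and the event shape come from the description above.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified ID types; the runtime uses its own generic types.
pub type DataObjectId = u64;
pub type StorageBucketId = u64;
pub type WorkerId = u64;
pub type BagId = u64;

/// Simplified data object: only the fields relevant to acceptance.
#[derive(Debug, Clone, PartialEq)]
pub struct DataObject {
    pub accepted: bool,
    pub size: u64,
}

/// Stand-in for the event quoted above (same field order).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    PendingDataObjectsAccepted(StorageBucketId, WorkerId, BagId, BTreeSet<DataObjectId>),
}

/// Illustrative error cases, not the pallet's actual error enum.
#[derive(Debug, PartialEq)]
pub enum Error {
    BucketNotFound,
    WorkerDoesNotOperateBucket,
    BagNotFound,
    BucketNotAssignedToBag,
    DataObjectNotFound,
}

#[derive(Default)]
pub struct Bag {
    /// Buckets obligated to store this bag.
    pub stored_by: BTreeSet<StorageBucketId>,
    pub objects: BTreeMap<DataObjectId, DataObject>,
}

#[derive(Default)]
pub struct Storage {
    /// Which worker operates which bucket.
    pub bucket_operator: BTreeMap<StorageBucketId, WorkerId>,
    pub bags: BTreeMap<BagId, Bag>,
}

impl Storage {
    /// Flips `accepted` to true for the given objects if the caller operates
    /// a bucket that holds the bag, and returns the event to be emitted.
    pub fn accept_pending_data_objects(
        &mut self,
        worker_id: WorkerId,
        bucket_id: StorageBucketId,
        bag_id: BagId,
        object_ids: BTreeSet<DataObjectId>,
    ) -> Result<Event, Error> {
        let operator = self.bucket_operator.get(&bucket_id).ok_or(Error::BucketNotFound)?;
        if *operator != worker_id {
            return Err(Error::WorkerDoesNotOperateBucket);
        }
        let bag = self.bags.get_mut(&bag_id).ok_or(Error::BagNotFound)?;
        if !bag.stored_by.contains(&bucket_id) {
            return Err(Error::BucketNotAssignedToBag);
        }
        // Validate every ID before mutating, so the call is all-or-nothing.
        if object_ids.iter().any(|id| !bag.objects.contains_key(id)) {
            return Err(Error::DataObjectNotFound);
        }
        for id in &object_ids {
            if let Some(object) = bag.objects.get_mut(id) {
                object.accepted = true;
            }
        }
        Ok(Event::PendingDataObjectsAccepted(bucket_id, worker_id, bag_id, object_ids))
    }
}
```

Note that in this model the only thing persisted per object is a single boolean; the identity of the accepting worker exists only in the returned event.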

The main consumers of this state are distributor nodes and Atlas, which use it to decide whether an object is available in the storage system before even attempting to fetch it.

It is not clear if this property is adding any real valuable state. Should we continue to use it? Is it contributing to state bloat?

If there are plans to add tooling for the storage lead to penalize operators that have indicated that they accepted an object but cannot produce it on request (if they are still operating a bucket that is obligated to store that object), then we can keep it, but the database schemas and mappings in QN and Orion should be updated to keep track of all operators. The storage lead also suggested that it might be valuable for a storage node to signal in a similar fashion when it has synced an object from another node. This data could help maintain a history of the replication status of an object across buckets, to help identify where/when objects are lost.
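As a rough illustration of what "keep track of all operators" could look like on the indexing side, the sketch below folds events into a per-object history instead of overwriting a single worker ID. Everything here is hypothetical: the AcceptanceIndex and Acceptance types, the block-number field, and the assumption that several operators may report the same object (for example via the suggested sync signal) are not taken from the QN or Orion schemas. The index is built purely from event payloads, so it does not depend on the accepted flag being stored on chain.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified ID types.
pub type DataObjectId = u64;
pub type StorageBucketId = u64;
pub type WorkerId = u64;
pub type BagId = u64;

/// Simplified copy of the event payload quoted earlier in the issue.
pub struct PendingDataObjectsAccepted {
    pub bucket_id: StorageBucketId,
    pub worker_id: WorkerId,
    pub bag_id: BagId,
    pub object_ids: BTreeSet<DataObjectId>,
}

/// One "this operator says it holds the object" record (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Acceptance {
    pub bucket_id: StorageBucketId,
    pub worker_id: WorkerId,
    /// Block in which the event was observed (assumed to be known to the indexer).
    pub block: u32,
}

/// Event-derived index that keeps every acceptance, not only the latest one.
#[derive(Default)]
pub struct AcceptanceIndex {
    by_object: BTreeMap<DataObjectId, Vec<Acceptance>>,
}

impl AcceptanceIndex {
    /// Appends one record per object ID carried by the event.
    pub fn apply(&mut self, block: u32, event: &PendingDataObjectsAccepted) {
        for id in &event.object_ids {
            self.by_object.entry(*id).or_default().push(Acceptance {
                bucket_id: event.bucket_id,
                worker_id: event.worker_id,
                block,
            });
        }
    }

    /// Full history for an object, oldest first; empty if never accepted.
    pub fn history(&self, id: DataObjectId) -> &[Acceptance] {
        self.by_object.get(&id).map(Vec::as_slice).unwrap_or(&[])
    }

    /// What a "last worker only" schema would retain.
    pub fn last_worker(&self, id: DataObjectId) -> Option<WorkerId> {
        self.history(id).last().map(|a| a.worker_id)
    }
}
```

With a history like this, a lead could in principle list every bucket that claimed to hold an object and when, which is the information needed to ask a specific operator to produce it.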

Maybe there is a case for this state to not be necessary on chain but only through Event data?

bedeho commented 5 months ago

It is not clear if this property is adding any real valuable state.

Having someone sign off on the fact that they did in fact get a valid upload of the data and that it matched the hash, that seems quite important. How else does anyone even determine if the upload was ever completed or even initiated? This data can of course be put in some central off-chain location.

The real issue with the storage pallet is that it does not really need to be native runtime code, as long as we are not doing any actual automated on-chain slashing or rewarding based on some proof-of-storage scheme or similar. It could all just be metaprotocol stuff, which would be much more flexible, dramatically reduce fees and avoid the whole bloat issue. But this is a big change. See here: https://github.com/Joystream/joystream/issues/4940