Will’s team ran a measurement study that collected the number of blocks held by the large storage providers. Block counts per node range from very few up to millions for the larger providers, so we may need different CPIR schemes to scale well across that range.
In their study, a provider advertisement for a CID is treated as evidence that the node hosts that CID; the CID is not actually retrieved.
1) Distribution of the number of blocks stored by nodes: They capture the number of provider advertisements for CIDs advertised by the top 100 nodes. We can estimate a range for the datastore size of these nodes by multiplying by 256 kB (the default block size). If we want the long tail (on a logarithmic x-axis), Will said we can scrape the stats under indexCount.
2) Churn rate of CIDs: They show that up to half of the advertised CIDs are dropped by the next day. (By default they are kept for only a day.)
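The two measurements above can be sketched in a few lines. This is only an illustration, not the code used in the study: the function names and inputs are hypothetical, the block size is assumed to be 256 KiB, and the size estimate assumes every advertised CID is one default-size block, so it is a rough figure at best.

```python
from __future__ import annotations

# Assumption: the "256 kB" default block size is taken as 256 KiB.
DEFAULT_BLOCK_SIZE_BYTES = 256 * 1024


def estimate_datastore_bytes(advertised_cids: int,
                             block_size: int = DEFAULT_BLOCK_SIZE_BYTES) -> int:
    """Rough datastore size: one default-size block per advertised CID."""
    return advertised_cids * block_size


def churn_rate(day1: set[str], day2: set[str]) -> float:
    """Fraction of CIDs advertised on day 1 that are gone on day 2."""
    if not day1:
        return 0.0
    return len(day1 - day2) / len(day1)
```

For example, a node advertising one million CIDs would be estimated at about 262 GB, and a node whose day-2 advertisements cover only half of its day-1 CIDs has a churn rate of 0.5.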
[ ] Measure the number of blocks per file, on average and at maximum, stored at the top 1, 10, 100, 1000 ... providers. This would help evaluate the overhead of running the PIR or batched PIR scheme as many times as the maximum number of blocks per file.
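A minimal sketch of how this measurement could be summarised once per-file block counts are available. The input layout (a mapping from provider ID to a list of block counts, one per file) and the function name are hypothetical, and providers are ranked here by total blocks stored, which is an assumption about what "top" means.

```python
from statistics import mean


def blocks_per_file_stats(providers: dict, k: int):
    """Return (average, maximum) blocks per file over the top-k providers.

    providers maps a provider ID to a list of block counts, one per file
    (hypothetical layout). Providers are ranked by total blocks stored.
    """
    ranked = sorted(providers.values(), key=sum, reverse=True)[:k]
    counts = [c for files in ranked for c in files]
    if not counts:
        return 0, 0
    # The maximum is the number of PIR (or batched PIR) runs needed
    # to fetch the largest file block by block.
    return mean(counts), max(counts)
```

Calling this for k = 1, 10, 100, 1000 gives one (average, maximum) pair per tier.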
Follow-up discussion with Dennis: They have stats on AWS, gathered from traffic on the IPFS network, on the provider advertisements. We would still need to actually query for the blocks to determine the number of blocks in a file (and presumably use the ipfs-gocar repo to read the root CID and determine the number of blocks). TODO: Discuss with Dennis over Slack how to get access to the stats.