Open hannahhoward opened 2 years ago
Note that we also hope soon for data transfer channel IDs to become UUIDs
related but not neccesarily dependent change -- https://github.com/ipfs/go-graphsync/issues/344
I would note that the disjoint between the planned pause / checkpoint apis proposed here, which are structured in terms of byte offsets in the stream is not an api that a client who's making a graphsync request in terms of a cid or selector is going to be able to interact with - they don't know how big the blocks / sub-dags are going to be exactly in many cases when making such a request.
Will I think for paid retrieval the client kind of has to know how big the thing they're downloading is, otherwise they don't know if they can afford to download it.
Hannah with regards to the proposed interface, I would favour an interface in which the layers don't have to query each other. This makes it easier to manage multiple threads, at the cost of each layer maintaining more state internally.
I'd also suggest a simplification in the protocol such that the provider doesn't ever ask for vouchers. Instead the client sends an updated voucher every time it receives data.
For example: The pricing scheme is a linear price per byte, eg 1 attoFIL / byte, with no up-front (eg unsealing cost). According to the pricing scheme the client and provider both know how much should be paid at each stage of the transfer, so there's no need for the provider to ask for vouchers. The client just sends another voucher each time it receives data. eg:
If at any stage the client doesn't receive data for some timeout, the client resends the voucher for that tranche of data.
In terms of how this is implemented internally:
So the interfaces on data-transfer are something like:
type DataReceiver interface {
Subscribe(func(channelID, totalDataReceived))
SendVoucher(channelID, voucher)
}
type DataSender interface {
Subscribe(func(channelID, voucher))
// SetDataLimit tells the DataSender to keep sending data until it reaches totalData bytes.
SetDataLimit(channelID, totalData)
}
for paid retrieval the client kind of has to know how big the thing they're downloading is, otherwise they don't know if they can afford to download it.
If i was imagining what I would hope for, it would be something like saying 'pause after x blocks', with known constraints on the upper bound of block size.
Will I think for paid retrieval the client kind of has to know how big the thing they're downloading is, otherwise they don't know if they can afford to download it.
There's a bigger meta issue here: we don't have good ways to estimate sizes for retrievals. Our whole protocol is based on a simple pricePerByte and right now I believe almost all of our size calculations are pretty made up based on the assumption of whole DAG. The only real way to calculate size for a partial retrieval is to run the selector.
I just want to raise this for future thinking. It's a meta question with selectors and IPLD. UnixFS has mechanisms for size calculations -- a "sum of sizes of the raw blocks underneath me" is built into the UnixFS data format.
Hannah with regards to the proposed interface, I would favour an interface in which the layers don't have to query each other. This makes it easier to manage multiple threads, at the cost of each layer maintaining more state internally.
One thing I'm genuinely planning of here is that the payment layer lives out of process. This is something that's a genuine likelyhood when we move into IPFS. At the same time, I'm ok with Subscribes as long as they're non blocking -- I guess that pairs fine with websockets. I think your interface matches that.
I like your point that there's no need for a request for payment: you made an agreement, if you don't live up to it you should know why. I guess there's a trade off in being helpful as a retrieval provider ("here's why your transfer is stuck") and making the implementation complicated. The only other bummer is this is a change to the actual retrieval protocol. (the proposed interface I believe would work fine with the current protocol).
But all that said, your proposal looks super simple and perhaps worth it to implement.
I'd make a couple changes to enable to provider to be informative if they want to be:
What do you think?
type DataReceiver interface {
Subscribe(func(channelID, lastVoucherResult, totalDataReceived))
SendVoucher(channelID, voucher)
}
type DataSender interface {
Subscribe(func(channelID, bestVoucher, totalDataSent))
// SetDataLimit tells the DataSender to keep sending data until it reaches totalData bytes.
SendVoucherResult(channelID, voucherResult)
SetDataLimit(channelID, totalData)
}
Also, I'm pretty sure the the subscribe calls effectively just reduce down to the current subscribe.
The only thing that's needs is SendVoucherResult and SetDataLimit.
Also probably a good thing to consider how the first SetDataLimit gets called at the beginning of the request.
One thing I'm genuinely planning of here is that the payment layer lives out of process.
I'm in favour of this idea 👍 It should help enable different payment models (not necessarily voucher-based)
What's the purpose of SendVoucherResult(channelID, voucherResult)
in the DataSender
interface?
Is it so that the data sender can ack that they've received the DataReceiver
's voucher? When the DataSender
sends data it's effectively an implicit ack of the DataSender
's payment.
It also allows the DataSender to give information about requesting payment -- the point here is to maintain protocol level compatibility if one wishes with the old interface -- which I'd like to do, since it's safe to assume both sides for retrieval may end-up at different versions. Arguably, that could become a retrieval protocol upgrade, but if I can avoid it it'd be nice to not worry about all that :)
Goals
The current API for revalidation places a heavy burden on the markets / exchange layer tracking progress on a data transfer and inspecting every block sent for decisions about pausing.
We should make it easier for an exchange layer to only track the state relative to actual payments -- namely, how much has been paid, and given the current state of transfer, how much is owed.
Implementation
The new proposed API is as follows:
This should allow the exchange layer to ignore all state tracking of bytes sent -- it only needs to track how much has been paid for a channel, and do calculations for amount owed based on price. This should allow us to remove all in memory tracking bytes sent / interval / etc in the markets layer