Open brendalee opened 2 years ago
Do we need all child content advertised (even individual leaf nodes), or do we want things advertised at the file level? And what do we think is a good UX here? Should it be a flag that gets set per blob they upload? I assume they are pinning directories, which we could crawl through to enumerate all the files in them and advertise those.
Given that this was created for a particular use case, I would say we want things advertised at the file level. They are using `ipfs pin remote add` to pin directories. I cannot answer the question about the "flag that gets set per blob". What we are trying to achieve is that a user can list and access a file within a directory, and can track the child CID within Estuary (whether it is pinned, and whether it has a deal on Filecoin).
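To make the two options concrete, here is a minimal sketch of the difference between "file level" and "every block", using a toy in-memory DAG. This is not Estuary's code: `Node`, `file_roots`, `all_cids`, and the CID strings are all hypothetical names made up for illustration, and real UnixFS nodes carry more structure than this.

```python
from typing import Dict, List, NamedTuple


class Node(NamedTuple):
    kind: str          # "dir", "file", or "chunk" (hypothetical labels)
    links: List[str]   # CIDs of child nodes


def file_roots(dag: Dict[str, Node], root: str) -> List[str]:
    """CIDs of file roots under `root`, without descending into file chunks."""
    found, seen, stack = [], set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        node = dag[cid]
        if node.kind == "file":
            found.append(cid)          # advertise the file, skip its chunks
        elif node.kind == "dir":
            stack.extend(node.links)   # keep crawling subdirectories
    return sorted(found)


def all_cids(dag: Dict[str, Node], root: str) -> List[str]:
    """Every CID reachable from `root`, including leaf chunks."""
    seen, stack = set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(dag[cid].links)
    return sorted(seen)


# Toy pinned directory: one two-chunk file, one nested single-block file.
dag = {
    "dir-root": Node("dir", ["file-a", "dir-sub"]),
    "dir-sub": Node("dir", ["file-b"]),
    "file-a": Node("file", ["chunk-a1", "chunk-a2"]),
    "file-b": Node("file", []),
    "chunk-a1": Node("chunk", []),
    "chunk-a2": Node("chunk", []),
}

print(file_roots(dag, "dir-root"))
print(all_cids(dag, "dir-root"))
```

In this toy model, file-level advertising announces two CIDs for the pinned directory, while advertising everything announces all six, so the gap grows with the number of chunks per file.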
Would advertising only files break some retrievals, though? Based on how retrievals in IPFS work today, it seems that if we don't advertise all child content, there can be cases where someone has already retrieved part of a file but, when trying to get the next "piece", needs the specific child CID, because the client isn't smart enough to traverse back up and figure out which file it belonged to.
It would be great if Estuary advertised all child CIDs. Right now it doesn't behave like a normal IPFS node, which is confusing.
@stastnypremysl what is your use case for this? I don't imagine you actually want every last block to be advertised; more likely what you want is 'all the roots of files in this directory I've pinned', or something to that effect.
It's about file deduplication.
E.g., take a large CSV dataset of temperatures that only ever grows. With child CID propagation, everyone downloading a newer version of the dataset will be able to download part of it from Estuary.
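A rough sketch of why an append-only file deduplicates well under fixed-size chunking. Everything here is an assumption for illustration: `chunk_ids` is a hypothetical helper, the 4-byte chunk size is a toy value, and plain SHA-256 hex digests stand in for real CIDs.

```python
import hashlib
from typing import List


def chunk_ids(data: bytes, size: int) -> List[str]:
    """Split `data` into fixed-size chunks and hash each one (stand-in for CIDs)."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


# Older dataset, then a newer version with one more row appended.
v1 = b"t1,20\nt2,21\n"
v2 = v1 + b"t3,22\n"

old = chunk_ids(v1, 4)
new = chunk_ids(v2, 4)

shared = set(old) & set(new)    # blocks a node holding v1 could serve
new_only = set(new) - set(old)  # blocks only a holder of v2 can serve

print(len(shared), "shared,", len(new_only), "new")
```

Here the old version ends exactly on a chunk boundary, so all of its chunks reappear unchanged in the new version. If it did not, the final partial chunk would differ. Either way, the shared blocks are only discoverable by their own CIDs if those child CIDs are advertised.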
cc https://github.com/ipfs/go-ipfs/issues/8676 (proposal to have smarter Reprovider.Strategy for UnixFS DAGs)
@lidel @brendalee Where do you think we are on this issue?
Haven't gotten many clients asking for this in the past few months
Currently, Estuary advertises only root CIDs. Some customers (such as the zarr-wg) have retrieval needs that require fast access to child CID content.