Recursive `bzz.list()`

AuHau commented 4 years ago

What do you think about adding an option to `bzz.list()` that would recursively fetch all entries for a given manifest, resolving all the `common_prefixes` returned by Swarm?

PaulLeCam commented 4 years ago

I'd rather avoid adding extra logic to `bzz.list()` so we can keep it simple and matching Swarm's response payload, but maybe that could be an extra package, similar to the timeline one, with higher-level logic and more custom data structures?

I have implemented something like this already for a simple app, but I think this logic should be improved to support a maximum recursion level, maybe using Observables to track progress rather than a Promise, possibly lazy-loading, etc. Another possibility, rather than using a nested data structure, could be to use a flat one with the paths as keys; it really depends on the use cases I guess.
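To make the options above concrete, here is a minimal sketch of a depth-limited recursive listing that returns a flat structure keyed by path. It is not the implementation mentioned in this comment. `ListFn`, `listRecursive` and the default `maxDepth` of 5 are hypothetical names and values chosen for illustration; the sketch also assumes that each listing returns `entries` and `common_prefixes` as paths relative to the manifest root, and `ListFn` stands in for whatever single-level call is used (for example a thin wrapper around `bzz.list()`).

```typescript
// Shape of one listing response, modelled on the payload discussed in this thread.
// Only the fields needed by the sketch are declared.
interface ListEntry {
  path: string
  hash: string
}

interface ListResult {
  entries?: ListEntry[]
  common_prefixes?: string[]
}

// Hypothetical signature: lists a single level of the manifest under `path`.
type ListFn = (hash: string, path: string) => Promise<ListResult>

// Resolves every common prefix up to `maxDepth` levels below the starting path
// and collects all entries into one flat map keyed by entry path.
async function listRecursive(
  list: ListFn,
  hash: string,
  maxDepth = 5, // assumed default, meant to be overridden by the caller
  path = '',
  out: Map<string, ListEntry> = new Map(),
): Promise<Map<string, ListEntry>> {
  const res = await list(hash, path)
  for (const entry of res.entries ?? []) {
    out.set(entry.path, entry)
  }
  if (maxDepth > 0) {
    // One extra request per prefix; done sequentially to keep the sketch simple.
    for (const prefix of res.common_prefixes ?? []) {
      await listRecursive(list, hash, maxDepth - 1, prefix, out)
    }
  }
  return out
}
```

With `maxDepth` set to 0 this degrades to a single-level listing, so the recursion limit doubles as the on/off switch.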

AuHau commented 4 years ago

Well, I don't really agree with having it as a separate package. I mean, it is an option, but I think this approach only contributes to the "package hell" problem of the JS ecosystem. This is a very tightly coupled feature that I would actually have expected Swarm to support directly, but it does not, so I see room for improvement here that this library can take advantage of. I admit it might involve some more logic, but it will be very much "under the hood", and to the user it should only be visible as a simple `recursive` option.

I am not sure how demanding the `list()` call actually is, but I think Observables may be a bit of an overkill. I assume that most people would buffer up the results first anyway before doing something with them, so I think a Promise will be sufficient. Or do you think that we could hit some "reasonable" limit (e.g. a reasonable number of `list()` calls) with relatively common/small manifests?

I agree with the maximum recursion level limit; it would have some reasonable default and allow the user to override it.

Also, I like flattening the resulting data structure; that was actually my original idea.

PaulLeCam commented 4 years ago

I think we're mixing a bunch of different things here so I'd like to keep these concerns separated.

First, I disagree with the "package hell" problem, at least if you mean it as "many packages doing small things". I think it's very easy to bundle things together, while unbundling is much harder; this is why in Erebos there is a separation between the utilities (encoding, signing...), API-level interactions (BZZ, PSS) and higher-level concerns (such as Timeline). If Erebos was provided as a single package, it would be way heavier for everyone, which in my opinion is a problem especially if you consider it included in Swarm apps that users would eventually pay to load. Even if it's very small amounts, it's still a waste of time/bandwidth/money.

I also disagree with the idea that because you're expecting it should be supported by Swarm but is not, it should be supported by Erebos. If you expect it to be supported by Swarm, you should open a SWIP or discuss it in another way with the Swarm team because Erebos is not meant to "replace" behavior that is not provided by Swarm. The @erebos/api-* packages are meant to provide a relatively low-level client for Swarm, possibly with some higher-level abstractions to cover common use cases, but that are basically there to reduce boilerplate for apps. They are not meant to abstract complex Swarm interactions.

> I am not sure how demanding the `list()` call actually is, but I think Observables may be a bit of an overkill. I assume that most people would buffer up the results first anyway before doing something with them, so I think a Promise will be sufficient. Or do you think that we could hit some "reasonable" limit (e.g. a reasonable number of `list()` calls) with relatively common/small manifests?

Well, that's exactly the kind of question that makes things way more complex than, for example, adding a `recursive?: boolean` flag to the `bzz.list()` options in my opinion, and why I don't think this is suitable for a low-level client API. I'm more eager to provide building blocks so that developers can implement the behavior they need for their app, rather than a go-to solution that would hide the underlying complexity and potentially be bad for apps (for example making a recursive call on a hash with lots of subfolders).
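As an illustration of the kind of app-specific behavior this leaves to developers, here is a rough sketch of a breadth-first walk that stops after a fixed number of listing requests instead of following every subfolder. Everything in it is an assumption made for the example: `ListFn` is a hypothetical single-level listing function, `walkWithBudget` and `maxRequests` are invented names, and the response shape only declares the fields the sketch needs.

```typescript
// Minimal response shape, modelled on the payload discussed in this thread.
interface ListEntry {
  path: string
  hash: string
}

interface ListResult {
  entries?: ListEntry[]
  common_prefixes?: string[]
}

// Hypothetical signature: lists a single level of the manifest under `path`.
type ListFn = (hash: string, path: string) => Promise<ListResult>

interface WalkResult {
  entries: ListEntry[] // everything collected so far
  pending: string[] // prefixes that were not fetched because the budget ran out
  requests: number // number of listing calls actually made
}

// Breadth-first walk that never issues more than `maxRequests` listing calls.
async function walkWithBudget(
  list: ListFn,
  hash: string,
  maxRequests: number,
): Promise<WalkResult> {
  const entries: ListEntry[] = []
  const pending: string[] = [''] // start at the manifest root
  let requests = 0
  while (pending.length > 0 && requests < maxRequests) {
    const path = pending.shift() as string
    const res = await list(hash, path)
    requests++
    entries.push(...(res.entries ?? []))
    pending.push(...(res.common_prefixes ?? []))
  }
  return { entries, pending, requests }
}
```

Returning the unfetched prefixes lets the caller decide whether to continue, show partial results, or give up, which is a decision a plain boolean flag could not express.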

To me a more interesting opportunity is to discuss it more with the rest of the community, first asking the Swarm team why Swarm doesn't provide this functionality by itself, then getting other potential users of such API to give feedback about their use cases, and creating a spec for it as we did for the Timeline API.

Sorry for the long reply but to summarize: I think it's useful but it shouldn't be provided as a naive low-level API, and there are likely different concerns in terms of Swarm behavior, performance and UX that need to be taken into account.

AuHau commented 4 years ago

Thanks for such a long explanation! Really appreciate it!

> First, I disagree with the "package hell" problem, at least if you mean it as "many packages doing small things". I think it's very easy to bundle things together, while unbundling is much harder; this is why in Erebos there is a separation between the utilities (encoding, signing...), API-level interactions (BZZ, PSS) and higher-level concerns (such as Timeline). If Erebos was provided as a single package, it would be way heavier for everyone, which in my opinion is a problem especially if you consider it included in Swarm apps that users would eventually pay to load. Even if it's very small amounts, it's still a waste of time/bandwidth/money.

Well, good point about the payment part. And I agree with the bundling/unbundling argument, yet I don't agree with the level of modularization. But that is alright; it is somewhat of a philosophical question that everybody has to answer.

> I also disagree with the idea that because you're expecting it should be supported by Swarm but is not, it should be supported by Erebos. If you expect it to be supported by Swarm, you should open a SWIP or discuss it in another way with the Swarm team because Erebos is not meant to "replace" behavior that is not provided by Swarm. The @erebos/api-* packages are meant to provide a relatively low-level client for Swarm, possibly with some higher-level abstractions to cover common use cases, but that are basically there to reduce boilerplate for apps. They are not meant to abstract complex Swarm interactions.

Good point as well. It makes sense to implement this in Swarm (if possible). I guess I am closer to this library than to Swarm itself, so that is why I proposed it here instead of there. 😅

Well, I will open an issue in Swarm and see what they have to say about it :-)

PaulLeCam commented 4 years ago

> Well, good point about the payment part. And I agree with the bundling/unbundling argument, yet I don't agree with the level of modularization. But that is alright; it is somewhat of a philosophical question that everybody has to answer.

I agree it's tricky to get it right. The approach I have for Erebos is to split between individual utilities (I don't want a single "utils" package that ends up being a dependency for everything and is hard to make breaking changes to), individual APIs (BZZ, PSS), possibly per platform (again with the idea of avoiding extra dependencies such as node-specific ones in a browser build), and "high-level" packages such as SwarmClient, which bundles the BZZ and PSS APIs, and Timeline, which adds functionality on top of the BZZ API. If you have suggestions about how this can be improved, please don't hesitate to share! At the end of the day it's about tradeoffs between flexibility and ease of use, and I've opted for more flexibility, but if there are simple ways to improve the ease of use I'm all for it.

> Good point as well. It makes sense to implement this in Swarm (if possible). I guess I am closer to this library than to Swarm itself, so that is why I proposed it here instead of there. 😅

I totally get it, no problem! I think it's good for the Swarm project itself to get this kind of feedback to decide where the boundaries lie between what should be handled by Swarm and what is out of scope. I hope the SWIP process can help standardize this kind of decision, and possibly produce client specs for implementing this behavior, so that it can be generic to the Swarm ecosystem rather than specific to Erebos.

MainframeOS / erebos

Recursive `bzz.list()` #123