filecoin-project / lassie

A minimal universal retrieval client library for IPFS and Filecoin

Try using lassie to list nodes. #483

Closed Jennyism closed 3 months ago

Jennyism commented 3 months ago

I tried to retrieve it using this command, but the output was garbled:

lassie fetch -o - bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4/ls

Fetching bafybeic56z3yccnla3cutmvqsn5zy3g24muupcsjtoyp3pu5pm5amurjx4/ls.:�eroots��*X%p ]�w� ��I���{�l��)G�I�����{:R)�gversion_p ]�w� ��I���{�l��)G�I�����{:R)�5$p X�U���,;�}���V���S���Ѧ��birb.mp4���

The output I expected is something like that of "lotus client ls". How can I achieve this?

rvagg commented 3 months ago

You're going to need to couple it with a CAR utility, since lassie only speaks CAR. Use https://github.com/ipld/go-car/ if you're using Go or want it to work on the CLI. For JavaScript there is https://github.com/ipld/js-car, or https://github.com/storacha-network/ipfs-car might work better. You could even use https://github.com/ipfs/helia to piece together what you need.

Lassie will give you the raw IPLD blocks in a CAR file; you need to manipulate or assemble them back into files. There's some info about this in the Lassie README.
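To illustrate why the output in the original post looks garbled: a CARv1 stream starts with an unsigned varint giving the length of a DAG-CBOR header (the part containing "roots" and "version"), followed by the blocks. The ":" right after the progress text appears to be that length prefix (0x3A, i.e. 58 bytes). A minimal Python sketch of reading that prefix follows; the function names are hypothetical, and it does not decode the CBOR header or the blocks themselves, which is what a CAR utility is for.

```python
def read_varint(data, offset=0):
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry data; the high bit marks continuation.
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def car_header_span(data):
    """Return (start, end) byte offsets of the header in a CARv1 stream.

    Only the length prefix is interpreted here; the header bytes
    themselves are DAG-CBOR and are left undecoded.
    """
    length, start = read_varint(data)
    return start, start + length
```

Anything past the header is a sequence of length-prefixed CID and block pairs, so printing the stream to a terminal will always look like binary noise.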

If you're looking to build an ls-like utility, then you probably want to fetch with a dag-scope of entity and you'll get the blocks that define a directory structure, but again you'll have to deal with decoding that yourself. This isn't functionality that's built into Lassie; it's designed for finding and downloading your raw data.

Jennyism commented 3 months ago

Thanks for the reply. I've tried using go-car and I get the parsed result. However, due to the design of the list command, I cannot get all the dataCids in this file. In addition, I can only get the content at depth 1. What if I want to get content at a deeper depth? I'm guessing that an existing query expression would fulfill my needs, but I haven't found one.

When I search for an advertised CAR, I can theoretically look up the list of entities through the ads, but Lassie doesn't provide the ability to browse entities. What's more, when I'm retrieving a car that hasn't been advertised, I have to find a way to get a list of entities.

My goal is to be able to easily browse the entities contained in the CAR and download only what I need, rather than downloading the full CAR file and then extracting from it.

I can develop a tool based on an existing open source project to implement my idea, but I would prefer a universal approach.

(Written with the help of a translator; please forgive any errors in expression.)

rvagg commented 3 months ago

https://specs.ipfs.tech/http-gateways/trustless-gateway/ might be useful reading for you. Lassie works with this method of retrieving blocks and assembling a CAR for you (even when it uses Bitswap, it works according to the specification there).

You can use --dag-scope and --entity-bytes and put a path after your CID to path down into a UnixFS DAG just to fetch the blocks that form that sub-path. You can even give it all in one go, as if it's a URL of the form specified by the Trustless Gateway spec: lassie fetch /ipfs/bafy.../path/to/thing?dag-scope=entity.
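As a rough sketch of that URL-like form, the Python helper below assembles the string you would pass to lassie fetch. The function name and its parameters are hypothetical; the dag-scope values (block, entity, all) and the entity-bytes=from:to form are taken from the Trustless Gateway spec, and passing entity-bytes as a query parameter to lassie this way is an assumption based on the spec rather than something confirmed in this thread.

```python
from urllib.parse import quote, urlencode

# dag-scope values defined by the Trustless Gateway spec.
VALID_DAG_SCOPES = {"block", "entity", "all"}


def build_fetch_spec(cid, path="", dag_scope="entity", entity_bytes=None):
    """Build a '/ipfs/<cid>/<path>?dag-scope=...' string (hypothetical helper)."""
    if dag_scope not in VALID_DAG_SCOPES:
        raise ValueError(f"unknown dag-scope: {dag_scope!r}")
    # Percent-encode each path segment separately so '/' stays a separator.
    segments = [quote(seg, safe="") for seg in path.split("/") if seg]
    spec = "/ipfs/" + "/".join([cid] + segments)
    params = {"dag-scope": dag_scope}
    if entity_bytes is not None:
        start, end = entity_bytes  # e.g. (0, 1023), or (0, "*") for "to the end"
        params["entity-bytes"] = f"{start}:{end}"
    # Keep ':' and '*' literal so the range reads as from:to.
    return spec + "?" + urlencode(params, safe=":*")
```

For example, build_fetch_spec("bafy...", "path/to/thing") yields the same shape as the command above, with the elided CID left for you to fill in.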

Lassie doesn't actually fetch the CAR (although when talking HTTP to the storage provider it will talk in CAR format, that's an implementation detail); it fetches blocks and assembles the CAR for you. The trick is to present the root + path + "dag scope" to get what you want, and only what you want, in the resulting CAR.

If you have a CID but don't know what it points to, then you have a bit of a challenge. You could use dag-scope=block to fetch just that block and have a look inside it to figure out what to do next. It might tell you it's a large file, in which case you can decide whether to go back and fetch the whole thing. Or it might tell you it's a directory structure, in which case you could re-fetch it with dag-scope=entity so you can see the directory and decide what to do next, maybe path down into a subdirectory and do the same thing.
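The decision step in that exploration could be sketched roughly as below. The type numbers are the UnixFS Data.Type enum values; the function name is hypothetical, and choosing "all" for files is one reasonable reading of "fetch the whole thing", not something Lassie prescribes. Decoding the fetched block (dag-pb plus the UnixFS protobuf) to obtain the type is assumed to happen elsewhere, for example with go-car or a UnixFS library.

```python
from enum import IntEnum


class UnixFSType(IntEnum):
    """Node types from the UnixFS Data.Type protobuf enum."""
    RAW = 0
    DIRECTORY = 1
    FILE = 2
    METADATA = 3
    SYMLINK = 4
    HAMT_SHARD = 5


def next_dag_scope(node_type):
    """Suggest the dag-scope for a follow-up fetch of the same CID.

    Returns None when there is nothing obvious to re-fetch.
    """
    if node_type in (UnixFSType.DIRECTORY, UnixFSType.HAMT_SHARD):
        # Directories (plain or sharded): fetch the blocks needed to list entries.
        return "entity"
    if node_type in (UnixFSType.FILE, UnixFSType.RAW):
        # Files: fetch every block once you decide you want the content.
        return "all"
    return None
```

Repeating this for each subdirectory you path into gives the incremental, ls-like browsing described above.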

Ideally you'd know what you're fetching so you don't have to do all this exploration. But if exploration is what you're building, then you're going to have to do this incrementally. And unfortunately you're going to have to get your hands a bit dirty with UnixFS: understanding how it's used to represent files and directories, and how to decode them to make your decisions. If you just have a big CAR that represents one big file, or represents a full directory structure, then that's fairly easy; just use go-car as is and extract it.

If you're comfortable with Go, then I'd recommend having a look in go-car at the way it handles UnixFS. The car extract command is probably a good place to start.