Open Erotemic opened 1 year ago
please note that generally the async implementation of ipfsspec should be preferred, and probably the synchronous implementation should be dropped at some point (it's relatively easy to use the async implementation in a synchronous setting, but not the other way around.)
Can you give me an example of how it's easy to use async in non-async code? From what I understand, you have to create and manage an event loop, which is more code overhead than I want.
Unless I'm missing something and it is a lot easier to use the async variant than I think it is, then I would strongly suggest not dropping the synchronous variant. It's important for immediate feedback when people are working interactively in IPython / Jupyter settings. Also I believe doctests are easier to write with sync code (you don't want to have two lines of setup loop / execute loop boilerplate in a doctest if you just want to access one file).
Ok, maybe the formulation wasn't ideal. To make async code sync, you "just" have to run an event loop and wait for the async code to complete. This has relatively little overhead. On the other hand, if you want to make sync code async, you'll have to create a new thread for every task you start which is much less efficient, especially if you're doing I/O and want to fetch 100s of requests concurrently.
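To make the two directions concrete, here is a minimal standard-library sketch. `fetch` and `blocking_fetch` are hypothetical stand-ins that sleep instead of doing real I/O; nothing here is ipfsspec or fsspec code.

```python
import asyncio
import time


async def fetch(i: int) -> int:
    """Hypothetical async I/O call; sleeps instead of sending a request."""
    await asyncio.sleep(0.01)
    return i * 2


def blocking_fetch(i: int) -> int:
    """Hypothetical blocking I/O call."""
    time.sleep(0.01)
    return i * 2


def fetch_sync(i: int) -> int:
    # Async -> sync: start an event loop, run one coroutine, return its result.
    return asyncio.run(fetch(i))


async def blocking_fetch_many(n: int) -> list[int]:
    # Sync -> async: each blocking call occupies a worker thread while it waits.
    calls = (asyncio.to_thread(blocking_fetch, i) for i in range(n))
    return await asyncio.gather(*calls)


if __name__ == "__main__":
    print(fetch_sync(21))
    print(asyncio.run(blocking_fetch_many(3)))
```

The first direction needs a single `asyncio.run` call. The second needs one thread per in-flight call, which is where the cost comes from once there are hundreds of concurrent requests.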
But I agree, there's a bit of boilerplate if you want to make async code look like sync code. Fortunately, fsspec's AsyncFileSystem provides a flag asynchronous which you can use to switch from async to sync API, so in this particular case, it's easy to use. In fact, you'll have the synchronous API by default.
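For intuition, a flag like this can be built by running an event loop in a background thread and submitting coroutines to it from synchronous code. The sketch below is a toy under that assumption: `BackgroundLoop` and `ToyFileSystem` are hypothetical names, and fsspec's real implementation may differ in its details.

```python
import asyncio
import threading


class BackgroundLoop:
    """Runs an event loop in a daemon thread so sync code can submit coroutines."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        thread.start()

    def run(self, coro):
        # Schedule the coroutine on the loop thread and block until it finishes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()


class ToyFileSystem:
    """Hypothetical filesystem; not fsspec's AsyncFileSystem."""

    def __init__(self, asynchronous: bool = False) -> None:
        self.asynchronous = asynchronous
        self._bg = None if asynchronous else BackgroundLoop()

    async def _ls(self, path: str) -> list[str]:
        await asyncio.sleep(0)  # placeholder for a real network request
        return [f"{path}/a", f"{path}/b"]

    def ls(self, path: str) -> list[str]:
        # Choice made for this toy only: refuse the sync API in async mode.
        if self.asynchronous:
            raise RuntimeError("asynchronous=True: await fs._ls(...) instead")
        return self._bg.run(self._ls(path))
```

With the default `asynchronous=False`, `ToyFileSystem().ls("root")` behaves like an ordinary blocking call even though the work happens in a coroutine.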
One additional benefit of the synchronous-from-asynchronous API is that fsspec lets you pass multiple requests in one synchronous call (e.g. cat may be called with a list of paths). The sync-async translation is built so that this single synchronous call spawns multiple asynchronous tasks, which all request their data in parallel. This removes a lot of round-trip time.
An example would be fsspec's HTTP implementation, which is implemented as async internally as well:
import fsspec
# The HTTP filesystem is async internally, but this is plain synchronous code.
with fsspec.open("http://httpbin.org/uuid") as f:
    print(f.read())
or, of course, the AsyncIPFSFileSystem, which you should be able to use in just the same way as the synchronous one.
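The batching benefit mentioned above (one synchronous call fanning out into many concurrent requests) can be sketched with `asyncio.gather`. `ToyStore` is a hypothetical class that counts in-flight requests; it only illustrates the pattern and is not how fsspec implements cat.

```python
import asyncio


class ToyStore:
    """Hypothetical store that records how many requests overlap."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.max_in_flight = 0

    async def _cat_file(self, path: str) -> bytes:
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.01)  # placeholder for one network round trip
        self.in_flight -= 1
        return path.encode()

    async def _cat(self, paths: list[str]) -> dict[str, bytes]:
        # One coroutine per path; gather lets all the round trips overlap.
        contents = await asyncio.gather(*(self._cat_file(p) for p in paths))
        return dict(zip(paths, contents))

    def cat(self, paths: list[str]) -> dict[str, bytes]:
        # Single synchronous entry point that drives the concurrent tasks.
        return asyncio.run(self._cat(paths))
```

Calling `ToyStore().cat(["a", "b", "c"])` returns one entry per path, and `max_in_flight` shows that all three requests were pending at the same time rather than one after another.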
Ah, I see. Thank you for the clarification. I can just use:
import fsspec
fs_cls = fsspec.get_filesystem_class('ipfs')
fs = fs_cls(asynchronous=False)
results = fs.ls("bafybeief7tmoarwmd26b2petx7crtvdnz6ucccek5wpwxwdvfydanfukna")
print(results)
And that's still using the async implementation. Given that, I'm perfectly comfortable with avoiding / dropping the synchronous version.
And the async implementation with asynchronous=True would look like this:
async def test_ipfs_async():
    import fsspec
    fs_cls = fsspec.get_filesystem_class('ipfs')
    fs = fs_cls(asynchronous=True)
    session = await fs.set_session()  # creates client
    result = await fs._ls("bafybeief7tmoarwmd26b2petx7crtvdnz6ucccek5wpwxwdvfydanfukna")
    print(result)
    await session.close()  # explicit destructor

if __name__ == '__main__':
    import asyncio
    asyncio.run(test_ipfs_async())
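One caveat with the explicit close in the async example above: if the listing raises, the session is never closed. A `try`/`finally` block covers that case. The sketch below uses a hypothetical `ToySession` in place of the real client session and simulates the failure, so it runs without a gateway.

```python
import asyncio


class ToySession:
    """Hypothetical stand-in for the session returned by set_session()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def list_with_cleanup(session: ToySession, fail: bool = False) -> list[str]:
    try:
        if fail:
            # Simulates a request that errors out partway through.
            raise OSError("simulated gateway error")
        return ["entry"]
    finally:
        # Runs on success and on failure, so the session is always closed.
        await session.close()
```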
Fixes https://github.com/fsspec/ipfsspec/issues/26
Adds some documentation pointing to where to learn more about the gateway REST calls, and fixes an issue where ls would fail when a directory contained symlinks or other entries that are neither directories nor files.
I'm not sure what raw or shards are, but ls will list them now. I expect other parts of the library will have issues dealing with these as well.
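For context, raw and shard are two of the node types defined by UnixFS, alongside directory, file, metadata, and symlink. The sketch below shows one way a listing could tolerate every type instead of failing on anything that is not a file or directory. It is an illustration, not the code in this PR: the `Name`/`Size`/`Type` keys and the output labels are assumptions about the shape of a listing entry.

```python
# UnixFS node type codes, as defined in the UnixFS protobuf schema.
UNIXFS_TYPES = {
    0: "raw",
    1: "directory",
    2: "file",
    3: "metadata",
    4: "symlink",
    5: "shard",  # HAMT shard: one piece of a very large directory
}


def entry_type(type_code: int) -> str:
    # Fall back to a label instead of raising on an unrecognized code.
    return UNIXFS_TYPES.get(type_code, "unknown")


def to_info(link: dict) -> dict:
    # Hypothetical conversion of one listing entry into an info dict.
    return {
        "name": link["Name"],
        "size": link.get("Size"),
        "type": entry_type(link["Type"]),
    }
```

With this shape, a symlink entry such as `{"Name": "latest", "Size": 0, "Type": 4}` is listed with type `"symlink"` rather than causing ls to fail.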