j6k4m8 opened this issue 2 years ago
Hi Jordan :).
Supporting BossDB or cloudvolume should indeed be relatively straightforward and would be a great addition here.
I am using `open_file` from `elf` (another of my libraries, which wraps most of the "custom" algorithms that are used here) internally to open n5, hdf5, and zarr files (it also implements read-only support for some other file formats). So the clean way would be to extend `open_file` s.t. it can return a wrapper around the cloud stores that enables read and (if necessary) write access. The extensions for `open_file` are implemented here. Note that `open_file` currently relies only on the file extension, see here. But it would be totally ok to add a check beforehand that tests whether the input is a URL (or whatever address you would pass for the cloud store) and then returns the appropriate wrapper if it is.
Hi hi :)
Super super awesome! In that case I'll start retrofitting `open_file`. Do you prefer I open a draft PR into `elf` so you can keep an eye on progress and make sure I'm not going totally off the deep end? Happy to close this issue in the meantime, or leave it open in pursuit of eventually getting cloud datasets running through these workflows!
> do you prefer I open a draft PR into `elf` so you can keep an eye on progress and make sure I'm not going totally off the deep end?
Sure, feel free to open a draft PR and ping me in there for feedback.
> Happy to close this issue in the meantime, or leave it open in pursuit of eventually getting cloud datasets running through these workflows!
Yeah, let's keep this issue open and discuss integration within cluster_tools once we can open the cloud stores in elf. I'm sure a couple more things will come up here.
Starting here! https://github.com/constantinpape/elf/pull/41
This looks like a super powerful tool, looking forward to using it! I'd love to implement an API abstraction for cloud datastores like BossDB or CloudVolume so that one could, in theory, generate peta-scale segmentation without having to download the data and reformat into n5/hdf.
These datastores tend to have client-side libraries that support numpy-like indexing, e.g.:
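(The original snippet here didn't survive; as an illustrative stand-in, clients like CloudVolume or BossDB's `intern` expose numpy-style slicing on remote volumes. A local numpy array plays the role of the remote dataset below so the example runs without any cloud access:)

```python
import numpy as np

# Stand-in for a remote volume; with CloudVolume this would be
# something like vol = CloudVolume("precomputed://...") instead.
vol = np.arange(64 * 64 * 64, dtype="uint64").reshape(64, 64, 64)

# Numpy-style slicing: for a real cloud client, only the requested
# region is fetched over the network.
cutout = vol[0:32, 0:32, 0:32]
```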
My understanding is that this should be a simple drop-in replacement for the `ws_path` and `ws_key` if we had a class that looked something like this (I expect I've forgotten a few key APIs and some organization, but the gist is this):
Is this something that you imagine is feasible? Desirable? My hypothesis is that this would be pretty straightforward and open up a ton of cloud-scale capability, but I may be misunderstanding. Maybe there's a better place to plug in here than "pretending" to be an n5 file?