Closed gRox167 closed 6 months ago
Hey, thanks for the suggestion!
It looks like most backend functions that einx requires are implemented for dask arrays, although as far as I can tell there is no vmap or other option to vectorize functions over arbitrary axes (which is required by einx.vmap and einx functions that rely on it such as einx.get_at). I'll look into adding support for it.
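For readers unfamiliar with the term, here is a minimal NumPy sketch of what "vectorizing a function over an arbitrary axis" means. The helper name `map_over_axis` is hypothetical, and this is not how einx or dask implement it: it loops eagerly in Python, which a real vmap would avoid.

```python
import numpy as np

def map_over_axis(fn, x, axis):
    """Apply fn to every slice of x taken along `axis`.

    The per-slice results are stacked along a new leading axis.
    Hypothetical helper for illustration only.
    """
    # Move the mapped axis to the front so iterating over the array yields slices.
    moved = np.moveaxis(x, axis, 0)
    # Call fn once per slice; each slice no longer contains the mapped axis.
    results = [np.asarray(fn(s)) for s in moved]
    # Stack the results; the mapped axis becomes axis 0 of the output.
    return np.stack(results, axis=0)

x = np.arange(24).reshape(2, 3, 4)
# Sum each (2, 4) slice taken along axis 1, giving one value per slice.
out = map_over_axis(lambda s: s.sum(), x, axis=1)
```

Here `out` has shape `(3,)` and matches `x.sum(axis=(0, 2))`. A backend-level vmap would give the same result without the Python loop, which is the piece that appeared to be missing for dask arrays.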
I'm unsure how einx would meaningfully interface with tensor backends that have named axes though (torchdim would be another example) since they follow different philosophies for how axes should be handled. For example:
# With named axes:
x.sum("time") # other axes are implicit
# With einx
einx.sum("b [time] c", x) # other axes are explicit and ordered
I added support for dask.array in https://github.com/fferflo/einx/commit/5fc0c09f0449df17fa59b8d8e6ae14c6b6e8b51d. Let me know if anything isn't working as expected :)
Thanks for all the work you have done! I will update and check it out!
Required prerequisites
I have searched the Issue Tracker and confirmed that this hasn't already been reported. (Comment there if it has.)
Motivation
Thanks for all the brilliant work of contributors of this repo!
Dask.array is a distributed version of numpy which lets researchers easily do parallel computing on CPUs. Xarray is a named array package that is widely used in many scientific areas, and xarray also uses dask as a backend to support parallel computing.
If einx can support dask.array or xarray, it would be super convenient to do large-scale distributed data processing, especially for data that does not fit into memory. This could be useful for areas like physics, astronomy, geoscience, microscopy and medical imaging.
Solution
For dask.array, it would be easy. We can just use its lazy API, which is pretty much the same as the numpy API. However, for xarray it is a little bit tricky, as xarray has a name bound to each axis. We need to be careful about how the named dimensions relate to the axis names in the Einstein expression.
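One way to picture the naming problem is a small helper that turns a tuple of dimension names (such as xarray's `.dims`) into an einx-style expression. This is only a sketch: `reduction_expr` is a hypothetical name, not part of einx or xarray, and it assumes every dimension name is also usable as an axis name in an einx expression.

```python
def reduction_expr(dims, reduce_dim):
    """Build an einx-style expression that marks one named dimension for reduction.

    Hypothetical helper; assumes each name in `dims` is a valid einx axis name.
    """
    if reduce_dim not in dims:
        raise ValueError(f"{reduce_dim!r} is not one of {dims!r}")
    # Keep the dimensions in their stored order and bracket the one to reduce.
    return " ".join(f"[{d}]" if d == reduce_dim else d for d in dims)

expr = reduction_expr(("b", "time", "c"), "time")
print(expr)  # b [time] c
```

The resulting string has the same form as the expression used in `einx.sum("b [time] c", x)`. The open question is what to do when a dimension name is not a valid axis name, or when the same name means different things in the array and in the expression.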
Alternatives
We can refer to xarray-einstats.