Open asfimport opened 3 years ago
Joris Van den Bossche / @jorisvandenbossche: @lidavidm From the description above, it's not fully clear to me if you are talking about the (standalone) Tensor message type of the IPC protocol, or about storing a tensor as a value in a RecordBatch field.
Your description seems to talk about the first, but the mailing list thread talks about the second, I think. There are also some open issues about defining a standard ExtensionType for storing arrays in RecordBatch fields (ARROW-1614, ARROW-8714).
David Li / @lidavidm: Hey @jorisvandenbossche this is about the standalone Tensor type - I'd like both eventually, but having the Tensor type itself implemented is a prerequisite to that, at least for our use cases (Python <-> Java). Thanks for the pointers!
Micah Kornfield / @emkornfield: @lidavidm I took a very cursory look at the code and it seems straightforward. One question I had: is there an existing OSS tensor model that makes sense for us to reuse, or is the Arrow off-heap/object model enough of a snowflake to make that impractical?
David Li / @lidavidm: Thanks @emkornfield. I'm not aware of an existing model. Honestly, my intent here is not really to provide an API to manipulate them in Java, but just to make it possible to round-trip them and convert to/from other APIs, which is why the methods on this Tensor are pretty sparse.
A brief search turns up these:
Vectorz - double[] based - https://github.com/mikera/vectorz/blob/develop/src/main/java/mikera/arrayz/impl/BaseNDArray.java
Maybe we should consider whether our Tensor can be easily (zero-copy) wrapped by djl.ai's, since they seem to have a similar structure, though it seems they also have their own memory management model.
Micah Kornfield / @emkornfield: Thanks for investigating. I'm not an expert in this space, but I can try to take a look at the PR if no one else has provided feedback.
We'd like to be able to round-trip NumPy ndarrays through Java, and create tensors in Java that can be eventually mapped to ndarrays in Python. Having even a basic Tensor implementation, with extension types, as a contrib module would help greatly.
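To make "a basic Tensor implementation" concrete, here is a minimal sketch of the kind of structure involved: a shape, row-major byte strides, and an off-heap buffer of float64 values. This is not the code from the PR and not an Arrow API; `SimpleTensor` and all of its methods are hypothetical names chosen for illustration, and it uses only the JDK. It assumes little-endian float64 data and C-contiguous (row-major) layout.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical minimal tensor: a shape plus row-major byte strides over an off-heap buffer. */
final class SimpleTensor {
    private final ByteBuffer data;  // direct (off-heap) buffer holding float64 values
    private final long[] shape;     // extent of each dimension
    private final long[] strides;   // byte offsets per dimension, as NumPy reports them

    SimpleTensor(long[] shape) {
        this.shape = shape.clone();
        this.strides = rowMajorStrides(shape, Double.BYTES);
        long elements = 1;
        for (long dim : shape) {
            elements *= dim;
        }
        // Assumption: the whole tensor fits in a single int-addressable buffer.
        this.data = ByteBuffer.allocateDirect(Math.toIntExact(elements * Double.BYTES))
                .order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Computes C-contiguous strides in bytes: the last dimension varies fastest. */
    static long[] rowMajorStrides(long[] shape, int elementSize) {
        long[] strides = new long[shape.length];
        long stride = elementSize;
        for (int i = shape.length - 1; i >= 0; i--) {
            strides[i] = stride;
            stride *= shape[i];
        }
        return strides;
    }

    /** Translates a multi-dimensional index into a byte offset, with bounds checks. */
    private int offset(long... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("expected " + shape.length + " indices");
        }
        long off = 0;
        for (int i = 0; i < index.length; i++) {
            if (index[i] < 0 || index[i] >= shape[i]) {
                throw new IndexOutOfBoundsException("index " + index[i] + " in dimension " + i);
            }
            off += index[i] * strides[i];
        }
        return Math.toIntExact(off);
    }

    double get(long... index) {
        return data.getDouble(offset(index));
    }

    void set(double value, long... index) {
        data.putDouble(offset(index), value);
    }

    long[] shape() {
        return shape.clone();
    }

    long[] strides() {
        return strides.clone();
    }
}
```

For a shape of `{2, 3}` this yields byte strides `{24, 8}`, the same values NumPy reports for a C-contiguous float64 array of shape `(2, 3)`. Matching layouts like this is what would let the same buffer be interpreted on both sides without copying.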
Some prior discussions
Reporter: David Li / @lidavidm
PRs and other links:
Note: This issue was originally created as ARROW-10101. Please see the migration documentation for further details.