Open headius opened 1 year ago
Great!
The Java implementation is available at https://repo.maven.apache.org/maven2/org/apache/arrow/ . Does JRuby have a standard way to install Java packages that are available in a Maven repository?
I'm not familiar with the Java implementation API yet but we'll be able to wrap the API step by step.
I think that we should wrap ValueVector ( https://arrow.apache.org/docs/java/vector.html ) as Arrow::Array as the first step.
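To make the wrapping idea concrete, here is a minimal sketch of how a Java vector could be exposed through a Ruby array-like object. The class name VectorBackedArray is hypothetical and is not Red Arrow's actual Arrow::Array. FakeIntVector is a stand-in so the sketch runs on any Ruby. The sketch assumes that, under JRuby, the ValueVector methods getValueCount, isNull and getObject are reachable by their snake_case names.

```ruby
# Hypothetical wrapper (not Red Arrow's real Arrow::Array): exposes any object
# that answers get_value_count, is_null(i) and get_object(i) as an Enumerable.
class VectorBackedArray
  include Enumerable

  def initialize(vector)
    @vector = vector
  end

  # Number of slots in the underlying vector, null slots included.
  def length
    @vector.get_value_count
  end

  # Value at index, or nil when the slot is null.
  def [](index)
    @vector.is_null(index) ? nil : @vector.get_object(index)
  end

  def each
    return to_enum(:each) unless block_given?

    length.times { |i| yield self[i] }
  end
end

# Stand-in for a Java vector so the sketch also runs outside JRuby.
class FakeIntVector
  def initialize(values)
    @values = values
  end

  def get_value_count
    @values.size
  end

  def is_null(index)
    @values[index].nil?
  end

  def get_object(index)
    @values[index]
  end
end

array = VectorBackedArray.new(FakeIntVector.new([1, nil, 2]))
p array.to_a # prints [1, nil, 2]
```

Under JRuby, the same wrapper could presumably be handed a real IntVector in place of the stand-in. Including Enumerable gives map, select and friends for free, which is the main reason for the design.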
Would using FFI work for this? If so it would be a single implementation/binding instead of multiple.
FFI doesn't work...
If we use https://github.com/mvz/gir_ffi instead of https://github.com/ruby-gnome/ruby-gnome/tree/master/gobject-introspection only for non-CRuby implementations, we may be able to use Apache Arrow C++ as the bindings target.
I am working on an example using the Java implementation! I'll have something to show you shortly.
FYI I don't know whether to file this but the documentation on the Java impl is out of date; it shows installing 9.0.0 and then uses classes like RootAllocator that do not appear to exist in that version. The javadocs seem to point at 12.0.0 so I'm trying that.
Oh actually the docs just show arrow-memory-netty but the RootAllocator and other classes are in arrow-memory (which probably gets installed as a dependency?) I just need to know the actual jars to load in JRuby. Getting close.
We should update outdated documents. Could you open a separate issue for it with the outdated document's URL?
BTW, https://arrow.apache.org/cookbook/java/ may help you.
Ok here's the JRuby version of the simple vector example. I'm trying to work out the best way to pull the jars and make it easily runnable for you:
java_import org.apache.arrow.memory.RootAllocator
java_import org.apache.arrow.vector.IntVector
begin
  allocator = RootAllocator.new
  int_vector = IntVector.new("fixed-size-primitive-layout", allocator)
  int_vector.allocate_new(3)
  int_vector.set(0, 1)
  int_vector.set_null(1)
  int_vector.set(2, 2)
  int_vector.set_value_count(3)
  puts "Vector created in memory: #{int_vector}"
ensure
  int_vector.close rescue nil
  allocator.close rescue nil
end
When all necessary dependency jars are loaded into JRuby (via the CLASSPATH env or by requiring each jar), this should work.
Success! Though I think it would be better to set up the proper jar-dependencies logic instead of hand-requiring these jars.
After installing arrow-vector and arrow-memory-netty like this:
$ mvn dependency:get -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=12.0.0
...
$ mvn dependency:get -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=12.0.0
...
I was able to run the following script (the slf4j errors are likely because I just don't have the right jars for it loaded):
require '~/.m2/repository/org/apache/arrow/arrow-vector/12.0.0/arrow-vector-12.0.0.jar'
require '~/.m2/repository/org/apache/arrow/arrow-memory-core/12.0.0/arrow-memory-core-12.0.0.jar'
require '~/.m2/repository/org/apache/arrow/arrow-memory-netty/12.0.0/arrow-memory-netty-12.0.0.jar'
require '~/.m2/repository/org/apache/arrow/arrow-format/12.0.0/arrow-format-12.0.0.jar'
require '~/.m2/repository/io/netty/netty-buffer/4.1.90.Final/netty-buffer-4.1.90.Final.jar'
require '~/.m2/repository/io/netty/netty-common/4.1.90.Final/netty-common-4.1.90.Final.jar'
require '~/.m2/repository/com/google/flatbuffers/flatbuffers-java/1.12.0/flatbuffers-java-1.12.0.jar'
require '~/.m2/repository/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar'
java_import org.apache.arrow.memory.RootAllocator
java_import org.apache.arrow.vector.IntVector
begin
  allocator = RootAllocator.new
  int_vector = IntVector.new("fixed-size-primitive-layout", allocator)
  int_vector.allocate_new(3)
  int_vector.set(0, 1)
  int_vector.set_null(1)
  int_vector.set(2, 2)
  int_vector.set_value_count(3)
  puts "Vector created in memory: #{int_vector}"
ensure
  int_vector.close rescue nil
  allocator.close rescue nil
end
Running it with the requisite --add-opens flag that the Arrow Java bindings need:
$ jruby -J--add-opens -Jjava.base/java.nio=ALL-UNNAMED arrow-vector.rb
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Vector created in memory: [1, null, 2]
So that's a basic start!
A few things to improve before moving forward:
Use require_jar to pull in arrow-vector and arrow-memory-netty with all dependencies.
The --add-opens flag is probably required because the netty memory implementation needs access to the internals of Java's NIO ByteBuffer class. There may be alternative ways that don't need to open up ByteBuffer, or we can make this work in JRuby by adding the --add-opens flag to .jruby.java_opts, which gets loaded automatically:
[] jruby-arrow-vector $ cat .jruby.java_opts
--add-opens java.base/java.nio=ALL-UNNAMED
[] jruby-arrow-vector $ jruby arrow-vector.rb
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Vector created in memory: [1, null, 2]
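The two ways of passing the flag shown above are the same JVM option in two spellings: on the command line each whitespace-separated token gets a -J prefix so the jruby launcher forwards it to the JVM, while .jruby.java_opts holds the raw JVM option. A small sketch of that mapping follows; the helper name jvm_opts_to_jruby_args is made up for illustration.

```ruby
# Turn raw JVM options (as they would appear in .jruby.java_opts) into the
# -J-prefixed arguments used on the jruby command line.
# The helper name is hypothetical; only the -J prefix convention is JRuby's.
def jvm_opts_to_jruby_args(java_opts)
  java_opts.split.map { |opt| "-J#{opt}" }
end

opts = '--add-opens java.base/java.nio=ALL-UNNAMED'
command = ['jruby', *jvm_opts_to_jruby_args(opts), 'arrow-vector.rb']
puts command.join(' ')
# prints: jruby -J--add-opens -Jjava.base/java.nio=ALL-UNNAMED arrow-vector.rb
```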
Something using JRuby's FFI or Project Panama's native memory access might make sense.
I'm glad this was reasonably easy to get working. How do you want to proceed?
Edit: fixed typos and removed slf4j jars that weren't helping the warning.
The only thing that could be updated in the documentation is the version number; 12.0.0 is latest but this page shows how to install 9.0.0:
https://arrow.apache.org/docs/java/install.html
The rest of my issues were just because I was trying not to use jar-dependencies and manually requiring all the jars.
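For reference, the hand-written require lines in the script earlier all follow the standard local Maven repository layout, so they can be derived from the Maven coordinates. This is only a sketch of that derivation. The names ARROW_JARS and local_jar_path are invented here, and the coordinates are copied from the require lines in the script, not from a resolved dependency tree.

```ruby
# Maven coordinates copied from the require lines in the script:
# [group_id, artifact_id, version]. The constant name is hypothetical.
ARROW_JARS = [
  ['org.apache.arrow', 'arrow-vector', '12.0.0'],
  ['org.apache.arrow', 'arrow-memory-core', '12.0.0'],
  ['org.apache.arrow', 'arrow-memory-netty', '12.0.0'],
  ['org.apache.arrow', 'arrow-format', '12.0.0'],
  ['io.netty', 'netty-buffer', '4.1.90.Final'],
  ['io.netty', 'netty-common', '4.1.90.Final'],
  ['com.google.flatbuffers', 'flatbuffers-java', '1.12.0'],
  ['org.slf4j', 'slf4j-api', '1.7.36']
].freeze

# Build the jar path inside a local Maven repository:
# <repo>/<group id as directories>/<artifact>/<version>/<artifact>-<version>.jar
def local_jar_path(group_id, artifact_id, version, repo: '~/.m2/repository')
  File.join(repo, *group_id.split('.'), artifact_id, version,
            "#{artifact_id}-#{version}.jar")
end

ARROW_JARS.each { |coords| puts local_jar_path(*coords) }
```

On JRuby, requiring each generated path would stand in for the eight literal require lines. It still does not resolve transitive dependencies, which is what the jar-dependencies route is meant to handle.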
OK. I've opened a new issue for the documentation: #35602
I confirmed that the command lines and script you provided work in my environment too!
Could you open a pull request that includes the following?
Then I'll push some commits to integrate the current Ruby implementation and CI configurations to the pull request.
We can work on "Something using JRuby's FFI or Project Panama's native memory access might make sense." to avoid --add-opens after we merge the first pull request.
@eregon If you're interested in Red Arrow for TruffleRuby, please open a new issue for it. I have an idea for it. The current gobject-introspection gem generates bindings at run time. I think that we can improve the gem to generate Ruby scripts that use Fiddle to call functions defined in C. That would work with TruffleRuby.
A Fiddle/FFI version would also work for JRuby, but the shortest path is probably to simply use or wrap the Java API. I will continue along that path for now.
I think that JRuby should use the Java API for ease of installation.
If JRuby uses a Fiddle-based approach, JRuby users need to install the C++ and C libraries (*.so/*.dylib/*.dll) instead of *.jar.
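To summarize the trade-off discussed in this thread, here is a rough sketch of choosing a binding strategy per Ruby engine. The method name, the backend symbols and the artifact table are all hypothetical labels for the options mentioned above, not anything Red Arrow defines.

```ruby
# Hypothetical labels for the binding strategies discussed in this thread.
def arrow_backend_for(engine = RUBY_ENGINE)
  case engine
  when 'jruby' then :java_api    # wrap the Arrow Java jars via Java integration
  when 'ruby'  then :c_extension # CRuby: the existing native-extension bindings
  else              :fiddle      # e.g. TruffleRuby: the generated Fiddle scripts idea
  end
end

# What a user would need installed for each strategy (illustrative only).
INSTALL_ARTIFACTS = {
  java_api: ['*.jar'],
  c_extension: ['*.so', '*.dylib', '*.dll'],
  fiddle: ['*.so', '*.dylib', '*.dll']
}.freeze

backend = arrow_backend_for('jruby')
puts "#{backend}: #{INSTALL_ARTIFACTS.fetch(backend).join(', ')}"
# prints: java_api: *.jar
```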
Describe the enhancement requested
JRuby users would benefit from support in red-arrow, but currently the only Ruby bindings available use a native extension. Since JRuby does not support native extensions, we would want to add support by leveraging the Java bindings for Arrow.
This could be done in pure Ruby using JRuby's Java integration layer, and as needed for performance we could move some of that code into Java later.
I would be willing to help with this but I am unfamiliar with Arrow and the Ruby API that wraps it. JRuby's Java integration is very easy to use, however, and mimicking the C extension using JRuby + Ruby + Java integration should go pretty quickly.
Component(s)
Ruby