d-unsed / ruru

Native Ruby extensions written in Rust
MIT License
832 stars, 40 forks

Attaching Rust state/data to Ruby instances #43

Closed: Zapotek closed this issue 7 years ago

Zapotek commented 7 years ago

Is there a clean way to attach Rust state/data to a Ruby class? Something like instance variables which are only visible to the Rust env?

The reason I'm asking is because I'd like to expose a Rust implementation of a Ruby class, but use Rust's more robust data structures and memory management for the internal heavy lifting.

d-unsed commented 7 years ago

Hi @Zapotek

I'm currently working on this feature! Ruby supports attaching data structures to Ruby objects. There are some problems with memory management which I am trying to solve. I will update you when it's finished.

Zapotek commented 7 years ago

There's one more requirement that I forgot to mention. The Rust data of the Ruby-exposed object should be visible to other Rust objects.

For example:

# Implemented in Rust, API exposed to Ruby.
class Signature

    # Internal only. Implemented in Rust (as a HashSet for example),
    # invisible to Ruby.
    attr_reader :tokens

    # Implemented in Rust, exposed to Ruby.
    def difference_ratio( other )
        compute_difference( tokens, other.tokens )
    end

end

sig1 = Signature.new
sig2 = Signature.new

# This needs to be able to happen, but requires that internal Rust data can
# be accessed by other Rust instances.
sig1.difference_ratio( sig2 )

Do you think that's possible?
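A minimal pure-Rust sketch of the computation the Ruby example describes, independent of any binding layer. The `Signature` struct, its constructor, and the choice of symmetric difference over union as the ratio are assumptions made for illustration; the thread does not define `compute_difference`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the Rust state hidden behind a Ruby `Signature`.
pub struct Signature {
    tokens: HashSet<String>,
}

impl Signature {
    /// Builds a signature from a slice of words (illustrative constructor).
    pub fn new(words: &[&str]) -> Self {
        Signature {
            tokens: words.iter().map(|w| w.to_string()).collect(),
        }
    }

    /// Assumed definition: share of tokens that appear in only one of the
    /// two sets, relative to all tokens seen. Returns 0.0 for two empty sets.
    pub fn difference_ratio(&self, other: &Signature) -> f64 {
        let union = self.tokens.union(&other.tokens).count();
        if union == 0 {
            return 0.0;
        }
        let differing = self.tokens.symmetric_difference(&other.tokens).count();
        differing as f64 / union as f64
    }
}
```

The point of the sketch is that `difference_ratio` reads the private `tokens` field of another instance directly, which is the access pattern being asked for.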

d-unsed commented 7 years ago

Implemented in 0.9.0.

Please take a look at Class::wrap_data(), Object::get_data() and wrappable_struct!

Zapotek commented 7 years ago

Thanks a lot.

I get storing and retrieving data for the same instance, but do you have an example as to how to do sig1.difference_ratio( sig2 )? i.e. how to get wrapped data for another instance.

d-unsed commented 7 years ago

@Zapotek it should look like this:

methods!(
    RubySignature,
    itself,

    // ...

    // If `VerifiedObject` is not implemented for current class, use it as `AnyObject`
    fn difference_ratio(other_signature: AnyObject) -> NilClass {
        let other_signature = other_signature.unwrap();

        // struct of current object
        let tokens = itself.get_data(&*SIGNATURE_WRAPPER); 

        // struct of other object
        let other_tokens = other_signature.get_data(&*SIGNATURE_WRAPPER);

        // ...
    }
);

Zapotek commented 7 years ago

One more question, what are the memory characteristics of this? Do the wrapped data get removed when the Ruby instance is GC'ed?

d-unsed commented 7 years ago

You are right! Each wrapper contains a pointer (dfree) to a function which is responsible for deallocating the structures. This function is automatically called during the sweep phase when the corresponding Ruby object is GC'ed.
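The mechanism described above can be sketched in plain Rust, without Ruby or ruru. This is an illustration under stated assumptions, not ruru's actual implementation: `WrappedObject`, `free_boxed`, `wrap`, `sweep`, and the `DROPS` counter are hypothetical names, and `sweep` merely stands in for what a garbage collector would do with an unreachable object.

```rust
use std::os::raw::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `Tokens` values have been dropped (for observation only).
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

pub fn drop_count() -> usize {
    DROPS.load(Ordering::SeqCst)
}

/// Example payload; its `Drop` impl records that deallocation happened.
pub struct Tokens(pub Vec<String>);

impl Drop for Tokens {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical object header: an untyped data pointer plus the function
/// that knows how to free it (the role `dfree` plays in the thread).
pub struct WrappedObject {
    data: *mut c_void,
    dfree: unsafe extern "C" fn(*mut c_void),
}

/// Free function for a boxed `T`: rebuilds the `Box` and drops it.
unsafe extern "C" fn free_boxed<T>(ptr: *mut c_void) {
    unsafe { drop(Box::from_raw(ptr as *mut T)) }
}

/// Moves `value` to the heap and pairs the pointer with its free function.
pub fn wrap<T>(value: T) -> WrappedObject {
    WrappedObject {
        data: Box::into_raw(Box::new(value)) as *mut c_void,
        dfree: free_boxed::<T>,
    }
}

/// Stand-in for the sweep phase. Taking `obj` by value means the free
/// function cannot be invoked twice for the same object from safe code.
pub fn sweep(obj: WrappedObject) {
    unsafe { (obj.dfree)(obj.data) }
}
```

Because the free function runs exactly once per object, any additional manual `Box::from_raw` on the same pointer would free the allocation a second time.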

Zapotek commented 7 years ago

Excellent thank you, this is a great project, I hope to contribute once I get a better grasp of Rust.

Zapotek commented 7 years ago

Hello again,

Is there a way to free the wrapped data explicitly? Sometimes I'd rather not wait for a GC run.

PS. I tried the Box::from_raw approach but after a while I got a segfault, probably due to a double free once the GC ran.

d-unsed commented 7 years ago

Hi! Can you please provide some more details for the case when it may be needed to free the data while its object is still alive?

Zapotek commented 7 years ago

In my case, I'm performing a large series of RAM-heavy operations and I'd like to avoid huge spikes in consumption. What would normally take a couple of GB can be easily achieved with only 70MB by freeing wrapped data after each op.

I actually worked around this by letting the data of the wrapped object go out of scope in Rust, but in some cases a large amount of memory was still retained by the Ruby interpreter. If I could remove the wrapped data outright, it would help me better understand the situation. I don't think I'm leaking anything on the Rust side, so the issue must be on the Ruby side.

d-unsed commented 7 years ago

As an option, I could implement support for custom free functions for wrapped data, but I would like to get some more context first. Maybe it's possible to solve the problem with the existing solution.

If we have a custom free function (which will be a no-op in this case), the object will have to keep some state to manage access to its wrapped data (to prevent double frees, use after free, etc.). So the code becomes less safe.
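One way to picture the extra state mentioned here is a payload held in an `Option`, so that an explicit free and the eventual automatic drop cannot both release it. This is a sketch in plain Rust, not a ruru feature; `Freeable` and its methods are hypothetical names.

```rust
/// Hypothetical wrapper that remembers whether its payload is still alive.
pub struct Freeable<T> {
    inner: Option<Box<T>>,
}

impl<T> Freeable<T> {
    pub fn new(value: T) -> Self {
        Freeable {
            inner: Some(Box::new(value)),
        }
    }

    /// Borrows the payload, or returns `None` once it has been freed,
    /// so a use after free becomes a checked condition instead of UB.
    pub fn get(&self) -> Option<&T> {
        self.inner.as_deref()
    }

    /// Drops the payload immediately. Returns `true` only for the call
    /// that actually released it; later calls are harmless no-ops.
    pub fn free(&mut self) -> bool {
        self.inner.take().is_some()
    }
}
```

When the `Freeable` itself is finally dropped, the `Option` is already `None` if `free` was called, so nothing is released twice. The cost is that every access has to handle the freed case.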

Thus I would like to get some more context on how the object decides when it's the right time to drop the underlying data.

Btw is the project open sourced?

If you use Gitter, feel free to ping me there (d-unseductable) privately or via the ruru room.

Zapotek commented 7 years ago

The code isn't public but I'll try to set up a reproducible case for you.

Zapotek commented 7 years ago

This should help clear things up:

use std::iter;
use ruru::{Class, Object, RString, NilClass, AnyObject, Boolean};

pub struct DataStruct {
    pub data: Option<String>
}
impl DataStruct {
    fn new( data: Option<String> ) -> Self {
        DataStruct {
            data: data
        }
    }

    fn clear( &mut self ) {
        self.data = None;
    }
}

wrappable_struct!( DataStruct, DataStructWrapper, DATA_STRUCT_WRAPPER );

class!( DataExt );
unsafe_methods!(
    DataExt,
    itself,

    fn new() -> AnyObject {
        let data = DataStruct::new(
            Some( iter::repeat( "a" ).take(4000000).collect() )
        );

        Class::from_existing( "DataExt" ).
            wrap_data( data, &*DATA_STRUCT_WRAPPER )
    }

    fn free() -> NilClass {
        itself.get_data( &*DATA_STRUCT_WRAPPER ).clear();
        NilClass::new()
    }

    fn force_free() -> NilClass {
        unsafe {
            Box::from_raw( itself.get_data( &*DATA_STRUCT_WRAPPER ) )
        };
        NilClass::new()
    }
);

pub fn initialize() {

    Class::new( "DataExt", Some(&Class::from_existing("Data")) ).define( |itself| {
        itself.def_self( "new", new );
        itself.def( "free", free );
        itself.def( "force_free", force_free );
    });

}

def get_mem
    _, size = `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)
    size
end

m = get_mem
puts "On load:             #{m}"

1000.times do |i|
    d = DataExt.new

    # Memory will not budge even though the data will go out of scope
    # in Rust; it'll grow to about 4GB.
    d.free

    # Memory will be freed after use, will stay at around 70MB.
    # d.force_free
end

c = get_mem
puts "After object allocs: #{c} (+#{ c - m })"

puts 'Done'

# If we used DataExt#force_free we'll get a double-free segfault
# on exit, sleep so that we can see the print-out without having
# to scroll up.
sleep

Zapotek commented 7 years ago

Did you manage to reproduce the issue?

d-unsed commented 7 years ago

With d.free I get ~70 MB, the same as with d.force_free.

On load:             8536
After object allocs: 72112 (+63576)
Done

Which OS do you use?

Can you please also try self.data.take(); instead of self.data = None;?
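For reference, in plain Rust both forms drop the old value right away, so either one should release the payload. A small sketch that makes this observable; `Payload` and the two `clear_by_*` method names are hypothetical, and the drop counter is there only for illustration.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Payload that records its own drop in a shared counter.
pub struct Payload {
    pub drops: Rc<Cell<u32>>,
}

impl Drop for Payload {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

pub struct DataStruct {
    pub data: Option<Payload>,
}

impl DataStruct {
    /// Assigning `None` drops the previous value in place.
    pub fn clear_by_assignment(&mut self) {
        self.data = None;
    }

    /// `take()` moves the value out, leaving `None`; the moved-out value
    /// is dropped explicitly here.
    pub fn clear_by_take(&mut self) {
        drop(self.data.take());
    }
}
```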

Zapotek commented 7 years ago

I'm on Kubuntu 16.04: ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]

I don't understand what's going on, but I can't reproduce it anymore; it doesn't even rise to 70MB.

I did manage to reproduce it on a more complex class, which was what prompted me to ask about explicit freeing, but the problem went away there as well when I renamed the free method I had defined via unsafe_methods! to something_free.

Did I overwrite something internal that resulted in a leak?