danielpclark / faster_path

Faster Pathname handling for Ruby written in Rust
MIT License
782 stars 33 forks source link

Looking for more preformant `Pathname` object creation #165

Closed danielpclark closed 6 years ago

danielpclark commented 6 years ago

I'm trying to find a way we can instantiate new Pathname instances without the overhead we're experiencing in children_compat and entries_compat. Looking at the C source code for it:

/*
 * Create a Pathname object from the given String (or String-like object).
 * If +path+ contains a NULL character (<tt>\0</tt>), an ArgumentError is raised.
 */
static VALUE
path_initialize(VALUE self, VALUE arg)
{
    VALUE str;
    if (RB_TYPE_P(arg, T_STRING)) {
        str = arg;
    }
    else {
        str = rb_check_funcall(arg, id_to_path, 0, NULL);
        if (str == Qundef)
            str = arg;
        StringValue(str);
    }
    if (memchr(RSTRING_PTR(str), '\0', RSTRING_LEN(str)))
        rb_raise(rb_eArgError, "pathname contains null byte");
    str = rb_obj_dup(str);

    set_strpath(self, str);
    OBJ_INFECT(self, str);
    return self;
}

_This was the method I copied from when implementing Pathname.new_checked in Rust. It's currently not used. It's best designed to receive a raw ruby Value._

I'm wondering if the OBJECT_INFECT macro simply transforms an existing String object into a Pathname (or any) object. This way the same object structure gets all its methods and identity updated without the added memory cost of creating something new. I could be wrong, that's why I want to look into it.

#define OBJ_INFECT(x,s) RB_OBJ_INFECT(x,s)
#define RB_OBJ_INFECT(x,s) ( \
    (RB_OBJ_TAINTABLE(x) && RB_FL_ABLE(s)) ? \
    RB_OBJ_INFECT_RAW(x, s) : (void)0)
#define RB_OBJ_INFECT_RAW(x,s) RB_FL_SET_RAW(x, RB_OBJ_TAINTED_RAW(s))
#define RB_FL_SET_RAW(x,f) (void)(RBASIC(x)->flags |= (f))

The RBasic item is implemented in ruby-sys.


So my thinking is if we already had a Ruby String Value object to work with we could avoid the overhead by implementing object infection. But this may be an entirely mute point since we're working in Rust Strings and have to create a new Ruby Value object in the first place. I think the cost in performance we're experiencing is that in Ruby land the Pathname creation is the only cost they have, where we have to pay for both Pathname and the String it takes as a parameter.

Of course I could be totally off the mark since Ruby has to create the String Value objects for the method entries and children just as we do. For me this is mere speculation and theory. I'd like to look more into it.

glebm commented 6 years ago

Perhaps we could run a profiler on the Rust code to understand why children_compat is so slow. There is https://github.com/AtheMathmo/cpuprofiler that produces pprof-compatible output.

I use pprof a lot at work, it's easy to analyze and explain the output (especially using the flame graph visualization).

danielpclark commented 6 years ago

That's an excellent idea. I'd likely need to share the profile here and ask for your insights on it as I feel you'd be far more familiar with these things.

andrewstucki commented 6 years ago

@danielpclark my understanding of what OBJECT_INFECT is doing is that it's setting a tainted mark on the newly created Pathname object if the previous string is already tainted per Ruby's security model.