erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.38k stars 2.95k forks source link

Introduce enif_alloc_list / enif_make_alloc_list / enif_release_alloc_list #6293

Open josevalim opened 2 years ago

josevalim commented 2 years ago

When working with NIFs and you have to allocate large lists, we typically have two options:

  1. Use enif_make_list_from_array but that requires allocating the data into an array, which will then be discarded
  2. Use enif_make_list_cell but that requires traversing from the back and allocating the cells one at a time

For large lists (32 elements over), 1 ends up being faster because the whole list is allocated at once but it uses more memory. 2 uses less memory but ends up slower (at least on Apple M1).

It would be nice if we had a similar API to binaries, where the list is preallocated once and then we can set the list elements one by one:

ErlNifBinary mutable_list;
enif_alloc_list(env, 10, enif_make_list(env, 0), mutable_list);

The above will allocate a list of 10 elements, where each element points to an empty list (the third argument). Now, we can update the given list element with:

enif_set_list(env, 0, enif_make_int(env, 1), mutable_list);
enif_set_list(env, 1, enif_make_atom(env, "awesome"), mutable_list);

Eventually, enif_make_alloc_list or enif_release_alloc_list needs to be invoked, similar to the binary API. The hope is that this API will put the performance closer to enif_make_list_from_array (the cost is one extra pass setting the CARs) while avoiding allocating an intermediate array.

I am not sure if this proposal makes sense, therefore feedback is welcome. :)

Is your feature request related to a problem? Please describe.

This came up here: https://github.com/rusterlium/rustler/pull/486

sverker commented 2 years ago

I see two problems with the API:

  1. It's not functional. It's possible to create cyclic terms (by mistake).
  2. It kind of forces the implementation to allocate the list as one contiguous heap block.

Nr 1 is more about API consistency than a practical problem I think.

What do you think about a callback driven API, something like this:

{
    int counter = 1;
    return enif_generate_list(env, 10, callback, &counter);
}

ERL_NIF_TERM callback(ErlNifEnv* env, void* arg)
{
    int* counter_ptr = (int*)arg;
    return enif_make_int(env, (*counter_ptr)++);
}

This would make a lists:seq(1,10). It's a compact API, just one function enif_generate_list. It could also be expanded to let the callback function decide when to terminate the list. The drawback is the list terms must be generated sequentially from first to last. No random access like your enif_set_list.

josevalim commented 2 years ago

API wise that looks great to me, although I am unsure if this is something we could leverage from Rust. @hansihe, do you have thoughts here?

The drawback is the list terms must be generated sequentially from first to last.

If that ever becomes a need, we could also have enif_generate_reverse_list or something.

hansihe commented 2 years ago

I think an API like that would work very well with Rust iterators.

Allowing the callback to decide when the list terminates is a good idea as well, as it would enable us to build lists from iterators of unknown length. I don't know the internals on the OTP side, but if it could be beneficial, allowing the caller to specify an optional list size hint would also be an option.

sverker commented 2 years ago

Proposal:

ERL_NIF_TERM enif_generate_list(ErlNifEnv*,
                                ErlNifGenerateListFn* callback,
                                size_t length_hint);

typedef ERL_NIF_TERM ErlNifGenerateListFn(ErlNifEnv*,
                                          ERL_NIF_TERM* end_tail,
                                          void* arg);

To terminate the list, the callback would do *end_tail = enif_make_list(env,0); or set something else to append on another list or even make an improper list.

It should probably be named something else as all other term creating functions are named enif_make_*. The other list creating functions are called enif_make_list*, so maybe

enif_make_list_generate
enif_make_list_while
sverker commented 1 year ago

A revised API with simple implementation proposal. In short, in each callback

ERL_NIF_TERM enif_make_list_while(ErlNifEnv env, ErlNifMakeListWhileFn callback, void arg, size_t length_hint) { Eterm result; Eterm prev_tailp = &result; Eterm head, tail; do { head = tail = THE_NON_VALUE; callback(env, &head, &tail, arg);

    if (head != THE_NON_VALUE) {
        Eterm* hp = alloc_heap(env, 2);
        *prev_tailp = make_list(hp);
        CAR(hp) = head;
        prev_tailp = &CDR(hp);
    }
} while (tail == THE_NON_VALUE);

*prev_tailp = tail;
return result;

}


Features
* Create list from head to tail without prior knowledge of list length
* Create improper lists
* Append on existing list
* Create empty lists
* Option to end the list either at creation of last cell or at next call after last cell, depending on what fit users iteration best.
* Bonus "feature": Make any term you want (by writing only to `*tail` at first call).

I thought about adding some way to abort and signal error back to caller, but the user can do that themself with `arg`.
josevalim commented 1 year ago

LGTM and i think it would also work with Rust iterators, right @hansihe?

hansihe commented 1 year ago

Yep, this looks like a really nice API that would fit in with Rust iterators very well.