elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

Remote series and dataframes and distributed GC #932

Closed: josevalim closed this 4 months ago

josevalim commented 4 months ago

Automatically transfer data between nodes for remote series and dataframes and perform distributed garbage collection.

The functions in Explorer.DataFrame and Explorer.Series automatically move operations on remote dataframes to the nodes those dataframes belong to. The new Explorer.Remote module provides additional conveniences for manual placement.

Implementation details

This PR adds a new module, Explorer.Remote. To understand what it does, we first need to understand the challenges of working with remote series and dataframes.

Series and dataframes are backed by NIF resources: pointers to blobs of memory managed by low-level libraries. In Erlang/Elixir these are represented as references (the same kind of term returned by make_ref/0). NIF resources are refcounted, so once the last reference is garbage collected, the resource is destroyed and its memory is reclaimed.

When using Distributed Erlang, you may write this code:

```elixir
remote_series = :erpc.call(node, Explorer.Series, :from_list, [[1, 2, 3]])
```

However, this code does not work: the series is allocated on the remote node, but nothing on the remote node holds a reference to it. The series is therefore garbage collected there, and if we later attempt to read it from the caller node, it will no longer exist. To avoid this, we must explicitly place these resources on remote nodes by spawning processes that hold these references. That's what the place/2 function in Explorer.Remote does.
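The idea of "spawning a process that holds the reference" can be illustrated in plain Elixir. The sketch below is not Explorer.Remote's implementation or API: `HolderSketch`, its `place/2` and `release/1` are hypothetical names. A process is spawned on the target node with `Node.spawn/2`, builds the term there, and keeps it live in its receive loop, so anything the term wraps (such as a NIF resource) survives until the holder is told to release it. It assumes the anonymous function can be loaded on the target node (same code on both nodes); it also works with `node()` itself, which is how it is exercised here.

```elixir
defmodule HolderSketch do
  # Hypothetical helper, NOT Explorer.Remote.place/2: runs `fun` on `node`
  # inside a dedicated process that keeps the result referenced.
  def place(node, fun) when is_function(fun, 0) do
    caller = self()

    pid =
      Node.spawn(node, fn ->
        term = fun.()
        send(caller, {:placed, self(), term})
        hold(term)
      end)

    receive do
      {:placed, ^pid, term} -> {pid, term}
    after
      5_000 -> exit(:timeout)
    end
  end

  # Ask the holder to drop its reference and exit.
  def release(pid), do: send(pid, :release)

  # `term` is used again in the recursive call, so it stays live
  # (a GC root) for as long as this loop runs.
  defp hold(term) do
    receive do
      {:get, from} ->
        send(from, {:term, self(), term})
        hold(term)

      :release ->
        # Returning ends the process and drops its reference to `term`.
        :ok
    end
  end
end
```

A holder like this trades memory safety for a new problem: if nobody ever calls `release/1`, the term lives forever on the remote node, which is exactly what the next paragraph addresses.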

We also need to guarantee that these resources are not kept forever by the remote nodes, so place/2 also creates a local NIF resource that, once it is garbage collected, notifies the remote nodes so they can release the resources they hold, effectively implementing a distributed garbage collector.
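Plain Elixir cannot attach a destructor to a term the way a NIF resource can, so the sketch below swaps in a coarser technique: the holder process monitors an "owner" process and drops the term when that owner exits. This is only an approximation of the behaviour described above, not how Explorer.Remote does it (it ties release to the garbage collection of a local NIF resource, which can happen long before the owning process dies). `MonitorGCSketch` and `place/3` are hypothetical names, and as before the closure must be loadable on the target node.

```elixir
defmodule MonitorGCSketch do
  # Hypothetical sketch: the holder keeps `fun.()` alive on `node` until
  # `owner` exits. A process monitor stands in for NIF-resource GC.
  def place(node, owner, fun) when is_pid(owner) and is_function(fun, 0) do
    caller = self()

    pid =
      Node.spawn(node, fn ->
        # If `owner` is already dead, a :DOWN message with :noproc arrives
        # immediately, so the holder still cleans up.
        ref = Process.monitor(owner)
        term = fun.()
        send(caller, {:placed, self(), term})
        hold(term, owner, ref)
      end)

    receive do
      {:placed, ^pid, term} -> {pid, term}
    after
      5_000 -> exit(:timeout)
    end
  end

  defp hold(term, owner, ref) do
    receive do
      {:get, from} ->
        send(from, {:term, self(), term})
        hold(term, owner, ref)

      {:DOWN, ^ref, :process, ^owner, _reason} ->
        # The owner is gone, so nobody can use the term anymore: exit and
        # let the runtime reclaim it (and any NIF resource it wraps).
        :ok
    end
  end
end
```

The design choice here is the granularity of "unused": a monitor fires only when the whole owner process dies, whereas a resource-based notification fires as soon as the local handle is collected.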

TODO

jonatanklosko commented 4 months ago

Amazing!! 🔥🐈‍⬛