[Feature]: improve the "split" and "sequential" features

mindplay-dk commented 9 months ago

Feature Request

I had a difficult time understanding the "split" and "sequential" features, even after reading the docs - I went on the Discord and got more of an explanation from this video, and I understand now why this feature was confusing, but I'd like to make some suggestions.

First off, when you enable "split", the "max" input appears with a default value of 10 - this was my first source of confusion.

It turns out, this value is a hard limit - any input over 10 elements is simply discarded.

I understand this feature is there to "protect" users against accidentally blowing all their credits on massively parallelized workloads. 😅

But the need to actually truncate your input for real workloads (anything other than early testing) is probably extremely rare - so I don't think this setting makes sense as a default.

But I'm not really convinced this feature makes any sense at all. If your actual intention was to truncate your input for any reason, using the "slice" node would seem like a more natural, explicit, and self-explanatory choice here.

What might be useful, is to parallelize a workload, to save time when running larger workloads - for for this case, I would expect a feature that works more like a promise pool, limiting the number of parallel tasks without truncating the input.

As for the "sequential" option, understanding how "split" and "max" works, things got even more confusing: sequential appear to mean it's not splitting - even though the "split" option needs to be enabled for "sequential" to even show up. In "sequential" mode, it's not parallelizing the workload, it's literally just truncating the input.

What I would suggest:

Drop the terms "split", "max" and "sequential", as none of these are easy to understand.
The first option could simply be named "repeat" - this would enable the node to process multiple inputs.
When "Repeat" is enabled, additional options could show up:
a "Concurrency" setting would specify the max number of tasks to run in parallel, using some sort of pool or queue - to avoid budget explosions, I would suggest a default of 1.
With concurrency > 1, I would suggest values should arrive at the output in the same order they were added. (at least by default - you might consider adding an option in the future to let values arrive at the output in undefined order, as quickly as possible.)

If you think the ability to truncate the input is very important, I would suggest an option labeled "Truncate", which would display a numeric input labeled "Limit", which might default to 10 as before.

For backwards compatibility, when loading an existing diagram:

If "split" and "max" are set, enable "Repeat" with "concurrency" set to the old "max" value, and enable "Truncate" with "Limit" also set to the old "max" value.
If "sequential" is set, disable "Repeat", and enable "Truncate" with the previous "max" value.

This should give you the same behavior as before, keeping things functionally backwards-compatible, while improving the functionality and semantics.

Code of Conduct

[X] I agree to follow this project's Code of Conduct

gogwilt commented 8 months ago

I like these suggestions! "Split" is definitely one of the more confusing but powerful concepts in Rivet, and updating the language could help a ton.

I wonder if we should go more all-in, and actually just create a "Repeat" or "Map" node. You could imagine a Map Node taking in an array (or multiple arrays to zip together), and specifying a subgraph to run on each element of that array.

@codemile had mentioned another node-based programming environment that used a similar paradigm for loops. I'd be interested in his thoughts here!

mindplay-dk commented 8 months ago

I wonder if we should go more all-in, and actually just create a "Repeat" or "Map" node. You could imagine a Map Node taking in an array (or multiple arrays to zip together), and specifying a subgraph to run on each element of that array.

Isn't that exactly what the subgraph node does in split mode?

gogwilt commented 8 months ago

Isn't that exactly what the subgraph node does in split mode?

@mindplay-dk - yes, that's right. I'm imagining usability being the main benefit of a special Map Node (vs a subgraph in split mode). With a Map Node, it might be easier for users to remember how to iterate over arrays. We could also add validation to the graphs, to make sure they have the right inputs and outputs. There might also be UX improvements to how we display the node itself.

mindplay-dk commented 8 months ago

Hmm, a Map node sounds like a pretty substantial breaking change? I sometimes have many consecutive nodes in "split" mode, so these would need a lot of extra nodes. What would happen when you load a diagram from a previous version? And keep in mind that both subgraph nodes as well as all other node types can be split.

I'm not convinced that adding extra nodes would make the diagrams any easier to read.

What I might suggest instead is use clearer indications in the UI when nodes are split. (The little icon is easy to overlook. Color-coding split nodes, or their connecting wires, might be more helpful.)

Combined with, yes, as you said, validation of input/output types - and to start, displaying the correct types.

(These things should probably be proposed and discussed in separate issues though.)

codemile commented 8 months ago

@gogwilt on the subject of loops, I was referring to SideFX's Houdini where they have a foreach node. It's been a long time since I used it, but they had a feature where you could use a subgraph to perform the work. In the subgraph, you would have input nodes and output nodes. Each running of the subgraph is a single iteration of the loop. You would have an input that was the accumulator and an output node as well.

You would drop a foreach node in the parent graph and then select the subgraph that would be run. The output on that node would be the accumulated result.

Now in SideFX's Houdini, you would connect a collection as the input. Unlike Rivet where we currently define circular connections and our own iteration condition. I think that's because Rivet can do things like reductions to find the best answer, and in Houdini you tend to process collections (i.e. vertexes).

remixer-dec commented 2 months ago

When I tried using split on Chat node, no matter what Max parameter is used, it runs the node in an infinite loop, not sure if it is a bug or it is by design.

abrenneke commented 2 months ago

@remixer-dec that probably has something to do with your node graph, because I split chat nodes all the time successfully

Ironclad / rivet

[Feature]: improve the "split" and "sequential" features #301

Feature Request

Code of Conduct