badtuple / pipelang

An embedded Pipe and Filter language

Any future? #2

Open dumblob opened 3 years ago

dumblob commented 3 years ago

Do you have any plans to extend, develop and maintain this language further? If so, what are the plans (I have some crazy ideas...)?

badtuple commented 3 years ago

Hey @dumblob !

Pipelang stalled out a bit since the Remits project it was being used for was shelved. That said, I absolutely love the idea of Pipelang in isolation and have been planning to get back into it when I had time. There are some features that I want to implement once I figure out the right semantics (sub-pipelines being the main thing...think of them as a pipe version of anonymous functions). After that I think the main work is documentation, examples, fleshing out tests, and maybe defining a preset list of filters that act as an optional standard library.

Do you have a use case for the language right now, or are you just generally interested? I'd love to hear any ideas (crazy or otherwise)!

dumblob commented 3 years ago

> Pipelang stalled out a bit since the Remits project it was being used for was shelved.

Oh, that sounds unfortunate. It would still interest me, though - what was the "Remits project" about? I'm interested (among other things) in how pipe-inspired languages are used, or meant to be used, in the wild.

> That said, I absolutely love the idea of Pipelang in isolation and have been planning to get back into it when I had time. There are some features that I want to implement once I figure out the right semantics (sub-pipelines being the main thing...think of them as a pipe version of anonymous functions). After that I think the main work is documentation, examples, fleshing out tests, and maybe defining a preset list of filters that act as an optional standard library.

Sounds like a decent base - but feel free to take a look at some of the crazy ideas below first :wink:.

> Do you have a use case for the language right now or just generally interested?

Not really (assuming you meant a commercial use case). For now it's just general interest.

> I'd love to hear any ideas (crazy or otherwise)!

Let's start with some brainstorming:

  1. create a comprehensive language for data manipulation (I'm deliberately not saying a programming language - i.e. no Turing-complete computation per se, but rather a focus on data manipulation over time)
  2. feedback loops
  3. splitting & muxing (both with and without breaking the full ordering)
  4. merging & demuxing (both with and without breaking the full ordering)
  5. changing frequency
  6. reshaping data samples
  7. (recursive) windowing/buffering & stepped/window-delimited processing (to break full ordering into partial ordering and back)
  8. synchronous (blocking) joining/merging/demuxing
  9. asynchronous (non-blocking) joining/merging/demuxing
  10. recoding data samples
  11. runtime arbitrary creation & deletion of pipes based on inputs
  12. full AOT type safety (incl. null/none/nil safety)
  13. pipe-based-feedback-loop error handling
  14. push/pull semantics choice anywhere in the graph
  15. pipes in pipes (i.e. recursive embedding of infinite streams)
  16. ...
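
Some of the items above (windowing in item 7, merging in item 4) map naturally onto lazy iterators. What follows is only a minimal Python sketch under that assumption. The names `window`, `flatten`, and `round_robin_merge` are hypothetical illustrations and are not part of Pipelang.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def window(stream: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a stream into consecutive, non-overlapping windows.

    The last window may be shorter than `size` if the stream runs out.
    """
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def flatten(windows: Iterable[Iterable[T]]) -> Iterator[T]:
    """Undo `window`: turn a stream of windows back into a stream of samples."""
    for w in windows:
        yield from w


def round_robin_merge(*streams: Iterable[T]) -> Iterator[T]:
    """Merge streams by taking one sample from each in turn.

    Ordering within each input is preserved; exhausted inputs are dropped.
    """
    iterators = [iter(s) for s in streams]
    while iterators:
        alive = []
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                continue  # this input is finished, skip it from now on
            alive.append(it)
            yield item
        iterators = alive
```

Here `window` turns a fully ordered stream into a stream of windows, and `flatten` restores the original order, which is one simple reading of "to break full ordering into partial ordering and back".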

This has many use cases - from easy data exploration, through easy writing of true apps, up to no-effort UIs (implicit insertion of | prettyprint in nushell, https://github.com/calebwin/stdg etc.). And best of all - it inherently supports manycore parallelism (see https://binpa.sh/docs/tutorial/)! Yeah - the first ever zero-effort fully parallel data manipulation (programming?) paradigm!

Crazy, isn't it?

badtuple commented 3 years ago

> what was the "Remits project" about

Remits was a persisted log abstraction I was working on. The pitch was "Kafka but easier to use and with built in querying". Pipelang was the bones of the query language and allowed you to define data transformations on streams. I will likely return to it someday.

Regarding your ideas, they all sound super cool! However, I'm not quite sure they are on the same level of abstraction that Pipelang lives on. I envision things like the ones you described being implemented using Pipelang as a library. Pipelang is meant to be embedded into other programs (the way Lua is used). That way it can be used for many different use cases...user-facing querying, a shell-like language, internal DSLs, etc.

To support this, I can't have a real "runtime". Instead, I want to expose a good API that automatically handles data flowing through the pipeline, but the specifics of when and how that happens should be determined by the host program. Specifically, while Pipelang can have a way to express splits and joins that allow concurrency, I don't want to force the parent program to be multithreaded if its authors prefer the pipeline to be computed serially.
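
To make the "host decides when data flows" idea concrete, here is a minimal Python sketch. It is not Pipelang's actual API: `Pipeline`, `attach`, `double`, `keep_large`, and `traced_source` are all hypothetical names, and filters are assumed to be plain functions from iterator to iterator.

```python
from typing import Any, Callable, Iterable, Iterator, List

# Assumption: a filter is any function that maps one iterator to another.
Filter = Callable[[Iterator[Any]], Iterator[Any]]


class Pipeline:
    """Chains filters lazily; nothing runs until the host pulls a sample."""

    def __init__(self, filters: List[Filter]) -> None:
        self.filters = filters

    def attach(self, source: Iterable[Any]) -> Iterator[Any]:
        stream = iter(source)
        for f in self.filters:
            stream = f(stream)
        return stream


def double(stream: Iterator[int]) -> Iterator[int]:
    for x in stream:
        yield x * 2


def keep_large(stream: Iterator[int]) -> Iterator[int]:
    for x in stream:
        if x > 4:
            yield x


def traced_source(items: Iterable[Any], log: List[Any]) -> Iterator[Any]:
    """Source that records each sample at the moment it is actually read."""
    for item in items:
        log.append(item)
        yield item


log: List[int] = []
out = Pipeline([double, keep_large]).attach(traced_source([1, 2, 3, 4], log))
first = next(out)  # the host pulls exactly one result
```

After the single `next(out)` call, only the samples needed to produce one result have been read from the source. The host could just as well drain `out` in a loop, on a timer, or from a worker thread; the pipeline itself imposes no schedule.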

Similarly, Pipelang doesn't (yet?) have a real standard library of filters. And even if it did, it would be entirely optional. This is because I expect most filters to be specific to the program they are embedded in, and likely implemented by the users. The exception would be generically useful things like windowing and sampling. Utilities to easily create new filters are included, however.

That being said, a runtime like you describe could definitely be implemented as an optional separate library so people can opt into it. I can see many people who would want automatic parallelism for large pipelines...sort of along the same lines as what Rayon provides. Or it could be implemented as a standalone (non-embedded) language interpreter using Pipelang.

dumblob commented 3 years ago

> Remits was a persisted log abstraction I was working on. The pitch was "Kafka but easier to use and with built in querying". Pipelang was the bones of the query language and allowed you to define data transformations on streams. I will likely return to it someday.

Thank you! That sounds useful per se. But if there is no (paying :wink:) "customer", then time spent elsewhere is a better idea indeed.

Regarding parallelism, I didn't mean it as a requirement. The same semantics can easily be achieved e.g. with some async primitives on a single core, so I consider this an implementation detail. In the case of a library, this should be "pluggable" based on user choice (in C this would be conditional compilation, #define PIPELANG_WITH_TRUE_PARALLELISM ..., which would then require e.g. pthreads).
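
As an illustration of "pluggable" execution, the following Python sketch lets the caller pick a serial or thread-based strategy for a stateless, per-sample filter. It is a runtime analogue of the compile-time switch mentioned above, not anything Pipelang provides; `run_serial`, `run_threaded`, and `apply_filter` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_serial(fn: Callable[[T], R], samples: List[T]) -> List[R]:
    """Apply fn to each sample, one after another, on the calling thread."""
    return [fn(s) for s in samples]


def run_threaded(fn: Callable[[T], R], samples: List[T], workers: int = 4) -> List[R]:
    """Apply fn to each sample on a thread pool.

    Executor.map returns results in input order, so the output ordering
    matches run_serial even though the calls may overlap in time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, samples))


def apply_filter(fn: Callable[[T], R], samples: Iterable[T], parallel: bool = False) -> List[R]:
    """Host-facing entry point: same result either way, strategy is a user choice."""
    runner = run_threaded if parallel else run_serial
    return runner(fn, list(samples))
```

Both strategies produce the same output for a pure `fn`, which is the sense in which parallelism is an implementation detail here. This only holds for filters with no shared state between samples.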

And thanks for clarification of the "level of abstraction" pipelang resides at. Let's see how it'll develop over time.

Btw. Rayon is cool and I looked at it a year or two ago. But it's actually quite limited in what can & will be parallelized. I think that with the functionality I described above, pipelang would achieve higher parallelization at lower cost :wink: (this is subjective, but anyway).

dumblob commented 3 years ago

  1. There are no constants - everything is a stream ("my text" | echo would mean: create an anonymous stream with one sample of type string and value "my text", then pipe this stream to echo).
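
One way to picture "everything is a stream" is the following Python sketch, which borrows the `|` operator. It is purely illustrative and says nothing about Pipelang's real syntax or semantics; `Stream`, `Filter`, `echo`, and `upper` are hypothetical names.

```python
from typing import Any, Callable, Iterable, Iterator


class Stream:
    """A thin wrapper marking an iterable as a stream of samples."""

    def __init__(self, samples: Iterable[Any]) -> None:
        self._samples = iter(samples)

    def __iter__(self) -> Iterator[Any]:
        return self._samples


class Filter:
    """Wraps a stream-to-iterable function so it can sit on the right of `|`."""

    def __init__(self, fn: Callable[[Stream], Iterable[Any]]) -> None:
        self.fn = fn

    def __ror__(self, left: Any) -> Stream:
        # Anything that is not already a Stream is lifted into an
        # anonymous stream containing exactly one sample.
        stream = left if isinstance(left, Stream) else Stream([left])
        return Stream(self.fn(stream))


# Hypothetical filters: echo passes samples through, upper transforms them.
echo = Filter(lambda s: (sample for sample in s))
upper = Filter(lambda s: (sample.upper() for sample in s))

result = list("my text" | echo)
chained = list("my text" | upper | echo)
```

In this sketch the literal `"my text"` never reaches a filter as a bare value: `__ror__` wraps it into a one-sample `Stream` first, so filters only ever see streams.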