dumblob opened this issue 3 years ago
Sorry for the delay, I had 11th exams.
There's no real roadmap yet; I'm still fixing bugs and issues. If you have any ideas, let's do it.
> Sorry for the delay, I had 11th exams.
Take your time, no need to hurry.
> If you have any ideas, let's do it.
I'm afraid my ideas are sharply incompatible with certain goals of pipescript (judging by the readme and examples), yet 100% compatible with certain other goals from the same readme and examples :sweat:.
So I'm not sure how you'll perceive the following largely unexplored ideas:
This has many use cases, as you know - from easy data exploration, through easy writing of true apps, up to no-effort UIs (the implicit insertion of `| prettyprint`
in nushell, https://github.com/calebwin/stdg etc.). And best of all, it inherently supports manycore parallelism (see https://binpa.sh/docs/tutorial/ )! Yeah - a (first ever?) low-effort, low-code, fully parallel data-manipulation (programming?) paradigm!
Crazy, isn't it?
I also have some basics of the corresponding syntax, but that's not important compared to having a clear mutual agreement for goals :wink:.
I don't know much about data manipulation (splitting & muxing go over my head) but I'm happy to learn. It will be very good for the pipe-script project to have a real goal.
OK, if you haven't settled on any firm goals yet, then I'd be more than happy to provide some input for an interesting technology and also a thesis (I have experience mentoring university students as an external/non-university person).
What I'm describing is a bit like creating a streaming data computation graph using a simple syntax.
ad 2) I want it to have general computational power. Any signal processing needs feedback, any simulation of digital HW (e.g. circuits from logic gates) needs feedback, etc.
I have some very shallow thoughts of how to handle the complexity this might lead to. So I'm aware of certain edge cases (most of which should be "solved" by supporting both sync & async computation).
ad 3 & 4) Here I'd like to achieve "partial ordering". Imagine you have an expensive computation you want to run on many cores but you have only one input. Then I'd like to have an option to use the input as a common queue for work-stealing - each core would steal the current sample/item (or batch of those for speed) from the queue. Assume that each core does a chain of operations (not only one) and that these operations might again need to "split" across several other cores (i.e. nested "splitting"). Then I'd like to assemble the output from all these cores again to one queue. Here I'd like to have 2 options - either assemble them as they come (in "random" order) or assemble them in the original order (e.g. based on nested tagging done when splitting).
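To make the two assembly options concrete, here is a minimal Python sketch (all names are my own, hypothetical illustration - nothing to do with any existing pipe-script API): samples are tagged with their position when "split" across workers, and the output is assembled either as-completed or back in the original order using the tags:

```python
# Hypothetical sketch: one input queue feeds many workers (work-stealing);
# each sample is tagged with its index when split, so the results can be
# assembled either as they come or in the original input order.
from concurrent.futures import ThreadPoolExecutor, as_completed

def expensive(x):
    return x * x  # stand-in for a whole chain of operations

items = list(range(10))

with ThreadPoolExecutor(max_workers=4) as pool:
    # the dict maps each future back to its input tag (the index)
    futures = {pool.submit(expensive, x): i for i, x in enumerate(items)}

    # option 1: assemble results as they come ("random" order)
    unordered = [f.result() for f in as_completed(futures)]

    # option 2: restore the original order using the tags
    ordered = [res for _, res in
               sorted((futures[f], f.result()) for f in futures)]
```

Nested splitting would then correspond to workers that themselves tag and re-split their batch; the tags compose into a path that restores the global order.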
ad 5) I imagine this in a discrete domain (i.e. no continuous/smooth function but rather a step function). Thus I'd define it as a measure relative to other streams (i.e. closer to how network traffic shaping is done - by delaying some samples based on a given measure to achieve the desired "average" throughput).
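A toy sketch of such delay-based shaping (my own illustration, not anything existing):

```python
# Toy sketch: shape a stream's average throughput by delaying samples,
# akin to network traffic shaping; the result is step-wise pacing of the
# stream, not a smooth function.
import time

def shape(stream, per_second):
    interval = 1.0 / per_second
    next_slot = time.monotonic()
    for sample in stream:
        now = time.monotonic()
        if now < next_slot:
            time.sleep(next_slot - now)  # delay to meet the target rate
        next_slot += interval
        yield sample

start = time.monotonic()
shaped = list(shape(range(5), per_second=50))  # ~20 ms between samples
elapsed = time.monotonic() - start
```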
ad 6) This is one of many potential (conceptual) solutions to the problem of outputting more than one stream from a command - a merge of two streams of 64-bit ints could e.g. result in a stream of structures containing two 64-bit ints (and padding if needed).
Reshaping would then add, remove, swap, or rename "fields" (actually representing streams at one point in time) in such a structure. Reshaping shall ideally encompass both the physical representation (imagine a C struct with fields, padding, and alignment) and the logical representation (names of fields, their order if any, and value constraints over all fields together - please do not confuse these with types, as types specify constraints only over each field separately and are usually very limited).
For synchronous computation, it's clear how to populate such a struct. For async, I'd simply take the newly computed value for the given field and then take the most recent old values of all the other fields to populate the struct. But again, this is just a crazy, not very thought-through idea (it actually emerged from the need to allow all commands to run in parallel - hence so much copying).
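A toy sketch of that async population rule, representing the "struct" as a plain dict (entirely my own illustration):

```python
# Toy sketch: in async mode, each newly computed value for one field is
# combined with the most recent old values of all other fields, so every
# update emits a complete "struct" (hence all the copying).
def async_populate(updates, fields):
    latest = {f: None for f in fields}   # most recent value per field
    for field, value in updates:         # updates may arrive in any order
        latest[field] = value
        yield dict(latest)               # copy out a full struct each time

structs = list(async_populate([("a", 1), ("b", 2), ("a", 3)],
                              fields=("a", "b")))
```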
ad 8 & 9) This probably deserves more explanation due to the narrow semantics of this topic in most programming languages. What I mean is "true" synchronicity and asynchronicity as seen in the physical world (though in the real world everything seems to be async, if we're not talking about things like quantum entanglement etc.; usually we define things to happen synchronously if the measured difference in absolute time is less than x - it's easier to understand if time is discrete (a counter), like in CPUs).
Also this seems to be one of the rather major differences compared to many (all I know so far) "pipe languages".
I imagine that I could specify a set of subgraphs where each subgraph would either await each other to provide one new sample as a result of their computation (this is synchronous) or to not await each other to provide one new sample and just proceed further (this is asynchronous).
I have some very preliminary syntax ideas for this (unrelated to pipe-script and thus incompatible with it). But the main point is the semantics as described.
ad 10) Assuming our language deliberately does not specify how streams are implemented inside, there has to be a way to read (as a source) & write (as a sink) data in a specific format. If we had reshaping as suggested in (6), this wouldn't be a problem, as reshaping covers the physical representation (endianness, sizes, paddings, signedness, etc.). But as I wrote, it's just an idea, and something else (better) might be used. In that case we'll need an explicit, generic way to read & write serialized data (e.g. in JSON or BSON or FlatBuffers or any of those gazillions of existing & future formats).
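For instance, an explicit source/sink pair for one concrete format (JSON Lines here, stdlib only; the helper names are hypothetical) could look like this sketch:

```python
# Sketch: explicit, generic read (source) & write (sink) commands for a
# concrete serialization format - JSON Lines - since the language
# deliberately leaves the internal stream representation unspecified.
import io
import json

def sink_jsonl(stream, fp):
    for sample in stream:
        fp.write(json.dumps(sample) + "\n")

def source_jsonl(fp):
    for line in fp:
        yield json.loads(line)

buf = io.StringIO()
sink_jsonl([{"x": 1}, {"x": 2}], buf)   # write side
buf.seek(0)
roundtrip = list(source_jsonl(buf))     # read side
```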
ad 11) Here I'd actually be fine even with a more constrained variation - like requiring to know at compile time what the to-be-deleted or to-be-added graph will look like, and only caring about its (de)instantiation at run time.
ad 13) My idea is to forget completely about "errors being errors" and simply treat any error from a given instance of a command as yet another full-featured output queue (i.e. the command would be seen as a multiple-output "subgraph").
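A minimal sketch of the idea (my own illustration): a command with two full-featured output queues instead of raising:

```python
# Sketch: errors are not exceptions but just another output queue; the
# command effectively becomes a multiple-output "subgraph".
def safe_div(pairs):
    values, errors = [], []                # two independent output queues
    for a, b in pairs:
        try:
            values.append(a / b)
        except ZeroDivisionError as e:
            errors.append((a, b, str(e)))  # error sample, fully featured
    return values, errors

values, errors = safe_div([(6, 3), (1, 0), (8, 2)])
```

Downstream commands can then consume the error queue like any other stream - filter it, log it, or feed it back into the graph.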
ad 14) This is merely a note that, e.g., in case the graph(s) should be deployed as a distributed computing system (cores connected by some type of unreliable network), there might be a need to specify which of the two commands with a pipe in between shall (re)initiate the communication.
ad 15) I didn't think through all the edge cases, but my initial idea was more syntax-oriented than semantics-oriented. I see "streams of streams" simply as a "trunk" or "set" of streams, but with the emphasis that a command (regardless of whether it is a source, a sink, or both (aka a filter)) has the liberty to "open" and/or "close" any new or existing stream at any time. In other words, this point merely holistically reiterates many of the other points above :wink:.
ad 16) This "heterogeneous or not" is partially a philosophical question, partially an implementation detail and partially an important decision affecting how the pipe language shall be structured :wink:.
It's just another possible solution to multiple outputs from one command (in addition to the "struct" concept explained in the context of reshaping in (6)). We could actually put samples representing different outputs into one stream, but in a serial manner instead of "next to each other" as described in (6).
This way, any command would first need to read the sample and determine its "type" (incl. length/size, padding, etc.) dynamically at run time, because in general it wouldn't be easily possible to determine this up front at compile time with some clever analysis. This might have some performance impact - probably less copying than the "struct" approach, but also more processing due to a "switch/branch" inside nearly every single command. Again, this is not thought through and all rather preliminary...
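A toy sketch of that serial, tagged multiplexing (my own illustration):

```python
# Sketch: multiple logical outputs multiplexed serially into ONE stream;
# every consumer must inspect each sample's tag at run time - the
# "switch/branch inside nearly every command" mentioned above.
def demux(stream):
    ints, strs = [], []
    for tag, payload in stream:      # runtime dispatch on the tag
        if tag == "int":
            ints.append(payload)
        elif tag == "str":
            strs.append(payload)
        else:
            raise ValueError(f"unknown tag {tag!r}")
    return ints, strs

muxed = [("int", 7), ("str", "hi"), ("int", 9)]
ints, strs = demux(muxed)
```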
Any comments, ideas, thoughts very much welcome and appreciated!
My highly preliminary syntax for such a language should be closer to this than to others:
```
$fizzbuzz { gen (-999 999 1) filter (x >= 1 && x <= 100)
  @ branch (x % 3 == 0 && x % 5 == 0) write (stdout FizzBuzz) # FizzBuzz alias got stringified in `write` due to "disguising" as described below
  @ else
    @ branch (x % 5) write (stdout Buzz)
    @ else
      @ branch (x % 3) write (stdout Fizz)
      @ else trash
} # defines & immediately executes the block with no arguments

fizzbuzz # executes the same block again

{ args filter ... ;; log (howk) } (just three arguments) # define a block which accepts
                                                         # some streams as arguments, then execute
                                                         # this block with arguments `just` `three` `arguments`
                                                         # (note ;; is syntax sugar to delimit independent
                                                         # subgraphs executed in parallel)

{ args filter ... } () # this will compile fine
{ args filter ... }    # same as previous line (but with syntax sugar allowing to omit the empty list)
```
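For comparison, here is my rough Python reading of what the `$fizzbuzz` graph computes (assuming `branch (x % 5)` means "x is divisible by 5", and that `trash` simply drops the sample):

```python
# Rough, sequential Python analogue of the $fizzbuzz graph above (the
# real graph would evaluate the branches in parallel over a stream).
out = []
for x in range(-999, 1000):                 # gen (-999 999 1)
    if not (1 <= x <= 100):                 # filter (x >= 1 && x <= 100)
        continue
    if x % 3 == 0 and x % 5 == 0:           # branch -> write FizzBuzz
        out.append("FizzBuzz")
    elif x % 5 == 0:                        # branch -> write Buzz
        out.append("Buzz")
    elif x % 3 == 0:                        # branch -> write Fizz
        out.append("Fizz")
    # final else -> trash: the sample is dropped
```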
Some underlying ideas:
1. All "commands" are aliases by default (there is no way to express a "real unaliased command"). Most commands should nevertheless work transparently with streams from real unaliased commands as well as from aliases. There are special commands working only with aliases, and their results (if any) are "injected" into the given lexical scope. Any subgraph containing a command working on aliases is executed at compile time. After that, what's left will eventually be lowered to "real commands" (actually it'll get lowered to a special object with defined "disguising" behavior, allowing `write (stdout FizzBuzz)` instead of rigidly requiring e.g. string quoting `write (stdout "FizzBuzz")`; the lookup for a command of the given name has the highest precedence, and this lookup can be disabled with escaping - I haven't decided how to escape yet, but probably with `` ` `` or `:` or `;` or `\` as a prefix) and compiled to a binary.
2. It is a "free form syntax" (see Arturo, Basil, Red, Rebol, Ren-C, Spry, ...) delimited by the ASCII space character, thus there are no rules regarding other whitespace characters, newlines, etc., and there are no true lexical separators/delimiters (though there is one lexical separator, `;;`, as syntax sugar). So no `;`, no Python-like indentation, nothing.
3. Pipes between commands are implicit (`producer filter consumer` is a pipeline akin to `producer | filter | consumer` in a POSIX shell). The exception is the `( ... )` block, which does not chain commands this way but rather plumbs the outputs of each listed command as inputs to the command right before the opening parenthesis. Note that if a listed item is not a command, it'll get converted to an anonymous one based on the lowering ("disguising") rules - thus this language has no constants, no variables, nor anything similar, because everything ends up being a command, i.e. a stream of samples.
4. There are several types of blocks with different meanings:
   - `{ ... }` where all "loose" ends of subgraphs/chains will be "merged" together asynchronously (i.e. an outgoing sample from a subgraph will not wait for a sample from another subgraph lexically situated in the same block, but will rather be immediately outputted as a new sample of the whole async block)
   - `< ... >` where all "loose" ends of subgraphs/chains will be "merged" together synchronously (i.e. an outgoing sample from a subgraph will wait for a sample from all the other subgraphs situated in the same block, and only then will the synchronized "multisample" be outputted as a new sample of the whole sync block)
   - `( ... )` as a suffix of any command
   - `[ ... ]` (note, this one might change or even get removed...) - useful for specifying properties of the pipe itself (e.g. to get an unordered pipe allowing the parallel scheduler runtime to decide to run many instances of the consumer of the pipe instead of just one instance in case the pipe would otherwise have to stay ordered)
5. Prefixes in command names designate special treatment:
   - `$` in `$fizzbuzz` means "also give the label `fizzbuzz` to the following async/sync block" to allow its reuse by referring to it by label
   - a `#{ ... }` or `#[ ... ]` or `#( ... )` or `#< ... >` block (or the escape character from (1) followed by any of these block delimiters) is an "anonymous block" (with recursion support) serving as a multiline comment and, thanks to the `$mylabel` syntax, referencable as a stream of one sample (thus allowing to evade escaping of complex content). Note this one might change or be removed in the future (if a better solution is found). I'm also considering supporting multiple such "data blocks" for data - e.g. string - interpolation instead of baking this into strings (delimited by double and single quotes with the same meaning: `"mystr"` or `'mystr'`).
6. `@myid` is an optional suffix of explicit pipes denoting the instance of the given pipe (note a pipe here is an MPMC /multiple producer, multiple consumer/ queue under the hood). If not used as a suffix but standalone, it refers to the instance of the given pipe (useful for plumbing additional outputs to it or reading from it from other places - note this one might change or even get removed).
7. `@@` as a standalone token is a kind of special case of (6), denoting an "anonymous instance of the pipe" serving as an anchor for N-ary duplication of the sample (aka "stream splitting with copy semantics"). This pipe instance can be referenced by a single standalone `@` character only from the same lexical scope (not from any higher nor deeper one).
E.g. `producer @@ @ branch (x ;< 5) trash @ else trash` (here `;` is the escape character as discussed in (1), because I find it slightly less visually distracting than e.g. `\`).
Note `@@` can be left out as syntax sugar if there is no ambiguity: `producer @ branch (x ;< 5) trash @ else trash`.
Muxing is thus achieved through the combination of `@` and the `branch` command (which has a mandatory counterpart command `else` to make each branching exhaustive).
8. Instead of specifying or defining the structure of the samples everywhere (which is expected to be very hairy), there will only be a concept of "referring to equality or similarity" - i.e. "my input is like the output of anotherBlockLabel" instead of "my input has this extremely hairy, long structure definition with all details like padding, alignment, etc.". Referring to other commands like this shall also support composition and removal of arbitrary parts. This will probably require having some dummy built-in commands offering inputs/outputs just for the sake of being able to refer to them, but I'm convinced it's worth it.
9. Lexical scoping works from bottom to top and, if nothing is found, from top to bottom. The lexically closest match has precedence. Enforcing a particular match is possible with a fully qualified path denoting all lexical scopes from top to bottom (the "path" delimiter could be `/` or `:` or such).
10. Importing a library should be just another command working on aliases. Note that importing shall be well thought through, because everything gets executed by default, so importing a "full library" would mean executing every single subgraph in the library, which is presumably not what we want.
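As a toy illustration of the `@@` duplication plus `branch`/`else` muxing semantics described above (my own sketch, assuming copy semantics per consumer):

```python
# Sketch: @@ duplicates each sample (copy semantics) for N consumers,
# and each @ arm applies an exhaustive branch/else to its own copy.
import itertools

def split(stream, n):
    return itertools.tee(stream, n)     # N-ary duplication of samples

def branch(stream, pred):
    matched, rest = [], []              # branch + its mandatory else
    for x in stream:
        (matched if pred(x) else rest).append(x)
    return matched, rest

a, b = split(range(10), 2)
small, big = branch(a, lambda x: x < 5)  # like: @ branch (x ;< 5) ... @ else ...
copy = list(b)                           # the other duplicate is untouched
```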
There are many (10+ as of now) other unresolved and problematic topics & cases not covered in this crash course. I didn't want to pollute this discussion, so I left them out for now.
I know there are numerous inconsistencies and open questions in this syntax, so feel free to speak up (and ideally offer potential solutions :wink:). Key takeaways: everything is a stream (even if many have only one indefinitely repeating sample), everything is inherently parallel, there is a separate compile-time evaluation of subgraphs and separate run-time evaluation of subgraphs, cycles in subgraphs are allowed (necessary).
Let me know what you think, I'm all ears!
Future goals for now, then.
Did I get it right? Sorry that you have to explain topics so many times.
What I described is a standalone new language unrelated to your current `pipe-script`. I'm not familiar enough with your `pipe-script` to assess whether what I'm describing is applicable to your design (actually, I'd even dare to guess it's not applicable :cry:).
So the question rather is whether you want to just take some of the ideas and implement them in `pipe-script` as new features, or whether you'd say "oh, my `pipe-script` is not well designed for these ideas, so I'll rather start from scratch and maybe even give a shot to the syntax you @dumblob proposed" :wink:.
Which of these two options would you choose?
Yeah, you are right, pipe-script is not designed for this; let's start from scratch. 👍
Ok, will you create a new repository so that I can subscribe and closely follow your development?
here you go
I now wanted to showcase the discussion we had in the new repo to someone but found out it doesn't exist any more. Did you make the repo non-public or delete it completely? Do you have any archive of the discussion of ours?
Sorry, but I have deleted the repo.
That's a pity - some of your work and mine is gone. So let's take this as a life lesson - never delete data; only archive or hide it instead.
I wonder what the future and the roadmap look like?
I have also some (very) crazy ideas but I'd first like to learn about the goals and roadmap if any :wink:.