Questions regarding Heron

Make42 commented 2 years ago

Wow, Heron seems like a life-long dream come true. If I understand the project correctly, it's what I have been looking for for a couple of years now. Thanks!

I do have a couple of questions after reading https://python.plainenglish.io/heron-a-hybrid-approach-to-data-pipelines-in-python-aa7719fe8f2e

1. What does the editor do if the proof of life forwarder does not provide proof of life that a worker is alive?
2. Why do you separate worker and com into two different scripts / processes? Why are they not just one? It seems cumbersome to require the communication between those via data and heartbeat.

3.

"The processes start in the order the Nodes have been added to the Editor, so if the order is important add the Nodes as appropriate."

This only refers to the start of the processes - the initialization - not the actual work done on data, right?

4.

"If you want to have some of the Nodes running on other machines then..."

I understood that workers run on other machines and that nodes are just concepts in the editor. What do you mean by the "nodes running on other machines"?

Edits:

"I cannot underemphasize the usefulness of these heron.log files when you are debugging new code that runs on different machines."

You mean "overemphasize", right?

"use underscores or camel toe or both"

You mean "camel case"... "camel toe" is something VERY different (look it up).

georgedimitriadis commented 2 years ago

Hey,

Thank you for the kind words.

I really hope you find Heron useful and let me know if you have any questions if and when you start using it. I am very happy to help. I am about to submit a paper on it and soon after I will deal with its manual.

Also contributions are welcome :)

To answer your questions:

1) What does the editor do if the proof of life forwarder does not provide proof of life that a worker is alive?

It will not send the parameters of the Node to the worker, and it will not start the loop that checks whether the parameters have been updated in the editor and sends the updated ones to the worker. When that happens, the editor also stops initialising the Nodes that come after the dead one, so the user has to close down the whole editor. It will be very obvious to the user which worker failed to initialise, because its Node will not light up with a white surround (as a Node does when it initialises properly, i.e. when the worker is up and running and can receive the parameters from the editor).
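As a rough illustration of that behaviour (a minimal sketch, not Heron's actual code), the sequential initialisation could look like the following. The function name `initialise_nodes`, the per-Node queues, the `"alive"` message and the timeout are all assumptions made for the example.

```python
import queue


def initialise_nodes(node_names, proof_of_life, timeout_s=0.1):
    """Initialise Nodes in order and stop at the first one with no proof of life.

    node_names    -- Node names in the order they were added to the editor
    proof_of_life -- dict mapping each name to a queue.Queue on which the
                     (hypothetical) proof of life message is expected
    Returns (initialised_names, failed_name); failed_name is None on success.
    """
    initialised = []
    for name in node_names:
        try:
            message = proof_of_life[name].get(timeout=timeout_s)
        except queue.Empty:
            # No proof of life: do not send parameters, and do not
            # initialise any of the Nodes that come after this one.
            return initialised, name
        if message != "alive":
            return initialised, name
        # At this point the editor would send the Node's parameters and
        # start the loop that forwards later parameter updates.
        initialised.append(name)
    return initialised, None
```

With three Nodes where only the second worker stays silent, the call returns the first Node as initialised and names the second as the failed one; the third is never attempted.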

2) Why do you separate worker and com into two different scripts / processes? Why are they not just one? It seems cumbersome to require the communication between those via data and heartbeat.

The heartbeat is a necessary consequence of the fact that processes can run on other machines, where neither the editor nor any process running on the editor's machine can send a kill command to them. Even if there were only one process per Node, it would still need a heartbeat mechanism, but it would then be between the single process and the editor instead of between the worker process and the com one. The job of the com process is to run a rather complicated loop that deals with the different topics coming in and out of it in order to provide many-to-many connectivity (i.e. any input or output of a Node can connect to any number of other inputs or outputs). To allow this many-to-many connectivity and multiple inputs and outputs per Node, I needed a separate process that would not block while the worker took its time to process something, so that messages would still be received and delivered to other outputs. I could (maybe) do that with threading inside the worker process, but it would become extremely complicated (if possible at all).
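To make the heartbeat idea concrete, here is a minimal sketch, assuming the heartbeat is used so that a process which cannot be killed remotely shuts itself down once beats stop arriving. `HeartbeatWatchdog`, `run_worker`, `poll_heartbeat` and `do_work` are hypothetical names for illustration only and are not part of Heron.

```python
import time


class HeartbeatWatchdog:
    """Tracks when the last heartbeat arrived (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_beat = clock()

    def beat(self):
        """Record that a heartbeat message has just been received."""
        self._last_beat = self._clock()

    def expired(self):
        """True once more than timeout_s seconds have passed since the last beat."""
        return self._clock() - self._last_beat > self.timeout_s


def run_worker(watchdog, do_work, poll_heartbeat):
    """Keep working until the heartbeat times out, then return so the process can exit."""
    iterations = 0
    while not watchdog.expired():
        if poll_heartbeat():  # assumed to return True when a beat has arrived
            watchdog.beat()
        do_work()
        iterations += 1
    return iterations
```

Note that in this single-loop sketch a slow `do_work` delays the heartbeat check, which is the kind of blocking that the separate com process described above is meant to avoid.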

3) This only refers to the start of the processes - the initialization - not the actual work done on data, right?

Yes. Once all the processes are up and running, Heron does not have any internal clock to enforce any kind of order of message passing. Messages get passed along as they come in or are generated, which means it is up to each worker process to keep up with the requirements of the pipeline.
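One possible way for a worker to cope when it cannot keep up, shown purely as an illustration and not as something Heron does for you, is to drop stale messages and process only the newest one. The `latest_message` helper and the use of `queue.Queue` as the inbox are assumptions for this sketch.

```python
import queue


def latest_message(inbox):
    """Drain the inbox and return only the newest message, or None if it is empty."""
    newest = None
    while True:
        try:
            newest = inbox.get_nowait()
        except queue.Empty:
            return newest
```

Whether dropping messages is acceptable depends entirely on the pipeline; a worker that must see every message needs to be fast enough instead.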

4) I understood that workers run on other machines and that nodes are just concepts in the editor. What do you mean by the "nodes running on other machines"?

Unfortunately I tend to use the word Node rather loosely and inappropriately. In this case I mean a worker script running on another machine. From the point of view of a user, though, one tends to think of all the capabilities of the two scripts (com and worker) and of the Node class in the editor as the functionality of a conceptual "Node" that does something and runs on a machine (but you are right, I meant worker).

Thanks for the edits also :)

Make42 commented 2 years ago

At first I thought that the com processes run on the same machines as their corresponding workers. But now I infer that they run on the same machine as the editor. Is my second understanding right? Am I right that every process except the workers (so the three forwarders, the editor, and the com processes) runs on the editor's machine?
