davibicudo closed this issue 2 years ago.
Hi Davi, do you mean "knowing" what is executed, and with which parameters, while the pipeline is running, or before it runs? @ctchervenkov has developed some tools, already included, that output data on the structure of the pipeline; you can later use this data to plot a graph of the stages and their dependencies. Parameters are not included yet, I think. Basically, this mode performs a "simulation" of the pipeline and writes out all the steps. If this is what you're looking for, maybe you can also discuss it with @ctchervenkov.
If you're thinking about "online" information while the pipeline is running, this is also an interesting topic, but so far nothing has been developed in that direction. I could imagine running a server in the background alongside the pipeline, accessible via a web browser, which shows which stages have been executed, which are currently running and which will run later, the overall progress of the pipeline, etc. Is this something you're looking for? I'm not sure I have the time to do the implementation myself, but I'd be happy to guide you in more detail, and I guess @ctchervenkov would be interested as well.
Of course, to keep it simple, you could also just write more information on the command line. Have a look at
That is where the "Executing stage XXXX" message is produced. You have access to the `stage` variable afterwards, which contains all kinds of information about the stage. This could be a starting point if you want to make the output a bit more verbose. Would be nice to have!
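As a rough illustration of more verbose output, a helper like the one below could be called at the point where that message is produced. The function name `log_stage_execution` and the shape of the stage information (a descriptor string plus a dict of config parameters) are assumptions for this sketch, not synpp's actual internals.

```python
import json
import logging
import time

logger = logging.getLogger("pipeline")

def log_stage_execution(descriptor, config):
    """Hypothetical helper: log which stage runs, when, and with which parameters."""
    # Sort keys so the same configuration always produces the same log line
    params = json.dumps(config, sort_keys=True, default=str)
    message = "Executing stage %s with config %s at %s" % (
        descriptor, params, time.strftime("%Y-%m-%d %H:%M:%S"))
    logger.info(message)
    return message
```

To actually see these lines on the console, the logging level has to be set, e.g. with `logging.basicConfig(level=logging.INFO)`.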
Hi Sebastian, thanks for the ideas and the direction forward! I tested the graph and it looks nice, but when stages are run multiple times with different parameters, they apparently overwrite each other. A server would definitely be cool! We also use Luigi here for another purpose, and it is quite nice and helpful. Going in that direction, however, would probably require more time and effort. For now a simple solution would be enough. Thanks for pointing out where to start improving the verbosity. I'll make an attempt, and if it works well, eventually open a PR.
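One way to avoid the overwriting would be to identify each graph node by the stage descriptor together with its configuration, rather than by the descriptor alone. This is only a sketch under that assumption: `stage_node_id` is a hypothetical name, and the way synpp actually builds its graph data may differ.

```python
import hashlib
import json

def stage_node_id(descriptor, config):
    """Hypothetical: derive a graph node id from a stage descriptor and its config."""
    # Canonical JSON so that dicts with equal contents hash identically
    canonical = json.dumps(config, sort_keys=True, default=str)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
    return "%s@%s" % (descriptor, digest)

# Two runs of the same stage with different parameters get distinct nodes,
# while repeating an identical configuration maps to the same node
nodes = {}
for config in ({"year": 2015}, {"year": 2020}, {"year": 2015}):
    nodes[stage_node_id("data.census", config)] = config
```

Keeping the descriptor as a readable prefix means the plotted labels stay recognizable, while the short hash only serves to tell configurations apart.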
Great, looking forward!
Sorry for not giving an update on this issue. I couldn't find a very good solution and, lacking time, eventually switched to a local solution, which I think wasn't too bad but probably wouldn't work directly inside synpp. We also refactored the code to reduce the total number of stages and make them clearer. Part of the solution was a wrapper class called StageOutput(contents), which does the logging I needed upon creation. From my side we can close the issue, unless you want to keep it open for the other idea: showing all stages in the flowchart even if they have the same descriptor but different config parameters.
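The StageOutput wrapper mentioned here is not shown in the thread, so the following is only a guess at what such a class might look like: it logs a short summary of the result when it is created. The `contents` argument follows the name in the comment; the optional `name` argument and the `summary` logic are assumptions added for illustration.

```python
import logging

logger = logging.getLogger("pipeline")

class StageOutput:
    """Sketch of a wrapper that logs a summary of a stage's result when created."""

    def __init__(self, contents, name=None):
        self.contents = contents
        # Hypothetical label for the log line; not part of the original description
        self.name = name or "<unnamed stage>"
        # Log on creation, i.e. at the moment the stage returns its result
        logger.info("Stage %s produced %s", self.name, self.summary())

    def summary(self):
        # Report the type and, where available, the length of the wrapped object
        try:
            return "%s with %d items" % (type(self.contents).__name__, len(self.contents))
        except TypeError:
            return type(self.contents).__name__
```

A stage's execute() would then end with something like `return StageOutput(result, name="...")`. The trade-off is that downstream stages have to unwrap `.contents`, which is one reason this approach may not fit directly inside synpp.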
Hi Sebastian,
I had a good experience with this pipeline code at ETH and was happy to see later that you published it in a separate repository. We are using it here to set up a simple data analysis pipeline (nothing to do with Synpop^^), and one thing I missed when debugging was knowing which stage is being executed/loaded, at what time, and with which parameters. Currently we add a logging message inside execute(), but I believe this could be done more easily and better in the library code itself. Do you agree? It would be great if you could add this feature or point me to where to add it!
Best, Davi
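Until something like this exists in the library, the workaround from the original request (a logging message inside each execute()) can at least be factored out into a decorator. This is a sketch: the decorator `logged_stage` is hypothetical, the `context` argument is only assumed to expose a `config(name)` method like the one synpp stages use, and `FakeContext` is a stand-in so the example runs without synpp.

```python
import functools
import logging
import time

logger = logging.getLogger("pipeline")

def logged_stage(*config_keys):
    """Hypothetical decorator: log start, parameters and duration of execute()."""
    def decorator(execute):
        @functools.wraps(execute)
        def wrapper(context):
            # Assumes the context offers config(name), as synpp stage contexts do
            params = {key: context.config(key) for key in config_keys}
            logger.info("Executing %s with %s", execute.__module__, params)
            start = time.perf_counter()
            result = execute(context)
            logger.info("Finished %s in %.2f s", execute.__module__,
                        time.perf_counter() - start)
            return result
        return wrapper
    return decorator

class FakeContext:
    """Stand-in for a pipeline context, for demonstration only."""
    def __init__(self, config):
        self._config = config

    def config(self, key):
        return self._config[key]

# Example stage function using the decorator
@logged_stage("year")
def execute(context):
    return context.config("year") * 2
```

Listing the config keys explicitly keeps the log lines short; logging the whole configuration would also work but can get noisy for stages with many options.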