stemangiola opened this issue 3 years ago
As you can see, I have a few holes in my benchmark.
The workflow hangs and does not submit any more jobs, and if I interrupt it and start again, it hangs on "starting workflow".
Stefano,
I'm going through your logs now...
Ben
On Sun, Oct 4, 2020 at 11:56 PM Stefano Mangiola notifications@github.com wrote:
[image] https://user-images.githubusercontent.com/7232890/95038823-d6401480-071a-11eb-8a41-694da25d81e7.png
Stefano, which command line are you using to run the workflows?
When you say you are changing parameters, are you also changing cores, memory, etc., or only parameters of your tasks?
Each block of tests is run with different resources, depending on which algorithm is tested.
Here is the command:
makeflow -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow
Could you send me the log.out file from: makeflow -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow > log.out 2>&1
parsing dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow...
local resources: 32 cores, 193277 MB memory, 148722940 MB disk
max running remote jobs: 100
max running local jobs: 100
checking dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow for consistency...
dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow has 38880 rules.
recovering from log file dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow.makeflowlog...
checking for old running or failed jobs...
checking files for unexpected changes... (use --skip-file-check to skip this step)
starting workflow....
and it hangs forever.
I forgot to add the -dall debug flag, sorry about that:
makeflow -dall -T slurm -j 100 --do-not-save-failed-output test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow > log.out 2>&1
Stefano, could you also send me dev/test_simulation_makeflow_pipeline/makefile_test_simulation.makeflow.batchlog?
I don't have a batchlog; I have rerun the whole workflow. I think one of the issues (the inconsistency) is that I increased the number of combinations in the makefile after the workflow had completed, and some of the new benchmarks did not execute.
It is common to execute the whole workflow and then try additional parameter combinations.
Stefano, something that just occurred to me. Are you re-running the makeflow in place without a cleaning operation in between? It could be that makeflow is getting confused by a mismatch between the previous execution log and a newly modified makeflow.
Probably that is the case. But does cleaning lead to the deletion of the dependencies that are already completed? Of course, if I delete the log, everything gets deleted when makeflow is called again.
Yes, they will be deleted. A safer mode of operation in this case is to not modify the original file, but instead write the updates to differently named makeflow files. Then you can execute each update in sequence.
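The "execute each update in sequence" suggestion could be sketched as below. This is only an illustration, not code from the thread: the file names (`simulation_v1.makeflow`, `simulation_v2.makeflow`) and the helper functions are hypothetical; the makeflow flags are the ones Stefano used above.

```python
import subprocess

def makeflow_command(makeflow_file):
    # Same invocation used earlier in the thread, parameterized
    # over the makeflow file name.
    return ["makeflow", "-T", "slurm", "-j", "100",
            "--do-not-save-failed-output", makeflow_file]

def run_updates(update_files):
    # Each file adds rules on top of the previous results; check=True
    # stops at the first failure so later updates see a clean state.
    for mf in update_files:
        subprocess.run(makeflow_command(mf), check=True)

# Hypothetical file names for successive parameter-space expansions:
updates = ["simulation_v1.makeflow", "simulation_v2.makeflow"]
# run_updates(updates)  # would submit them in sequence on the cluster
```

Keeping the original file untouched means each log stays consistent with the file that produced it, at the cost of managing multiple makeflow files.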
I understand, but this is not always possible in a combinatorics scenario.
expand_grid(
slope = c(-2, -1, -.5, .5, 1, 2),
foreign_prop = c(0, 0.5, 0.8),
S = c(30, 60, 90),
which_changing = 1:16,
run = 1:5,
method = c("ARMET", "cibersort", "llsr", "epic")
)
I can expand the parameter space here with no effort. It would be great if makeflow could update the log file with the new dependencies and just add them to the tree.
Otherwise makeflow would be suitable only for static workflows.
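For scale, the R `expand_grid` above can be reproduced with `itertools.product` (an illustration, not the author's code) to count how many rules a grid like this generates:

```python
from itertools import product

# Same parameter values as the expand_grid call above.
slope = [-2, -1, -0.5, 0.5, 1, 2]
foreign_prop = [0, 0.5, 0.8]
S = [30, 60, 90]
which_changing = list(range(1, 17))  # R's 1:16
run = list(range(1, 6))              # R's 1:5
method = ["ARMET", "cibersort", "llsr", "epic"]

grid = list(product(slope, foreign_prop, S, which_changing, run, method))
print(len(grid))  # 6 * 3 * 3 * 16 * 5 * 4 = 17280 combinations
```

Adding a single value to any dimension multiplies the total, which is why regenerating and rerunning a fresh makeflow for every expansion quickly becomes impractical.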
I think that just appending new rules may be workable, with the understanding that removing a rule, or changing a previously executed rule will result in failure. Would that be something helpful to your use case?
Yes. Usually when doing benchmarking we want to increase the number of combinations. We don't need to delete rules, as we can ignore already-executed dependencies, and we would eliminate rules in another run if needed.
The issue is that if I now add rules to an existing makefile (with a log), the only ones that execute are the new ones at the bottom. New rules in the middle are ignored. This mixed behaviour seems more unintended than designed.
Stefano, thanks for your input! Let me discuss it with the team.
I have a makeflow file with ~17K commands. Some of them, at the root of the tree, are not executed for some reason, while other combinations of parameters are. I don't understand why.