fnothaft closed 7 years ago
Jenkins, test this please.
@fnothaft, thanks a ton, man!
Hmmm, even with the bump to 3.6.0 we're still having the deadlock issue.
JFC! It was just that the timeout parameter needed to be increased... I'll clean up the debugging commits tomorrow, and then I think this is good to go.
This is good to go from my side.
Thank you, @jvivian. I have taken the liberty of preparing remarks: if I may riff off of Newton, "If I have seen further, it is by using a more verbose level of logging."
I generally prefer pinning to a stable release.
Hi @cket! I just upgraded to the latest Toil on PyPI (3.7.0a1.dev390) and am now getting a runtime error with the nested services:
RuntimeError: (RuntimeError("This job was passed a promise that wasn't yet resolved when it ran. The job 'SparkService' that fulfills this promise hasn't yet finished. This means that there aren't enough constraints to ensure the current job always runs after 'SparkService'. Consider adding a follow-on indirection between this job and its parent, or adding this job as a child/follow-on of 'SparkService'.",
I'm pretty sure that this is coming from the masterIP from the nested service job, which is getting passed as a promise. I'm going to check whether (1) the approach for using nested services changed, or (2) I can find the commit that broke this upstream in Toil. I'll let you know. This should reproduce with the code that I have pushed, if you want to take a look on your end.
It seems like this issue came in somewhere between 3.7.0a1.dev389 and 3.7.0a1.dev390.
@fnothaft that's not a good sign. If you don't find an easy solution, feel free to open a ticket for this against Toil!
Actually, scratch that. It turns out it was an issue with my environment, not a bug in Toil. Never mind!
Great, glad the issue is resolved!
This is good to merge from my side, @jvivian @cket
This is just #78, but rebased to resolve the conflicts.