Closed garlick closed 6 months ago
I think the problem is likely here:
If a jobspec does not have a duration set, its duration is set to the whole graph duration rather than the time remaining until the graph ends. A possible fix, though I don't know what I'm doing:
diff --git a/resource/traversers/dfu_impl.hpp b/resource/traversers/dfu_impl.hpp
index 23048c01..7ef64948 100644
--- a/resource/traversers/dfu_impl.hpp
+++ b/resource/traversers/dfu_impl.hpp
@@ -70,8 +70,9 @@ struct jobmeta_t {
now = t;
jobid = id;
alloc_type = alloc;
+ const auto now = std::chrono::system_clock::now();
int64_t g_duration = std::chrono::duration_cast<std::chrono::seconds>
- (graph_duration.graph_end - graph_duration.graph_start).count ();
+ (graph_duration.graph_end - now).count ();
if (g_duration <= 0) {
errno = EINVAL;
Problem: when a job is submitted with unlimited duration, fluxion assigns it an expiration time of (start time + instance duration), which does not account for the time elapsed between instance start and job start. As a result, the job's expiration time falls after the instance itself has ended.
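To make the arithmetic concrete, here is a minimal sketch of the two calculations. It is not the fluxion code: `instance_window_t`, `expiration_full_duration`, and `expiration_remaining` are hypothetical names, and all times are assumed to be plain seconds on a single shared clock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the instance (graph) time window.
// All values are seconds on one shared clock (an assumption of this sketch).
struct instance_window_t {
    int64_t start;     // instance start time
    int64_t duration;  // total instance duration

    int64_t end () const
    {
        assert (duration >= 0);  // a negative duration makes no sense here
        return start + duration;
    }
};

// Behavior described in the problem statement:
// expiration = job start + the whole instance duration.
int64_t expiration_full_duration (const instance_window_t &w, int64_t job_start)
{
    return job_start + w.duration;
}

// Alternative: give the job only the time remaining in the instance,
// so its expiration coincides with the instance end.
// Returns -1 if the instance has already ended at job_start.
int64_t expiration_remaining (const instance_window_t &w, int64_t job_start)
{
    const int64_t remaining = w.end () - job_start;
    if (remaining <= 0)
        return -1;
    return job_start + remaining;
}
```

For an instance that starts at t=1000 and runs for 3600 s (ending at t=4600), a job starting 600 s later at t=1600 gets an expiration of 5200 from the first function, 600 s past the instance end, while the second function yields 4600.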