@cmisale: I will take a look at this on my returning flight. It may have something to do with the granularity of memory pool modeling (or something else of course).
BTW, thank you so much for reporting these problems. Real use cases like this are the only way we can harden our software, @cmisale!
Thanks! Please let me know if you need some other data.
@cmisale:
I think I understand the problem. This is an outright bug in handling the slot type in a jobspec.
We treat slot differently from any other resource type because it is the only non-physical resource type. How we handle it is: 1) perform a DFV subtree walk for a slot type, 2) divide the resource set discovered from this DFV walk equally according to the slot shape, 3) check if the given number of slots can be satisfied. There turns out to be a bug in 2).
For example, if your jobspec is socket[1]->slot[2]->core[2] and your machine is socket[1]->core[8], then at the end of the DFV walk on the subtree rooted at a socket vertex, we will get core: 8. Then we divide 8 by 2, which means we can have 4 slots, so clearly this satisfies slot[2].
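To make the counting rule concrete, here is a minimal, self-contained sketch (the function name count_slots and the map-based inputs are mine, not the actual traverser API): for each resource type in the slot shape, divide the qualified amount discovered by the DFV walk by the per-slot request, and the most constrained type determines the slot count.

#include <algorithm>
#include <climits>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper mirroring steps 2) and 3) above.
// qualified: resource type -> amount found under the DFV subtree walk
// shape:     resource type -> amount requested per slot
static unsigned int count_slots (const std::map<std::string, unsigned int> &qualified,
                                 const std::map<std::string, unsigned int> &shape)
{
    unsigned int slots = UINT_MAX;
    for (const auto &elem : shape) {
        unsigned int qc = qualified.at (elem.first);
        unsigned int fit = (elem.second == 0) ? 0 : qc / elem.second;
        slots = std::min (slots, fit);  // most constrained resource type wins
    }
    return slots;
}

int main ()
{
    // socket[1]->core[8] machine vs. a slot shape of core[2]: 8 / 2 = 4 slots.
    std::printf ("%u slots\n", count_slots ({{"core", 8}}, {{"core", 2}}));
    return 0;
}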
Now, there is an issue in the current code: it doesn't take into account the granularity at which a resource pool vertex is modeled/constructed.
Say each memory pool's granularity is 64GB and your machine has 4 of those. In terms of quantity, your machine has socket[1]->memory[256]. But in terms of schedulable units you only have 4 (i.e., 4 x 64GB). Now if your jobspec is socket[1]->memory[32], we will currently say you have 8 slots when in fact you can only fit 4 slots because of this granularity constraint! This leads to a buffer overflow.
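A back-of-the-envelope illustration of the bug and the fix, using the numbers from this example (a sketch only; the variable names are mine, not flux-sched's): the slot count has to be capped not just by the qualified amount but also by the number of resource vertices (granules) that amount is spread across.

#include <algorithm>
#include <cstdio>

int main ()
{
    unsigned int qc = 256;       // qualified memory amount: 4 vertices x 64GB
    unsigned int qg = 4;         // qualified granules: 4 memory pool vertices
    unsigned int per_slot = 32;  // memory requested per slot

    unsigned int buggy = qc / per_slot;                 // 8 "slots" -> overflow
    unsigned int fixed = std::min (qc / per_slot, qg);  // capped at 4 slots

    std::printf ("buggy=%u fixed=%u\n", buggy, fixed);
    return 0;
}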
I have a patch to fix this below, which I haven't tested thoroughly, but you are welcome to try it.
BTW, I noticed that the example sierra GRUG models each socket as having only two 64GB memory pool vertices while it has 22 core vertices. So in this case, your jobspec above will be constrained by memory rather than core.
With this granularity, the jobspec will be matched with socket[1]->core[1] and ->memory[64], which will let you schedule only 4 jobs per node. I don't know if this is what you wanted, but I thought I should point it out.
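Roughly, the arithmetic behind the "4 jobs per node" figure (a sketch assuming two sockets per node, which this thread does not state explicitly):

#include <algorithm>
#include <cstdio>

int main ()
{
    // Per socket in the example sierra GRUG: 22 cores, 2 x 64GB memory vertices.
    unsigned int core_fit = 22 / 1;                       // core[1] per slot
    unsigned int mem_fit  = std::min (128u / 64u, 2u);    // memory amount vs. granules
    unsigned int per_socket = std::min (core_fit, mem_fit);
    std::printf ("slots per node = %u\n", per_socket * 2); // 2 per socket x 2 sockets = 4
    return 0;
}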
diff --git a/resource/evaluators/edge_eval_api.cpp b/resource/evaluators/edge_eval_api.cpp
index daacc68..4732fb3 100644
--- a/resource/evaluators/edge_eval_api.cpp
+++ b/resource/evaluators/edge_eval_api.cpp
@@ -149,6 +149,11 @@ unsigned int evals_t::qualified_count () const
return m_qual_count;
}
+unsigned int evals_t::qualified_granules () const
+{
+ return m_eval_egroups.size ();
+}
+
unsigned int evals_t::total_count () const
{
return m_total_count;
diff --git a/resource/evaluators/edge_eval_api.hpp b/resource/evaluators/edge_eval_api.hpp
index f4b3914..b0a28ec 100644
--- a/resource/evaluators/edge_eval_api.hpp
+++ b/resource/evaluators/edge_eval_api.hpp
@@ -73,6 +73,7 @@ public:
// This can throw out_of_range exception
const eval_egroup_t &at (unsigned int i) const;
unsigned int qualified_count () const;
+ unsigned int qualified_granules () const;
unsigned int total_count () const;
int64_t cutline () const;
int64_t set_cutline (int64_t cutline);
diff --git a/resource/evaluators/scoring_api.cpp b/resource/evaluators/scoring_api.cpp
index e04ce62..019480e 100644
--- a/resource/evaluators/scoring_api.cpp
+++ b/resource/evaluators/scoring_api.cpp
@@ -180,6 +180,14 @@ unsigned int scoring_api_t::qualified_count (const subsystem_t &s,
return res_evals->qualified_count ();
}
+unsigned int scoring_api_t::qualified_granules (const subsystem_t &s,
+ const std::string &r)
+{
+ handle_new_keys (s, r);
+ auto res_evals = (*m_ssys_map[s])[r];
+ return res_evals->qualified_granules ();
+}
+
unsigned int scoring_api_t::total_count (const subsystem_t &s,
const std::string &r)
{
diff --git a/resource/evaluators/scoring_api.hpp b/resource/evaluators/scoring_api.hpp
index 2689c8d..34113db 100644
--- a/resource/evaluators/scoring_api.hpp
+++ b/resource/evaluators/scoring_api.hpp
@@ -58,6 +58,7 @@ public:
const eval_egroup_t &at (const subsystem_t &s, const std::string &r,
unsigned int i);
unsigned int qualified_count (const subsystem_t &s, const std::string &r);
+ unsigned int qualified_granules (const subsystem_t &s, const std::string &r);
unsigned int total_count (const subsystem_t &s, const std::string &r);
unsigned int best_k (const subsystem_t &s, const std::string &r);
unsigned int best_i (const subsystem_t &s, const std::string &r);
diff --git a/resource/traversers/dfu_impl.cpp b/resource/traversers/dfu_impl.cpp
index 8b8776a..65d3578 100644
--- a/resource/traversers/dfu_impl.cpp
+++ b/resource/traversers/dfu_impl.cpp
@@ -451,17 +451,29 @@ int dfu_impl_t::cnt_slot (const vector<Resource> &slot_shape,
scoring_api_t &dfu_slot)
{
unsigned int qc = 0;
+ unsigned int qg = 0;
unsigned int fit = 0;
unsigned int count = 0;
unsigned int qual_num_slots = UINT_MAX;
const subsystem_t &dom = m_match->dom_subsystem ();
// qualified slot count is determined by the most constrained resource type
+ // both in terms of the amounts available as well as the number of edges into
+ // that resource because that represents the match granularity.
+ // Say, you have 128 units of memory available across two memory resource
+ // vertices each with 64 units of memory and you request 1 unit of memory.
+ // In this case, you don't have 128 slots available because the match
+ // granularity is 64 units. Instead, you have only 2 slots available each
+ // with 64 units, and your request will get 1 whole resource vertex.
qual_num_slots = UINT_MAX;
for (auto &slot_elem : slot_shape) {
qc = dfu_slot.qualified_count (dom, slot_elem.type);
+ qg = dfu_slot.qualified_granules (dom, slot_elem.type);
count = m_match->calc_count (slot_elem, qc);
+ // constraint check against qualified amounts
fit = (count == 0)? count : (qc / count);
+ // constraint check against qualified granules
+ fit = (fit > qg)? qg : fit;
qual_num_slots = (qual_num_slots > fit)? fit : qual_num_slots;
dfu_slot.rewind_iter_cur (dom, slot_elem.type);
}
@dongahn you nailed it! I have incorporated the changes and it is working well. I haven't gotten any segmentation faults so far. I am now doing deeper testing to confirm that. Thanks!
@cmisale: Great! Yeah, let me know if you see any other problems; this is really useful for me given the maturity of this software.
No more errors, we can close it
The patch hasn't made it into master yet, but I will make sure to create a PR soonish.
@dongahn: @cmisale noted that the changes described here haven't been integrated into master yet, so I'm reopening this issue so we don't forget.
I'm re-closing this issue since the fix was integrated in merged PR #548.
The segmentation fault is coming up when running match on any grug file with a "big" amount of memory (tried with 64 and 128 GB). Lines https://github.com/flux-framework/flux-sched/blob/59c2417112f4c6406efdfaad72309626fc790f23/resource/traversers/dfu_impl.cpp#L496-L498 are issuing the segfault because of an incorrect (*egroup_i).edges[0].edge value. I've been printing out those edge values right before the reported lines, trying with tiny.grug and medium-LOD from https://github.com/flux-framework/flux-sched/issues/507#issuecomment-516941486. The tiny.grug output for edges is the following, which terminates correctly. When running with medium-LOD, or also using other grug files like sierra in the examples folder, the edges look like this. I'm reporting some traces from both valgrind and gdb; they're basically the same. The segfault is raised when trying to print the edge.
Output from valgrind --tool=memcheck:
Output from gdb:
Expanding frames:
I am running the following jobspec in both cases:
@dongahn @SteVwonder