KarypisLab / METIS

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering
Other
699 stars 138 forks source link

Bug in genmmd (multiple minimum external degree) function #46

Closed npotravkin closed 1 year ago

npotravkin commented 2 years ago

Function return wrong ordering on simplest graph. Results depend on input garbage in working parameters.

example:

const idx_t neqns = 2;
idx_t xadj[3]={1,2,3};
idx_t adjncy[2] = {2,1};

idx_t delta = 1;
idx_t maxint = IDX_MAX;

const idx_t nvtxs = 2;
std::unique_ptr<idx_t[]> perm(new idx_t [nvtxs+5]);
std::unique_ptr<idx_t[]> iperm(new idx_t [nvtxs+5]);
std::unique_ptr<idx_t[]> head(new idx_t [nvtxs+5]);
std::unique_ptr<idx_t[]> qsize(new idx_t [nvtxs+5]);
std::unique_ptr<idx_t[]> list(new idx_t [nvtxs+5]);
std::unique_ptr<idx_t[]> marker(new idx_t [nvtxs+5]);
idx_t nofsub;

bool is_bug_produced = true;
if(is_bug_produced){
    for(idx_t i=0;i<nvtxs+5;++i) head[i] = 2;
}else{
    for(idx_t i=0;i<nvtxs+5;++i) head[i] = -2;
}

genmmd(neqns,xadj,adjncy,iperm.get(),perm.get(),delta,head.get(),qsize.get(),list.get(),marker.get(),maxint,&nofsub);
jsf67 commented 1 year ago

I'm pretty sure this is the same bug I'm hitting. In the case I'm testing genmmd was called with a fully connected graph of size 3. The point of failure is line 107: _mdegnode = head[mdeg]; with mdeg=4 even though neqns=3 That sets mdeg to garbage from past the end of the head[] array, so crash or not crash depends on what garbage happens to be past the end of that array, as in the above example. A zero or negative value past the end of head[] is safe, while a positive value may corrupt the output or may corrupt nearby memory.

I think the simple fix would be a line added directly after line 97 to do: if (mdlmt>neqns) mdlmt=neqns;

In case the above is not clear enough, here are more details you might need if you don't understand the operation of genmmd:

head is provided to genmmd as workspace, so depending on previous contents of that space is a bug.

head[1] through head[neqns] are initialized by genmmd and then used by it. positions in head[] beyond head[neqns] are not initialized and using them is a bug.

So the line mdeg_node = head[mdeg]; should never be executed when mdeg>neqns. For most input graphs, mdeg won't get that high before the loop is ended by num>neqns. But for some input graphs (with delta>0) num gets to neqns+1 without the num>neqns check being executed. Then mdeg is incremented to an impossible value and used corrupting the result.

The change I suggested has the minimum impact on performance that I could think of, while forcing a jump to n900 in the case that mdeg is past the end.

Either directly comparing mdeg to neqns or having an extra check for num>neqns would be a more understandible correction, but less efficient and no more effective.

jsf67 commented 1 year ago

I think that as long as delta is 1, code in mmdint before it was changed in 2019-06-03, covered up this bug in genmmd.

So the many users who have not updated to the 2019-06-03 or later version aren't seeing this bug. But once you have the 2019-06-03 correction, this bug becomes very serious and often causes seg faults in typical calling code.

The mmdint change in 2019-06-03 was not inherently wrong. It ought to work either way, but more efficient without that change.

The more serious aspect of the mmdint change is that this bug in genmmd was safe for delta<2 before that change, but now is unsafe for delta=1. The bug in genmmd should be fixed anyway (because delta>1 should be supported). But the change in mmdint also ought to be reversed.

After the 2019-06-03 change, this bug in genmmd is very serious to the point that it is not currently reasonable to use the post 2019-06-03 version.

karypis commented 1 year ago

Thanks for this. See if the fix addresses the issue.

npotravkin commented 1 year ago

Issue is fixed on my example.