KarypisLab / METIS

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering

Could not find pointer in mcore crash #88

Closed daniilvinn closed 5 months ago

daniilvinn commented 5 months ago

Problem: When calling METIS_PartGraphKway, my application crashes with the error "Could not find pointer X in mcore" (X is an address). I am not sure whether this is expected behavior for this input data (provided in crash_*.txt). However, exactly the same setup works for other graphs (data provided in correct_*.txt).

Steps to reproduce:

  1. Build METIS with 64-bit `idx_t` and `real_t` (`IDXTYPEWIDTH` and `REALTYPEWIDTH` set to 64).
  2. Use these options:

    idx_t options[METIS_NOPTIONS];
    METIS_SetDefaultOptions(options);

    options[METIS_OPTION_OBJTYPE] = METIS_OBJTYPE_CUT;
    options[METIS_OPTION_CCORDER] = 1;

  3. Run `METIS_PartGraphKway` with `xadj`, `adjncy` and `adjwgt` as specified in the `crash_*.txt` files, `nvtxs` = 34, `ncon` = 1, `nparts` = 8, and `part` = an array of 34 `idx_t` values.

**Correct behavior**:
No crash. For comparison, `METIS_PartGraphKway` succeeds with `xadj`, `adjncy` and `adjwgt` as specified in the `correct_*.txt` files, `nvtxs` = 437, `ncon` = 1, `nparts` = 109, and `part` = an array of 437 `idx_t` values.

**Attached files**:
[crash_adjncy.txt](https://github.com/KarypisLab/METIS/files/15268010/crash_adjncy.txt)
[crash_adjwgt.txt](https://github.com/KarypisLab/METIS/files/15268011/crash_adjwgt.txt)
[crash_xadj.txt](https://github.com/KarypisLab/METIS/files/15268013/crash_xadj.txt)

[correct_adjncy.txt](https://github.com/KarypisLab/METIS/files/15268047/correct_adjncy.txt)
[correct_adjwgt.txt](https://github.com/KarypisLab/METIS/files/15268048/correct_adjwgt.txt)
[correct_xadj.txt](https://github.com/KarypisLab/METIS/files/15268049/correct_xadj.txt)
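
Since it is unclear whether the crashing input is itself valid, one way to rule that out is to sanity-check the CSR arrays before calling METIS. The following is only a sketch: `is_valid_csr` is a hypothetical helper, not part of METIS, the local `idx_t` alias is a stand-in for the 64-bit type from `metis.h`, and the rules it enforces (symmetric adjacency, no self-loops, positive edge weights) are my assumptions about what the partitioner expects.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for METIS's idx_t when built with 64-bit indices (assumption).
using idx_t = std::int64_t;

// Hypothetical helper (not a METIS function): returns true if the arrays
// describe an undirected graph in CSR form with every edge stored in both
// directions with the same weight.
bool is_valid_csr(idx_t nvtxs,
                  const std::vector<idx_t>& xadj,
                  const std::vector<idx_t>& adjncy,
                  const std::vector<idx_t>& adjwgt) {
    // xadj must have nvtxs + 1 entries, start at 0, and never decrease.
    if (nvtxs < 0 || xadj.size() != static_cast<std::size_t>(nvtxs) + 1) return false;
    if (xadj[0] != 0) return false;
    for (idx_t v = 0; v < nvtxs; ++v) {
        if (xadj[v + 1] < xadj[v]) return false;
    }
    // The last offset must match the number of stored edges and weights.
    if (static_cast<std::size_t>(xadj[nvtxs]) != adjncy.size()) return false;
    if (adjwgt.size() != adjncy.size()) return false;

    for (idx_t v = 0; v < nvtxs; ++v) {
        for (idx_t e = xadj[v]; e < xadj[v + 1]; ++e) {
            const idx_t u = adjncy[e];
            // Neighbor must be a valid vertex and not v itself (assumed rule).
            if (u < 0 || u >= nvtxs || u == v) return false;
            // Edge weights are assumed to be strictly positive.
            if (adjwgt[e] <= 0) return false;
            // Look for the reverse edge u -> v with the same weight.
            bool found = false;
            for (idx_t r = xadj[u]; r < xadj[u + 1]; ++r) {
                if (adjncy[r] == v && adjwgt[r] == adjwgt[e]) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
    }
    return true;
}
```

If this check passes for both the `crash_*` and `correct_*` inputs, the data itself is less likely to be the cause.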

daniilvinn commented 5 months ago

Resolved: METIS_PartGraphKway was being called from multiple threads simultaneously, which led to the crash. Performing the graph cut under a lock resolved the issue.
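
For anyone landing here, the workaround can be sketched roughly as follows. This is a minimal illustration, not the actual application code: `g_metis_mutex`, `with_metis_lock` and `run_serialized` are hypothetical names, and the lambda body stands in for the real `METIS_PartGraphKway` call so the sketch compiles without METIS.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide mutex guarding every call into the library.
std::mutex g_metis_mutex;

// Runs `fn` while holding the mutex, so at most one thread is inside the
// guarded call at any time. In real code, `fn` would wrap the
// METIS_PartGraphKway call.
template <typename Fn>
auto with_metis_lock(Fn&& fn) -> decltype(fn()) {
    std::lock_guard<std::mutex> guard(g_metis_mutex);
    return fn();
}

// Demonstration only: several threads update shared, non-thread-safe state
// through the lock. Returns the final value of that state.
int run_serialized(int workers, int iterations) {
    int shared_state = 0;  // stands in for state the library shares across calls
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&shared_state, iterations] {
            for (int i = 0; i < iterations; ++i) {
                with_metis_lock([&shared_state] { ++shared_state; });
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return shared_state;
}
```

The obvious cost is that partitioning no longer runs in parallel, so this is a workaround rather than a fix for whatever makes the concurrent calls unsafe.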

gfaster commented 5 months ago

Hmm, I don't think that should be an issue. All the global variables should be thread locals.

daniilvinn commented 5 months ago

@gfaster Still, it solved the issue: I don't get crashes anymore. As far as I know, METIS enables thread-local globals through a CMake check for thread_local support; maybe that check fails for some reason (I will check). Also worth mentioning: I use the taskflow library for threading, and maybe it handles threads in some special way.
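
A quick way to check whether thread-local storage behaves as expected with a given toolchain and set of flags is a small standalone test like the one below. It is a sketch under assumptions: `tls_counter` and `per_thread_counts` are hypothetical names, and the variable only imitates a library global; it does not touch METIS itself.

```cpp
#include <thread>
#include <vector>

// Hypothetical per-thread variable imitating a library global that is
// supposed to be thread-local.
thread_local int tls_counter = 0;

// Each worker increments its own copy of tls_counter `iterations` times and
// records the value it ends up with. With working thread-local storage,
// every entry equals `iterations`, because no thread sees another thread's
// increments.
std::vector<int> per_thread_counts(int workers, int iterations) {
    std::vector<int> results(workers, -1);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&results, w, iterations] {
            for (int i = 0; i < iterations; ++i) {
                ++tls_counter;
            }
            results[w] = tls_counter;  // each thread writes only its own slot
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return results;
}
```

If the variable were an ordinary global instead (which is what would effectively happen if the thread-local qualifier were compiled out), the threads would race on one shared copy and the recorded values would generally not all equal `iterations`.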

gfaster commented 5 months ago

Taskflow can operate in a work-stealing mode for asynchronous tasks, but it doesn't seem like that should be an issue since it isn't preemptive.

A few other questions:

If you're using gcc or clang, I strongly suspect the problem is that -pthread isn't being passed to the compiler, which I believe is needed for thread locals to work correctly.