KarypisLab / METIS

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering

Message: "***It seems that Metis did not free all of its memory! Report this." #51

Open cponder opened 1 year ago

cponder commented 1 year ago

I ran this command on a weather grid from NCAR at 5km resolution:

gpmetis -minconn -contig -niter=200 x1.*.graph.info 1792

This is the output it gave:

******************************************************************************
METIS 5.2 Copyright 1998-16, Regents of the University of Minnesota
 (HEAD: , Built on: Nov 30 2022, 10:22:25)
 size of idx_t: 32bits, real_t: 32bits, idx_t *: 64bits

Graph Information -----------------------------------------------------------
 Name: x1.5898242.graph.info, #Vertices: 5898242, #Edges: 17694720, #Parts: 1792

Options ---------------------------------------------------------------------
 ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
 dbglvl=0, ufactor=1.030, no2hop=NO, minconn=YES, contig=YES
 ondisk=NO, nooutput=NO
 seed=-1, niparts--1, niter=200, ncuts=1

Direct k-way Partitioning ---------------------------------------------------
free(): invalid pointer
free(): invalid pointer
***It seems that Metis did not free all of its memory! Report this.

***Metis returned with an error.
Could not find pointer 0x7fad5e41f010 in mcore

Aborted (core dumped)

This is the only case I've seen of gpmetis failing. Could this happen when the machine runs out of RAM?

cponder commented 1 year ago

This version didn't have the problem: http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz

karypis commented 1 year ago

Can you try to reproduce the error by trying a few different seeds for the random number generator (it is one of the optional command line parameters)?

Can you share the graph?

cponder commented 1 year ago

Here's a data-set where I see the breakage: https://www2.mmm.ucar.edu/projects/mpas/benchmark/v7.0/MPAS-A_benchmark_10km_v7.0.tar.gz

cponder commented 1 year ago

I'm building under Ubuntu 22.04 using the GCC 11.3.0 compilers.

cponder commented 1 year ago

I tried these few, which all failed:

gpmetis -minconn -contig -niter=200 x1.*.graph.info  256
gpmetis -seed 240 -minconn -contig -niter=200 x1.*.graph.info  256
gpmetis -seed 8771326 -minconn -contig -niter=200 x1.*.graph.info  256
gpmetis -seed 38771326 -minconn -contig -niter=200 x1.*.graph.info  256
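
The seed trials above can be scripted so the same sweep is easy to repeat on another build. This is only a sketch: the `seed_commands` helper and the `GRAPH`/`NPARTS` defaults are hypothetical names chosen for illustration, and the option spelling is copied from the commands in this comment. It prints the command lines instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# GRAPH and NPARTS are placeholders (assumptions); point GRAPH at your own graph file.
GRAPH="${GRAPH:-x1.graph.info}"
NPARTS="${NPARTS:-256}"

# Print one gpmetis command line per seed given as an argument.
# The options mirror the invocations reported in this thread.
seed_commands() {
    for seed in "$@"; do
        echo "gpmetis -seed $seed -minconn -contig -niter=200 $GRAPH $NPARTS"
    done
}

# The three seeds tried in this thread.
seed_commands 240 8771326 38771326
```

Piping the output to `sh` would run the sweep; checking each exit status then shows whether the failure depends on the seed.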

karypis commented 1 year ago

I tried a fresh build of METIS on my M1-based MacBook Pro and everything looks fine.

[Two screenshots attached: "Screen Shot 2022-12-03 at 6 00 20 PM" and "Screen Shot 2022-12-03 at 5 58 59 PM"]

cponder commented 1 year ago

I'm running inside a container, which may be interfering with memory allocation. Are there print statements I can insert to show where the failure happens?

cponder commented 1 year ago

Here's a discrepancy:

size of idx_t: 32bits, real_t: 32bits, idx_t *: 64bits

When you run it, idx_t is reported as 64 bits, not 32 bits.

cponder commented 1 year ago

Are we running the same source version? I'm using these downloads:

https://github.com/KarypisLab/GKlib/archive/refs/tags/METIS-v5.1.1-DistDGL-0.5.tar.gz
https://github.com/KarypisLab/METIS/archive/refs/tags/v5.1.1-DistDGL-v0.5.tar.gz

When I run it, though, I see this header:

METIS 5.2 Copyright 1998-16, Regents of the University of Minnesota
 (HEAD: , Built on: Dec  4 2022, 13:48:00)

Yours also says METIS 5.2. I assume the second line is when I built the binary, not when the latest source-checkin was made.

karypis commented 1 year ago

I use the latest from the master branch.

cponder commented 1 year ago

Yes, this did work. I also had to use the master-branch version of GKlib. Can you please make new releases of both?

cponder commented 1 year ago

Also, it still says idx_t: 32bits here:

 size of idx_t: 32bits, real_t: 32bits, idx_t *: 64bits

Can you tell me why it's doing this?

karypis commented 1 year ago

My build was using 64-bit idx_t ('make config i64=1'). It does not make a difference.
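
A quick size check is consistent with this for the graph reported above. The sketch below takes the vertex and edge counts from the "Graph Information" output and assumes, as is usual for a CSR adjacency layout, that each undirected edge is stored twice. The `fits_int32` helper is a hypothetical name, and the check covers only these two counts, not every quantity the partitioner computes internally.

```shell
#!/bin/sh
# Counts copied from the "Graph Information" lines in this thread.
NVTXS=5898242
NEDGES=17694720
INT32_MAX=2147483647

# Assumption: the adjacency array holds each undirected edge twice.
ADJ_ENTRIES=$((2 * NEDGES))

# Succeeds when the argument is representable as a signed 32-bit index.
fits_int32() {
    [ "$1" -le "$INT32_MAX" ]
}

for value in "$NVTXS" "$ADJ_ENTRIES"; do
    if fits_int32 "$value"; then
        echo "$value fits in a signed 32-bit index"
    else
        echo "$value overflows a signed 32-bit index"
    fi
done
```

Both counts are far below the 32-bit limit, so the index width alone is unlikely to explain the failure on this input.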

cponder commented 1 year ago

Yeah, I get that; the idx_t: 32bits build ran OK for me too. We're seeing memory failures with a different app, though, and I wanted to make sure our container wasn't restricting some intrinsic type to 32 bits.

cponder commented 1 year ago

The latest tag, v5.1.1-DistDGL-v0.5, is from over 2 years ago and (incorrectly) reads METIS 5.2 when I run it. Given that it doesn't work either, can you please tag a 5.2 snapshot of the latest master?