KarypisLab / METIS

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering
Other
665 stars 134 forks source link

METIS exit with an incomplete partitioning without raising errors #91

Open ThibaultGH opened 2 months ago

ThibaultGH commented 2 months ago

Hello there,

I'm using METIS with Conan from the Conan Center, so it's METIS 5.2.1.

I have a mesh that has 297 elements, so those are the 297 nodes of the graph and I want to partition it into 64 parts for distributing the workload on 64 different MPI processes and more (for scalability study purpose). I was expecting METIS to be able to partition my graph into something like 41 partitions with 5 nodes on it and 64-41=23 partitions with 4 nodes on it (since 297 = 541 + 4 (65-41)). But instead, I see several messages in my terminal saying :

***Cannot bisect a graph with 0 vertices!
***You are trying to partition a graph into too many parts!

From what I understand looking at METIS code, during the initial partitioning phase, a recursive bisection algorithm is performed and during this the recursive calls end up with calling a bisection on a subgraph that has 0 nodes. And from Multilevel k-way Partitioning Scheme for Irregular Graphs - Karypis, Kumar : "The k-way partitioning problem is most frequently solved by recursive bisection. ... There are a number of advantages of computing the k-way partitioning directly (rather than computing it successively via recursive bisection). First, the entire graph now needs to be coarsened only once, reducing the complexity of this phase to O(|E|) down from O(|E| log k). Second, it is well known that recursive bisection can do arbitrarily worse than k-way partitioning [27]. Thus, a method that obtains a k-way partitioning directly can potentially produce much better partitionings. ... In our algorithm, the k-way partitioning of Gm is computed using our multilevel bisection algorithm [16]. Our experience has shown that our multilevel recursive bisection algorithm produces good initial partitionings and requires a relatively small amount of time as long as the size of the original graph is sufficiently larger than k."

So I guess, METIS does what the paper says, but I feel like there should be a way of going around this recursive bisection during the initialisation phase ? To handle more cases of partitioning ? To do this "direct k-way partitioning" ? Is there ? And more, I feel like METIS exiting the calls to METIS_PartGraphKWay without raising an error saying that the partitioning is incomplete and should not be used as such is dangerous, so I decided to report it in an issue here.

Here is a reproducible test of this example :

#include<vector>
#include "metis.h"

int main(int* argc, char** argv) {
size_t nME = 297;
std::vector<idx_t> xadj = {0, 3, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 49, 52, 56, 60, 65, 70, 75, 80, 85, 90, 94, 98, 102, 106, 111, 116, 121, 126, 131, 136, 140, 144, 148, 152, 157, 162, 167, 172, 177, 182, 187, 192, 197, 202, 206, 210, 214, 218, 223, 228, 233, 238, 243, 248, 252, 256, 260, 264, 269, 274, 279, 284, 289, 294, 298, 302, 306, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 364, 368, 372, 377, 382, 388, 393, 399, 404, 410, 415, 421, 426, 432, 436, 441, 445, 449, 454, 459, 464, 469, 474, 479, 484, 489, 494, 499, 503, 507, 511, 516, 521, 527, 532, 538, 543, 549, 554, 560, 565, 571, 575, 580, 583, 586, 590, 594, 598, 602, 606, 610, 614, 618, 622, 626, 629, 632, 635, 639, 643, 647, 651, 655, 658, 661, 665, 669, 673, 677, 681, 684, 687, 691, 695, 699, 703, 707, 710, 713, 717, 721, 725, 729, 733, 736, 739, 743, 747, 751, 755, 759, 762, 765, 769, 773, 777, 781, 785, 788, 791, 795, 799, 803, 807, 811, 814, 817, 821, 825, 829, 833, 837, 840, 843, 847, 851, 855, 859, 863, 866, 871, 876, 881, 886, 891, 896, 901, 906, 911, 916, 921, 926, 931, 936, 941, 946, 951, 956, 961, 966, 971, 976, 981, 986, 991, 996, 1001, 1006, 1011, 1016, 1021, 1026, 1031, 1036, 1041, 1046, 1051, 1056, 1061, 1066, 1071, 1076, 1081, 1086, 1091, 1096, 1101, 1106, 1111, 1116, 1121, 1126, 1131, 1136, 1141, 1146, 1150, 1154, 1158, 1162, 1166, 1170, 1174, 1178, 1182, 1186, 1190, 1194, 1199, 1204, 1209, 1214, 1219, 1224, 1229, 1234, 1238, 1242, 1246, 1250, 1254, 1258, 1262, 1266, 1270, 1274, 1278, 1282, 1287, 1292, 1297, 1302, 1307, 1312, 1317, 1322};
std::vector<idx_t> adjncy = {1, 2, 14, 0, 3, 15, 0, 3, 4, 289, 1, 2, 5, 290, 2, 5, 6, 273, 3, 4, 7, 274, 4, 7, 8, 16, 5, 6, 9, 17, 6, 9, 10, 18, 7, 8, 11, 19, 8, 11, 12, 20, 9, 10, 13, 21, 10, 13, 22, 11, 12, 23, 0, 15, 291, 24, 1, 14, 292, 25, 6, 17, 275, 18, 26, 7, 16, 276, 19, 27, 8, 16, 19, 20, 28, 9, 17, 18, 21, 29, 10, 18, 21, 22, 30, 11, 19, 20, 23, 31, 12, 20, 23, 32, 13, 21, 22, 33, 14, 25, 295, 34, 15, 24, 296, 35, 16, 27, 271, 28, 40, 17, 26, 272, 29, 41, 18, 26, 29, 30, 42, 19, 27, 28, 31, 43, 20, 28, 31, 32, 44, 21, 29, 30, 33, 45, 22, 30, 33, 46, 23, 31, 32, 47, 24, 35, 36, 48, 25, 34, 37, 49, 34, 37, 38, 293, 50, 35, 36, 39, 294, 51, 36, 39, 40, 269, 52, 37, 38, 41, 270, 53, 26, 38, 41, 42, 54, 27, 39, 40, 43, 55, 28, 40, 43, 44, 249, 29, 41, 42, 45, 250, 30, 42, 45, 46, 225, 31, 43, 44, 47, 226, 32, 44, 47, 56, 33, 45, 46, 57, 34, 49, 50, 58, 35, 48, 51, 59, 36, 48, 51, 52, 60, 37, 49, 50, 53, 61, 38, 50, 53, 54, 62, 39, 51, 52, 55, 63, 40, 52, 55, 251, 64, 41, 53, 54, 252, 65, 46, 57, 227, 66, 47, 56, 228, 67, 48, 59, 60, 68, 49, 58, 61, 69, 50, 58, 61, 62, 70, 51, 59, 60, 63, 71, 52, 60, 63, 64, 72, 53, 61, 62, 65, 73, 54, 62, 65, 255, 74, 55, 63, 64, 256, 75, 56, 67, 223, 80, 57, 66, 224, 81, 58, 69, 70, 82, 59, 68, 71, 83, 60, 68, 71, 72, 84, 61, 69, 70, 73, 85, 62, 70, 73, 74, 86, 63, 71, 72, 75, 87, 64, 72, 75, 76, 88, 65, 73, 74, 77, 89, 74, 77, 78, 253, 90, 75, 76, 79, 254, 91, 76, 79, 80, 221, 92, 77, 78, 81, 222, 93, 66, 78, 81, 94, 67, 79, 80, 95, 68, 83, 84, 96, 69, 82, 138, 85, 97, 70, 82, 85, 86, 98, 71, 83, 84, 139, 87, 99, 72, 84, 87, 88, 100, 73, 85, 86, 140, 89, 101, 74, 86, 89, 90, 102, 75, 87, 88, 141, 91, 103, 76, 88, 91, 92, 104, 77, 89, 90, 142, 93, 105, 78, 90, 93, 94, 106, 79, 91, 92, 143, 95, 107, 80, 92, 95, 108, 81, 93, 94, 144, 109, 82, 97, 98, 110, 83, 96, 99, 111, 84, 96, 99, 100, 112, 85, 97, 98, 101, 113, 86, 98, 101, 102, 114, 87, 99, 100, 103, 115, 88, 100, 103, 104, 116, 89, 101, 102, 105, 117, 90, 102, 105, 106, 118, 91, 103, 104, 107, 119, 92, 104, 107, 108, 120, 93, 105, 106, 109, 121, 94, 106, 109, 122, 95, 107, 108, 123, 96, 111, 112, 124, 97, 110, 194, 113, 125, 98, 110, 113, 114, 126, 99, 111, 112, 195, 115, 127, 100, 112, 115, 116, 128, 101, 113, 114, 196, 117, 129, 102, 114, 117, 118, 130, 103, 115, 116, 197, 119, 131, 104, 116, 119, 120, 132, 105, 117, 118, 198, 121, 133, 106, 118, 121, 122, 134, 107, 119, 120, 199, 123, 135, 108, 120, 123, 136, 109, 121, 122, 200, 137, 110, 125, 126, 111, 124, 127, 112, 124, 127, 128, 113, 125, 126, 129, 114, 126, 129, 130, 115, 127, 128, 131, 116, 128, 131, 132, 117, 129, 130, 133, 118, 130, 133, 134, 119, 131, 132, 135, 120, 132, 135, 136, 121, 133, 134, 137, 122, 134, 137, 123, 135, 136, 83, 139, 145, 85, 138, 140, 146, 87, 139, 141, 147, 89, 140, 142, 148, 91, 141, 143, 149, 93, 142, 144, 150, 95, 143, 151, 138, 146, 152, 139, 145, 147, 153, 140, 146, 148, 154, 141, 147, 149, 155, 142, 148, 150, 156, 143, 149, 151, 157, 144, 150, 158, 145, 153, 159, 146, 152, 154, 160, 147, 153, 155, 161, 148, 154, 156, 162, 149, 155, 157, 163, 150, 156, 158, 164, 151, 157, 165, 152, 160, 166, 153, 159, 161, 167, 154, 160, 162, 168, 155, 161, 163, 169, 156, 162, 164, 170, 157, 163, 165, 171, 158, 164, 172, 159, 167, 173, 160, 166, 168, 174, 161, 167, 169, 175, 162, 168, 170, 176, 163, 169, 171, 177, 164, 170, 172, 178, 165, 171, 179, 166, 174, 180, 167, 173, 175, 181, 168, 174, 176, 182, 169, 175, 177, 183, 170, 176, 178, 184, 171, 177, 179, 185, 172, 178, 186, 173, 181, 187, 174, 180, 182, 188, 175, 181, 183, 189, 176, 182, 184, 190, 177, 183, 185, 191, 178, 184, 186, 192, 179, 185, 193, 180, 188, 194, 181, 187, 189, 195, 182, 188, 190, 196, 183, 189, 191, 197, 184, 190, 192, 198, 185, 191, 193, 199, 186, 192, 200, 111, 187, 195, 113, 188, 194, 196, 115, 189, 195, 197, 117, 190, 196, 198, 119, 191, 197, 199, 121, 192, 198, 200, 123, 193, 199, 202, 203, 211, 231, 207, 201, 204, 212, 232, 208, 201, 204, 229, 205, 237, 202, 203, 230, 206, 238, 203, 206, 237, 207, 213, 204, 205, 238, 208, 214, 201, 205, 208, 211, 215, 202, 206, 207, 212, 216, 210, 211, 233, 231, 219, 209, 212, 234, 232, 220, 201, 207, 209, 212, 217, 202, 208, 210, 211, 218, 205, 214, 247, 215, 221, 206, 213, 248, 216, 222, 207, 213, 216, 217, 223, 208, 214, 215, 218, 224, 211, 215, 218, 219, 227, 212, 216, 217, 220, 228, 209, 217, 220, 241, 225, 210, 218, 219, 242, 226, 78, 213, 222, 253, 223, 79, 214, 221, 254, 224, 66, 215, 221, 224, 227, 67, 216, 222, 223, 228, 44, 219, 226, 227, 249, 45, 220, 225, 228, 250, 56, 217, 223, 225, 228, 57, 218, 224, 226, 227, 203, 230, 231, 239, 235, 204, 229, 232, 240, 236, 201, 209, 229, 232, 233, 202, 210, 230, 231, 234, 209, 231, 234, 235, 241, 210, 232, 233, 236, 242, 229, 233, 236, 239, 243, 230, 234, 235, 240, 244, 203, 205, 238, 239, 247, 204, 206, 237, 240, 248, 229, 235, 237, 240, 245, 230, 236, 238, 239, 246, 219, 233, 242, 243, 249, 220, 234, 241, 244, 250, 235, 241, 244, 245, 251, 236, 242, 243, 246, 252, 239, 243, 246, 247, 255, 240, 244, 245, 248, 256, 213, 237, 245, 248, 253, 214, 238, 246, 247, 254, 42, 225, 241, 250, 251, 43, 226, 242, 249, 252, 54, 243, 249, 252, 255, 55, 244, 250, 251, 256, 76, 221, 247, 254, 255, 77, 222, 248, 253, 256, 64, 245, 251, 253, 256, 65, 246, 252, 254, 255, 259, 267, 279, 263, 260, 268, 280, 264, 257, 277, 261, 285, 258, 278, 262, 286, 259, 285, 263, 269, 260, 286, 264, 270, 257, 261, 267, 271, 258, 262, 268, 272, 267, 281, 279, 273, 268, 282, 280, 274, 257, 263, 265, 275, 258, 264, 266, 276, 38, 261, 270, 293, 271, 39, 262, 269, 294, 272, 26, 263, 269, 272, 275, 27, 264, 270, 271, 276, 4, 265, 274, 275, 289, 5, 266, 273, 276, 290, 16, 267, 271, 273, 276, 17, 268, 272, 274, 275, 259, 279, 287, 283, 260, 280, 288, 284, 257, 265, 277, 281, 258, 266, 278, 282, 265, 279, 283, 289, 266, 280, 284, 290, 277, 281, 287, 291, 278, 282, 288, 292, 259, 261, 287, 293, 260, 262, 288, 294, 277, 283, 285, 295, 278, 284, 286, 296, 2, 273, 281, 290, 291, 3, 274, 282, 289, 292, 14, 283, 289, 292, 295, 15, 284, 290, 291, 296, 36, 269, 285, 294, 295, 37, 270, 286, 293, 296, 24, 287, 291, 293, 296, 25, 288, 292, 294, 295};
std::vector<idx_t> weight = {254100, 326700, 23100, 29700, 23100, 29700, 231000, 297000, 13860, 17820, 13860, 17820, 565950, 727650, 23100, 29700, 21000, 27000, 1260, 1620, 1260, 1620, 51450, 66150, 23100, 29700, 21000, 27000, 1260, 1620, 1260, 1620, 51450, 66150, 254100, 326700, 23100, 29700, 23100, 29700, 231000, 297000, 13860, 17820, 13860, 17820, 565950, 727650, 13860, 17820, 1260, 1620, 1260, 1620, 12600, 16200, 30870, 39690, 13860, 17820, 1260, 1620, 1260, 1620, 12600, 16200, 30870, 39690, 127050, 163350, 11550, 14850, 11550, 14850, 115500, 148500, 6930, 8910, 6930, 8910, 282975, 363825, 57750, 74250, 5250, 6750, 5250, 6750, 52500, 67500, 3150, 4050, 3150, 4050, 128625, 165375, 196350, 252450, 17850, 22950, 17850, 22950, 178500, 229500, 10710, 13770, 10710, 13770, 437325, 562275, 57750, 74250, 5250, 6750, 5250, 6750, 52500, 67500, 3150, 4050, 3150, 4050, 128625, 165375, 92400, 118800, 8400, 10800, 8400, 10800, 84000, 108000, 5040, 6480, 5040, 6480, 205800, 264600, 99000, 9000, 9000, 90000, 5400, 5400, 220500, 23760, 2160, 2160, 21600, 1296, 1296, 52920, 118800, 10800, 10800, 108000, 6480, 6480, 264600, 23760, 2160, 2160, 21600, 1296, 1296, 52920, 118800, 10800, 10800, 108000, 6480, 6480, 264600, 23760, 2160, 2160, 21600, 1296, 1296, 52920, 118800, 10800, 10800, 108000, 6480, 6480, 264600, 23760, 2160, 2160, 21600, 1296, 1296, 52920, 99000, 9000, 9000, 90000, 5400, 5400, 220500, 756, 972, 756, 972, 630, 810, 630, 810, 630, 810, 630, 810, 378, 486, 378, 486, 378, 486, 378, 486, 630, 810, 630, 810, 630, 810, 630, 810, 756, 972, 756, 972, 630, 810, 630, 810, 630, 810, 630, 810, 378, 486, 378, 486, 378, 486, 378, 486, 630, 810, 630, 810, 630, 810, 630, 810, 2100, 2700, 2100, 2700, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 2100, 2700, 2100, 2700, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620, 1260, 1620};

if (xadj.size() != nME + 1 || adjncy.size() != xadj.back())
   throw std::runtime_error("xadj or adjncy tables are wrong.");

idx_t n = static_cast<idx_t>(nME);
idx_t ncon = 1;
idx_t nparts = 64;
std::vector<idx_t> part(n);
idx_t objval;
idx_t options[METIS_NOPTIONS];
METIS_SetDefaultOptions(options);
options[METIS_OPTION_OBJTYPE] = METIS_OBJTYPE_VOL;
options[METIS_OPTION_CTYPE] = METIS_CTYPE_SHEM;
options[METIS_OPTION_IPTYPE] = METIS_IPTYPE_GROW;
options[METIS_OPTION_RTYPE] = METIS_RTYPE_FM;
options[METIS_OPTION_NO2HOP] = 1;
options[METIS_OPTION_NCUTS] = 10;
options[METIS_OPTION_MINCONN] = 1;
options[METIS_OPTION_CONTIG] = 1;
options[METIS_OPTION_NUMBERING] = 0;

options[METIS_OPTION_UFACTOR] = 10; // Used to set a limit on imbalanceness between partitions
options[METIS_OPTION_SEED] = 1; // Set a seed for the randomness
options[METIS_OPTION_NITER] = 10; // Set a max number of iterations for metis to do its work
// options[METIS_OPTION_DBGLVL] = METIS_DBG_INFO; // Set the verbose level for debugging

int status = METIS_PartGraphKway(&n, &ncon, xadj.data(), adjncy.data(), weight.data(), NULL, NULL, &nparts, NULL, NULL, options, &objval, part.data());

if (status != METIS_OK) {
   switch (status)
      {
         case METIS_ERROR_INPUT:
            throw std::runtime_error("Error: METIS's input error.");
            break;
         case METIS_ERROR_MEMORY:
            throw std::runtime_error("Error: METIS memory error.");
            break;                

          default:
             throw std::runtime_error("Error: METIS error.");
             break;
       }
   }

return 0;
}
ThibaultGH commented 2 weeks ago

Following this issue, see PR 94