cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.09k stars 3.8k forks source link

gossip: connectivity logging is extremely spammy in 128 node cluster #30088

Closed a-robinson closed 6 years ago

a-robinson commented 6 years ago

I spun up a 128 node cluster for tpc-c 50k testing (it was meant to be / will soon be 120, but my brain accidentally rounded up when creating it), and the gossip connectivity logs are overwhelming. First of all, they're huge, taking up almost my entire screen with just one log line:

I180911 18:58:34.645464 127 gossip/gossip.go:941  [n1] gossip connectivity
  n3 [sentinel];
  n1 -> n24; n1 -> n45; n1 -> n46; n1 -> n74; n1 -> n105; n1 -> n122; n2 -> n1; n2 -> n36; n2 -> n56; n2 -> n90; n2 -> n94; n3 -> n1; n3 -> n83; n4 -> n1; n4 -> n21; n4 -> n87; n4 -> n89; n4 -> n106; n4 -
> n109; n5 -> n2; n5 -> n71; n5 -> n79; n5 -> n120; n5 -> n127; n6 -> n4; n6 -> n14; n6 -> n65; n6 -> n68; n6 -> n98; n6 -> n99; n7 -> n4; n7 -> n57; n7 -> n91; n7 -> n94; n7 -> n115; n8 -> n2; n8 -> n35;
 n8 -> n63; n8 -> n100; n8 -> n103; n8 -> n119; n9 -> n3; n9 -> n87; n9 -> n96; n9 -> n99; n9 -> n115; n10 -> n4; n10 -> n30; n10 -> n60; n10 -> n83; n10 -> n127; n11 -> n2; n11 -> n17; n11 -> n29; n11 -> n30; n11 -> n45; n11 -> n84; n12 -> n3; n12 -> n42; n12 -> n87; n13 -> n8; n13 -> n21; n13 -> n22; n13 -> n25; n13 -> n34; n13 -> n76; n14 -> n11; n14 -> n37; n14 -> n62; n14 -> n78; n14 -> n110; n14 -> n117; n15 -> n3; n15 -> n14; n15 -> n49; n15 -> n59; n15 -> n74; n15 -> n92; n16 -> n3; n16 -> n5; n16 -> n79; n16 -> n112; n16 -> n126; n17 -> n9; n17 -> n25; n17 -> n53; n17 -> n70; n17 -> n120; n18 -> n11; n18 -> n40; n18 -> n64; n18 -> n114; n18 -> n128; n19 -> n6; n19 -> n9; n19 -> n37; n19 -> n40; n19 -> n66; n19 -> n77; n20 -> n15; n20 -> n19; n20 -> n48; n20 -> n61; n20 -> n67; n20 -> n116; n21 -> n12; n21 -> n83; n21 -> n96; n21 -> n114; n22 -> n7; n22 -> n51; n22 -> n55; n22 -> n63; n22 -> n76; n22 -> n109; n23 -> n7; n23 -> n20; n23 -> n26; n23 -> n60; n23 -> n62; n23 -> n113; n24 -> n7; n24 -> n10; n24 -> n15; n24 -> n26; n24 -> n31; n24 -> n59; n25 -> n7; n25 -> n40; n25 -> n44; n25 -> n50; n25 -> n119; n25 -> n120; n26 -> n8; n26 -> n47; n26 -> n55; n26 -> n57; n26 -> n111; n26 -> n121; n27 -> n10; n27 -> n33; n27 -> n38; n27 -> n60; n27 -> n62; n28 -> n4; n28 -> n52; n28 -> n54; n28 -> n93; n28 -> n117; n28 -> n126; n29 -> n1; n29 -> n22; n29 -> n65; n29 -> n72; n29 -> n90; n30 -> n3; n30 -> n54; n30 -> n66; n30 -> n79; n30 -> n81; n30 -> n128; n31 -> n29; n31 -> n46; n31 -> n48; n31 -> n59; n31 -> n75; n31 -> n117; n32 -> n29; n32 -> n43; n32 -> n60; n32 -> n73; n32 -> n113; n33 -> n12; n33 -> n51; n33 -> n82; n33 -> n114; n33 -> n126; n34 -> n10; n34 -> n17; n34 -> n18; n34 -> n24; n34 -> n50; n35 -> n7; n35 -> n27; n35 -> n36; n35 -> n66; n35 -> n76; n36 -> n22; n36 -> n30; n36 -> n71; n36 -> n83; n36 -> n96; n36 -> n109; n37 -> n12; n37 -> n16; n37 -> n46; n37 -> n71; n37 -> n101; n37 -> n118; n38 -> n19; n38 -> n29; n38 -> n64; n38 -> n84; n38 -> n99; n38 -> n108; n39 -> n9; n39 -> n37; n39 -> n59; n39 -> n86; n39 -> n99; n39 -> n105; n40 -> n16; n40 -> n22; n40 -> n98; n40 -> n104; n40 -> n119; n40 -> n121; n41 -> n18; n41 -> n29; n41 -> n35; n41 -> n54; n41 -> n94; n41 -> n120; n42 -> n2; n42 -> n20; n42 -> n26; n42 -> n31; n42 -> n94; n42 -> n97; n43 -> n8; n43 -> n18; n43 -> n63; n43 -> n67; n43 -> n117; n44 -> n18; n44 -> n38; n44 -> n39; n44 -> n82; n44 -> n95; n44 -> n113; n45 -> n14; n45 -> n25; n45 -> n28; n45 -> n53; n45 -> n124; n46 -> n27; n46 -> n32; n46 -> n56; n46 -> n65; n46 -> n66; n46 -> n75; n47 -> n15; n47 -> n32; n47 -> n35; n47 -> n41; n47 -> n44; n48 -> n16; n48 -> n25; n48 -> n73; n48 -> n98; n48 -> n121; n48 -> n123; n49 -> n6; n49 -> n13; n49 -> n34; n49 -> n35; n49 -> n37; n49 -> n94; n50 -> n42; n50 -> n55; n50 -> n62; n50 -> n65; n50 -> n73; n50 -> n125; n51 -> n11; n51 -> n20; n51 -> n32; n51 -> n54; n51 -> n57; n52 -> n8; n52 -> n33; n52 -> n76; n52 -> n80; n52 -> n95; n52 -> n102; n53 -> n10; n53 -> n23; n53 -> n41; n53 -> n58; n53 -> n67; n53 -> n105; n54 -> n16; n54 -> n23; n54 -> n47; n54 -> n77; n54 -> n104; n54 -> n123; n55 -> n15; n55 -> n56; n55 -> n101; n55 -> n110; n55 -> n128; n56 -> n21; n56 -> n24; n56 -> n28; n56 -> n60; n56 -> n100; n57 -> n12; n57 -> n28; n57 -> n30; n57 -> n39; n57 -> n49; n58 -> n9; n58 -> n28; n58 -> n38; n58 -> n56; n58 -> n62; n58 -> n94; n59 -> n27; n59 -> n41; n59 -> n44; n59 -> n91; n59 -> n119; n59 -> n120; n60 -> n15; n60 -> n81; n60 -> n96; n60 -> n107; n60 -> n115; n61 -> n2; n61 -> n10; n61 -> n17; n61 -> n41; n61 -> n64; n61 -> n101; n62 -> n30; n62 -> n43; n62 -> n44; n62 -> n88; n63 -> n16; n63 -> n26; n63 -> n47; n63 -> n77; n63 -> n121; n63 -> n123; n64 -> n20; n64 -> n45; n64 -> n77; n64 -> n98; n64 -> n104; n64 -> n106; n65 -> n18; n65 -> n28; n65 -> n32; n65 -> n48; n65 -> n59; n65 -> n84; n66 -> n6; n66 -> n25; n66 -> n57; n66 -> n90; n66 -> n95; n66 -> n119; n67 -> n4; n67 -> n24; n67 -> n49; n67 -> n56; n67 -> n93; n67 -> n105; n68 -> n3; n68 -> n103; n68 -> n109; n68 -> n122; n69 -> n1; n69 -> n34; n69 -> n44; n69 -> n97; n69 -> n109; n69 -> n127; n70 -> n29; n70 -> n39; n70 -> n55; n70 -> n82; n70 -> n111; n71 -> n45; n71 -> n65; n71 -> n69; n71 -> n81; n71 -> n83; n71 -> n106; n72 -> n50; n72 -> n68; n72 -> n87; n72 -> n93; n72 -> n102; n73 -> n16; n73 -> n57; n73 -> n77; n73 -> n98; n73 -> n104; n73 -> n128; n74 -> n2; n74 -> n17; n74 -> n28; n74 -> n52; n74 -> n73; n74 -> n104; n75 -> n41; n75 -> n58; n75 -> n80; n75 -> n95; n75 -> n101; n76 -> n9; n76 -> n20; n76 -> n50; n76 -> n65; n76 -> n74; n77 -> n5; n77 -> n44; n77 -> n51; n77 -> n70; n77 -> n84; n77 -> n118; n78 -> n8; n78 -> n24; n78 -> n39; n78 -> n73; n78 -> n95; n78 -> n127; n79 -> n13; n79 -> n34; n79 -> n35; n79 -> n37; n79 -> n90; n79 -> n113; n80 -> n15; n80 -> n19; n80 -> n67; n80 -> n112; n80 -> n128; n81 -> n7; n81 -> n18; n81 -> n53; n81 -> n80; n81 -> n96; n81 -> n124; n82 -> n68; n82 -> n84; n82 -> n86; n82 -> n105; n82 -> n107; n83 -> n8; n83 -> n33; n83 -> n47; n83 -> n58; n83 -> n70; n84 -> n19; n84 -> n39; n84 -> n52; n84 -> n78; n84 -> n92; n84 -> n100; n85 -> n38; n85 -> n42; n85 -> n69; n85 -> n79; n85 -> n102; n86 -> n11; n86 -> n63; n86 -> n72; n86 -> n76; n86 -> n116; n86 -> n117; n87 -> n31; n87 -> n64; n87 -> n73; n87 -> n95; n87 -> n110; n88 -> n25; n88 -> n34; n88 -> n46; n88 -> n69; n88 -> n74; n88 -> n110; n89 -> n13; n89 -> n42; n89 -> n61; n89 -> n69; n89 -> n81; n89 -> n92; n90 -> n9; n90 -> n19; n90 -> n23; n90 -> n31; n90 -> n36; n90 -> n107; n91 -> n68; n91 -> n86; n91 -> n93; n91 -> n108; n91 -> n112; n92 -> n23; n92 -> n27; n92 -> n69; n92 -> n78; n92 -> n105; n93 -> n61; n93 -> n63; n93 -> n88; n93 -> n89; n93 -> n92; n94 -> n32; n94 -> n38; n94 -> n54; n94 -> n55; n94 -> n66; n94 -> n117; n95 -> n6; n95 -> n30; n95 -> n34; n95 -> n97; n95 -> n107; n96 -> n27; n96 -> n40; n96 -> n43; n96 -> n70; n96 -> n79; n96 -> n92; n97 -> n20; n97 -> n32; n97 -> n36; n97 -> n57; n97 -> n61; n98 -> n5; n98 -> n23; n98 -> n53; n98 -> n75; n98 -> n112; n99 -> n31; n99 -> n51; n99 -> n82; n99 -> n91; n99 -> n110; n99 -> n111; n100 -> n50; n100 -> n61; n100 -> n68; n100 -> n108; n100 -> n115; n101 -> n6; n101 -> n22; n101 -> n92; n101 -> n107; n101 -> n112; n102 -> n38; n102 -> n48; n102 -> n55; n102 -> n76; n102 -> n81; n103 -> n85; n103 -> n89; n103 -> n101; n103 -> n111; n103 -> n115; n104 -> n5; n104 -> n13; n104 -> n53; n104 -> n78; n104 -> n116; n105 -> n14; n105 -> n17; n105 -> n35; n105 -> n52; n105 -> n81; n105 -> n88; n106 -> n33; n106 -> n54; n106 -> n62; n106 -> n70; n106 -> n74; n106 -> n82; n107 -> n21; n107 -> n22; n107 -> n24; n107 -> n31; n107 -> n47; n107 -> n51; n108 -> n14; n108 -> n63; n108 -> n71; n108 -> n85; n108 -> n97; n109 -> n40; n109 -> n58; n109 -> n70; n109 -> n85; n109 -> n108; n109 -> n120; n110 -> n10; n110 -> n19; n110 -> n37; n110 -> n40; n110 -> n43; n110 -> n72; n111 -> n11; n111 -> n27; n111 -> n43; n111 -> n64; n111 -> n90; n112 -> n78; n112 -> n79; n112 -> n88; n112 -> n111; n112 -> n113; n112 -> n126; n113 -> n33; n113 -> n43; n113 -> n64; n113 -> n67; n113 -> n86; n113 -> n100; n114 -> n60; n114 -> n77; n114 -> n98; n114 -> n104; n114 -> n123; n115 -> n89; n115 -> n91; n115 -> n97; n115 -> n106; n115 -> n116; n115 -> n123; n116 -> n26; n116 -> n49; n116 -> n85; n116 -> n114; n116 -> n127; n117 -> n53; n117 -> n68; n117 -> n71; n117 -> n96; n117 -> n116; n117 -> n127; n118 -> n13; n118 -> n36; n118 -> n49; n118 -> n91; n118 -> n124; n119 -> n12; n119 -> n42; n119 -> n106; n119 -> n110; n119 -> n114; n120 -> n48; n120 -> n67; n120 -> n71; n120 -> n86; n120 -> n100; n120 -> n118; n121 -> n5; n121 -> n51; n121 -> n84; n121 -> n89; n121 -> n109; n122 -> n42; n122 -> n75; n122 -> n78; n122 -> n88; n122 -> n89; n123 -> n5; n123 -> n17; n123 -> n56; n123 -> n66; n123 -> n101; n123 -> n119; n124 -> n12; n124 -> n48; n124 -> n52; n124 -> n97; n124 -> n103; n124 -> n107; n125 -> n58; n125 -> n69; n125 -> n85; n125 -> n99; n125 -> n112; n125 -> n115; n126 -> n1; n126 -> n50; n126 -> n74; n126 -> n93; n126 -> n111; n127 -> n52; n127 -> n59; n127 -> n88; n127 -> n106; n127 -> n113; n127 -> n126; n128 -> n4; n128 -> n6; n128 -> n13; n128 -> n47; n128 -> n61; n128 -> n100;

Secondly, they're being logged a few times per second, making it impossible to read anything else out of the logs.

The second point is probably indicative of some amount of instability in the gossip network that should be investigated on its own right. However, even if it was only being logged once a minute as part of the gossip status, this would be quite annoying. The entire "gossip status" message literally doesn't fit on the screen of my 15" macbook:

screen shot 2018-09-11 at 2 15 23 pm

Also, wow that's a lot of data being transmitted over gossip (the server indicates 25GB sent / 12GB received). I suspect a lot of it is the connectivity map being resent in its entirety over and over.

It'd be reasonable to say we should first understand/fix the fact that there are frequent changes happening in the gossip network. I'm still worried about how things will look after that, though.

@petermattis

petermattis commented 6 years ago

I think we can remove the every minute logging of gossip connectivity. The other logging only occurs when gossip connectivity changes. It shouldn't be changing frequently and we should figure out why it is.

a-robinson commented 6 years ago

Yeah, I'll check out what's going on with gossip after the tpc-c runs are done, or more likely I'll just spin up a cluster using smaller nodes later on to focus on the gossip in a more cost effective way.

However, there will always be a lot of logging triggered every time one or more nodes get restarted as the gossip network adjusts to the change(s) and stabilizes, which will certainly add up during an event like a rolling upgrade.

petermattis commented 6 years ago

On a 3 node cluster we gossip at ~2KB/sec. On a 9 node cluster we gossip at ~12KB/sec. On a 27 node cluster we gossip at ~55KB/sec. None of these numbers scale up to 37GB gossiped over 2 hours. I think there is something to investigate there.

petermattis commented 6 years ago

I think there is something to investigate there.

We both gossip the cluster-id as well as include it in every gossip.Request. That seems excessive, though I doubt it has a significant impact.

petermattis commented 6 years ago

On a 3 node cluster, over the course of 1 minute we send/receive the following keys via gossip:

    gossip-clients:1      138     6348
          cluster-id       57     3363
          cluster-id       53     3339
         first-range       35     2870
            sentinel       38     2394
    gossip-clients:3       45     2115
             store:1        6     2100
    gossip-clients:2       35     1680
          liveness:2       15      975
          liveness:3       13      845
          liveness:1       11      715
             store:3        2      708
             store:2        2      708

The second column is the number of times the key was sent/received. The third column is the size of the gossiped value times the number of times it was gossiped (providing an indication of what fraction of the gossiped bytes it is consuming).

Note that the sizes above are only measuring gossip.Info. In my testing, this only accounts for ~40% of the gossip traffic. The other 60% is gossip.{Request,Response} overhead. I suspect most of the extra data is in gossip.HighWaterStamps, though I haven't confirmed that.

petermattis commented 6 years ago

We need to gossip the sentinel frequently, but there is no need to be gossiping the cluster-id and first-range so fast.

petermattis commented 6 years ago

The numbers I posted above were the sent/received gossip infos from a single node. The math inclined will notice that we sent/received 138 gossip-clients:1 keys. I looked into this a bit more and it turns out we sent 138 keys during that minute time frame. But something is off here because we are only supposed to send that key every 2 seconds so we should have gossiped it 60 times. A bit more instrumentation shows that every time AddInfo is called for that key, we send it two each connected node twice and sometimes three times. I'm not sure why. Definitely a bug.

petermattis commented 6 years ago

I'm pretty sure the problem is due to how we update highWaterStamps. For server connections, we only update highWaterStamps upon receipt of a new request from the remote node. But that means that if we have a new info to send, we'll keep sending it until the remote node sends another request. I'm pretty sure the same problem occurs for the client-side of the connection. Rather than waiting for an update from the remote node, I think we need to update our view of the remote's highWaterStamps as soon as we send them something.

petermattis commented 6 years ago

See https://github.com/cockroachdb/cockroach/issues/30126. I think that is an explanation for the gigabytes of gossip traffic.

petermattis commented 6 years ago

@a-robinson Do you think there is anything else to address here for 2.1? My understanding is that gossip stabilized on your 100+ node cluster and this spam just affected startup of the cluster.

a-robinson commented 6 years ago

Unfortunately, that doesn't seem to be the case. I saved the latest log file from the cluster before taking it down, and all the log files have a lot of the connectivity logs even though no nodes were being added/removed around that time. Two nodes were down due to the bad_alloc issue, but they had been down for well over an hour before the time period these logs cover. Let me know if you'd like to see them for yourself.

I imagine the cull logic in gossip that closes the least useful connection is causing a lot of this, and at a high level I'm not sure that connectivity logging on changes can happily coexist with the culling of an outgoing connection from every node every minute, or at least not on large clusters.

Also, orthogonally from the thrashing, the logging during startup is excessive. We log every time we get new connectivity info that isn't identical to the last thing we logged, but during startup a node can get gossip info piece by piece, leading to log output like the below that keeps going on and on until all the nodes' info is received:

I180918 20:02:51.829369 55 gossip/client.go:129  [n?] started gossip client to 35.229.39.61:26257
I180918 20:02:51.836827 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
I180918 20:02:51.837244 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135;
I180918 20:02:51.837609 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87;
I180918 20:02:51.837920 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n37 -> n8; n37 -> n64; n37 -> n65; n37 -> n97; n37 -> n101; n52 -> n5; n52 -> n19; n52 -> n68; n52 -> n79; n52 -> n106; n55 -> n12; n55 -> n58; n55 -> n68; n55 -> n75; n55 -> n96; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n71 -> n43; n71 -> n75; n71 -> n87; n71 -> n88; n71 -> n128; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87; n92 -> n10; n92 -> n26; n92 -> n75; n92 -> n94; n92 -> n132; n106 -> n2; n106 -> n10; n106 -> n61; n106 -> n70; n106 -> n93;
I180918 20:02:51.838131 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n37 -> n8; n37 -> n64; n37 -> n65; n37 -> n97; n37 -> n101; n52 -> n5; n52 -> n19; n52 -> n68; n52 -> n79; n52 -> n106; n55 -> n12; n55 -> n58; n55 -> n68; n55 -> n75; n55 -> n96; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n71 -> n43; n71 -> n75; n71 -> n87; n71 -> n88; n71 -> n128; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87; n92 -> n10; n92 -> n26; n92 -> n75; n92 -> n94; n92 -> n132; n104 -> n9; n104 -> n98; n104 -> n110; n104 -> n115; n104 -> n128; n106 -> n2; n106 -> n10; n106 -> n61; n106 -> n70; n106 -> n93;
I180918 20:02:51.838544 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n7 -> n37; n7 -> n40; n7 -> n42; n7 -> n71; n7 -> n78; n26 -> n29; n26 -> n58; n26 -> n118; n26 -> n120; n26 -> n127; n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n37 -> n8; n37 -> n64; n37 -> n65; n37 -> n97; n37 -> n101; n47 -> n11; n47 -> n13; n47 -> n23; n47 -> n76; n47 -> n78; n52 -> n5; n52 -> n19; n52 -> n68; n52 -> n79; n52 -> n106; n55 -> n12; n55 -> n58; n55 -> n68; n55 -> n75; n55 -> n96; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n71 -> n43; n71 -> n75; n71 -> n87; n71 -> n88; n71 -> n128; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87; n92 -> n10; n92 -> n26; n92 -> n75; n92 -> n94; n92 -> n132; n104 -> n9; n104 -> n98; n104 -> n110; n104 -> n115; n104 -> n128; n106 -> n2; n106 -> n10; n106 -> n61; n106 -> n70; n106 -> n93; n116 -> n12; n116 -> n18; n116 -> n69; n116 -> n73; n116 -> n76;
I180918 20:02:51.838867 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n7 -> n37; n7 -> n40; n7 -> n42; n7 -> n71; n7 -> n78; n26 -> n29; n26 -> n58; n26 -> n118; n26 -> n120; n26 -> n127; n30 -> n4; n30 -> n31; n30 -> n34; n30 -> n63; n30 -> n79; n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n37 -> n8; n37 -> n64; n37 -> n65; n37 -> n97; n37 -> n101; n45 -> n37; n45 -> n46; n45 -> n58; n45 -> n77; n45 -> n85; n47 -> n11; n47 -> n13; n47 -> n23; n47 -> n76; n47 -> n78; n52 -> n5; n52 -> n19; n52 -> n68; n52 -> n79; n52 -> n106; n55 -> n12; n55 -> n58; n55 -> n68; n55 -> n75; n55 -> n96; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n71 -> n43; n71 -> n75; n71 -> n87; n71 -> n88; n71 -> n128; n72 -> n17; n72 -> n18; n72 -> n77; n72 -> n90; n72 -> n130; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87; n92 -> n10; n92 -> n26; n92 -> n75; n92 -> n94; n92 -> n132; n94 -> n42; n94 -> n43; n94 -> n115; n94 -> n129; n94 -> n134; n104 -> n9; n104 -> n98; n104 -> n110; n104 -> n115; n104 -> n128; n106 -> n2; n106 -> n10; n106 -> n61; n106 -> n70; n106 -> n93; n116 -> n12; n116 -> n18; n116 -> n69; n116 -> n73; n116 -> n76; n130 -> n4; n130 -> n11; n130 -> n19; n130 -> n20; n130 -> n65;
I180918 20:02:51.839272 168 gossip/gossip.go:942  [n?] gossip connectivity
  n47 [sentinel];
  n7 -> n37; n7 -> n40; n7 -> n42; n7 -> n71; n7 -> n78; n26 -> n29; n26 -> n58; n26 -> n118; n26 -> n120; n26 -> n127; n30 -> n4; n30 -> n31; n30 -> n34; n30 -> n63; n30 -> n79; n32 -> n95; n32 -> n96; n32 -> n122; n32 -> n132; n32 -> n135; n37 -> n8; n37 -> n64; n37 -> n65; n37 -> n97; n37 -> n101; n45 -> n37; n45 -> n46; n45 -> n58; n45 -> n77; n45 -> n85; n47 -> n11; n47 -> n13; n47 -> n23; n47 -> n76; n47 -> n78; n52 -> n5; n52 -> n19; n52 -> n68; n52 -> n79; n52 -> n106; n55 -> n12; n55 -> n58; n55 -> n68; n55 -> n75; n55 -> n96; n63 -> n17; n63 -> n36; n63 -> n39; n63 -> n101; n63 -> n119; n66 -> n27; n66 -> n55; n66 -> n56; n66 -> n88; n66 -> n102; n71 -> n43; n71 -> n75; n71 -> n87; n71 -> n88; n71 -> n128; n72 -> n17; n72 -> n18; n72 -> n77; n72 -> n90; n72 -> n130; n81 -> n9; n81 -> n12; n81 -> n66; n81 -> n76; n81 -> n87; n92 -> n10; n92 -> n26; n92 -> n75; n92 -> n94; n92 -> n132; n94 -> n42; n94 -> n43; n94 -> n115; n94 -> n129; n94 -> n134; n104 -> n9; n104 -> n98; n104 -> n110; n104 -> n115; n104 -> n128; n106 -> n2; n106 -> n10; n106 -> n61; n106 -> n70; n106 -> n93; n116 -> n12; n116 -> n18; n116 -> n69; n116 -> n73; n116 -> n76; n130 -> n4; n130 -> n11; n130 -> n19; n130 -> n20; n130 -> n65;
...
a-robinson commented 6 years ago

I've been tossing around the idea of rate limiting this logging to happen no more than once a minute, which would alleviate most of the pain. Would that still accomplish most of the original goal? I realize we've discovered some gossip problems lately, but I'm not really convinced that this information belongs in non-verbose logs.

petermattis commented 6 years ago

It's true that the gossip connectivity logs might be addressing yesterday's war. Perhaps we should re-add them to the once per minute logging, but only output the message if it has changed since the last log line.