RunOnFlux / flux

Flux, Your Gateway to a Decentralized World. https://home.runonflux.io https://api.runonflux.io https://docs.runonflux.io https://source.runonflux.io https://wiki.runonflux.io
GNU Affero General Public License v3.0

[BUG] Flux node won't start if it can't contact at least 1 node on the testnet. #671

Closed mattconres closed 2 years ago

mattconres commented 2 years ago


Opening per Valter's request.

This is on TestNet

Describe the bug: The Flux node will not connect sufficiently to start benchmarks because it can't connect to the one node on the net.

To Reproduce: This is difficult, if not impossible, to reproduce, simply because this is not the way it's supposed to work.

Scenario: Testnet

  1. No nodes online on the testnet.

  2. Brought up my locally hosted node on port 16147.

  3. TestNode1 is the only one on the network.

    
        {
          "total": 1,
          "stable": 1,
          "basic-enabled": 1,
          "super-enabled": 0,
          "bamf-enabled": 0,
          "cumulus-enabled": 1,
          "nimbus-enabled": 0,
          "stratus-enabled": 0,
          "ipv4": 0,
          "ipv6": 0,
          "onion": 0
        }

  4. Started up TestNode2. Updated, reinstalled, and validated all UPnP networking; port 16157 is open, etc. The Flux install looks 100%.

  5. However, the benchmark fails: 2022-08-01 15:10:32 ---Failed: FluxOS is not working properly. Check your networking configuration or FluxOs error log.

    
        {
          "ipaddress": "<MY PUBLICIP>:16157",
          "architecture": "",
          "armboard": "",
          "status": "failed",
          "time": 1659370243,
          "real_cores": 0,
          "cores": 0,
          "ram": 0,
          "ssd": 0,
          "hdd": 0,
          "ddwrite": 0,
          "totalstorage": 0,
          "disksinfo": [
          ],
          "eps": 0,
          "ping": 0,
          "download_speed": 0,
          "upload_speed": 0,
          "bench_version": "0.0.0",
          "speed_version": "0.0.0",
          "error": "Failed: FluxOS is not working properly. Check your networking configuration or FluxOs error log."
        }

  6. Checking the Flux log shows:

       2022-08-01T16:18:37.132Z          connect ECONNREFUSED <MY PUBLIC IP>:16147
       2022-08-01T16:18:37.132Z          Flux communication is limited
       Error: connect ECONNREFUSED <MY PUBLIC IP>:16147
         at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1159:16)
  7. My assumption is that TestNode2 is trying to validate against the existing node on the network, but it can't, so benchmarks are not allowed to continue.

  8. This is a self-hosted environment, and my firewall does not allow hairpin NAT with UPnP, so the two nodes cannot talk to each other. Since the second node does see the first node on the chain/network, it refuses to come up until it can talk to at least one node. Strangely, the first node did come up without any other nodes on the network, but I assume that is because there wasn't one to validate against.

  9. All start messages are coming into fluxd, but of course it won't confirm until the benchmark passes. The benchmark won't pass until Flux acknowledges a valid network connection, and Flux can't validate the network connection because the one node on the testnet is not accessible.
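
The ECONNREFUSED in step 6 can be checked independently of Flux with a plain TCP probe. The following is a minimal sketch, not part of FluxOS: the helper name `can_connect` is hypothetical, and the host you pass in is whatever public IP and port your node advertises. Run from inside the same LAN against the public IP, it would be expected to fail when the router does not support hairpin NAT, even though the port is open to outside hosts.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for diagnosis only; it is not a FluxOS API.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (ECONNREFUSED), timeouts,
        # and unreachable or unresolvable hosts.
        return False


# Example call (substitute the node's real public IP):
#     can_connect("<MY PUBLIC IP>", 16147)
```

Comparing the result from a machine inside the LAN with the result from an outside host helps separate a hairpin NAT limitation from a genuinely closed port.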

Expected behavior: The node comes up and confirms :-)

Environment (please complete the following information):

Additional context

This IS Testnet, and this is a very unusual situation. I am only submitting this to bring it to your attention.

TheTrunk commented 2 years ago

So this is coming from the checkMyFluxAvailability function, which tests whether other fluxnodes on the network can connect to our new fluxnode. We can bend the rules for cases where an insufficient number of nodes is present on the network.
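
To illustrate the idea of bending the rules, here is a rough Python sketch of an availability check that relaxes itself when too few other nodes exist. It is not the actual checkMyFluxAvailability code (FluxOS is a Node.js project): the function name `availability_ok`, the `min_other_nodes` threshold, the `peer_can_reach_me` callback, and the choice to ignore nodes sharing our own public IP are all assumptions made for illustration.

```python
from typing import Callable, Iterable

# Hypothetical threshold: how many independent nodes must exist
# before the reachability check is enforced.
MIN_OTHER_NODES = 1


def availability_ok(
    my_ip: str,
    node_ips: Iterable[str],
    peer_can_reach_me: Callable[[str], bool],
    min_other_nodes: int = MIN_OTHER_NODES,
) -> bool:
    """Decide whether the new node counts as reachable.

    peer_can_reach_me(ip) stands in for asking the node at `ip`
    to connect back to us and report the result.
    """
    # Assumption: nodes behind our own public IP cannot test us from
    # the outside (e.g. no hairpin NAT), so they are not counted.
    others = [ip for ip in node_ips if ip != my_ip]

    if len(others) < min_other_nodes:
        # Not enough independent nodes to run the check: relax the rule.
        return True

    # Otherwise at least one independent node must be able to reach us.
    return any(peer_can_reach_me(ip) for ip in others)
```

With this rule, a node whose only peers share its public IP passes, as it would on an empty network, while a node that no independent peer can reach still fails.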

mattconres commented 2 years ago

I was requested to post a report on this by Valter. As I said, this would probably ONLY happen on testnet. But thanks; it's crypto winter on the testnet and only my node is there. Maybe just a flag to test if testnet?

TheTrunk commented 2 years ago

> I was requested to post a report on this by Valter. As I said, this would probably ONLY happen on testnet. But thanks; it's crypto winter on the testnet and only my node is there. Maybe just a flag to test if testnet?

Yes, a testnet flag for that function would be sufficient, but it would solve the problem only for testnet. The same bug would occur if mainnet had 0 nodes online, so we will implement a more complex fix.
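
The difference between the two approaches can be made concrete with a small comparison. This is only a sketch under assumed names (`skip_by_testnet_flag`, `skip_by_node_count`, and the `min_other_nodes` threshold are hypothetical); it does not reflect how the fix was actually implemented.

```python
def skip_by_testnet_flag(is_testnet: bool, other_nodes: int) -> bool:
    """Policy A: skip the availability check whenever we are on testnet."""
    return is_testnet


def skip_by_node_count(
    is_testnet: bool, other_nodes: int, min_other_nodes: int = 1
) -> bool:
    """Policy B: skip the check only when too few other nodes are online."""
    return other_nodes < min_other_nodes


# (label, is_testnet, number of other nodes online)
scenarios = [
    ("testnet, no other nodes", True, 0),
    ("testnet, many other nodes", True, 50),
    ("mainnet, no other nodes", False, 0),
    ("mainnet, many other nodes", False, 1000),
]

for label, is_testnet, other_nodes in scenarios:
    flag = skip_by_testnet_flag(is_testnet, other_nodes)
    count = skip_by_node_count(is_testnet, other_nodes)
    print(f"{label}: flag policy skips={flag}, count policy skips={count}")
```

The flag policy never skips the check on an empty mainnet, which is the case described above, and it also skips the check on a well-populated testnet where the check would still be meaningful.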

mattconres commented 2 years ago

Understood… Scream if you need a tester. I have 5 nodes behind the same IP that can't talk to each other. And I'm the only node alive on testnet at the moment.

mattconres commented 2 years ago

Just wanted to follow up and confirm this does work! Testnet dropped to the point where I was the only node on the network, with the same IP. Successfully started up additional nodes behind the same IP.