IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.
https://cardano.org
Apache License 2.0

[BUG] - ChainTransitionError #2725

Closed maierfelix closed 1 year ago

maierfelix commented 3 years ago

Internal/External External

Summary Unable to synchronize when setting up a new node

Steps to reproduce

  1. Download the latest Cardano node build (I'm using version 1.27.0 and build 6413627)
  2. Download configs and genesis file (I'm using version 6426791):
    
    export LAST_BUILD=$(curl -s https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/index.html | grep -e "This item has moved" |  sed -e 's/.*build\/\(.*\)\/download.*/\1/')
    wget -q -O mainnet-config.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-config.json
    wget -q -O mainnet-byron-genesis.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-byron-genesis.json
    wget -q -O mainnet-shelley-genesis.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-shelley-genesis.json
    wget -q -O mainnet-topology.json https://hydra.iohk.io/build/${LAST_BUILD}/download/1/mainnet-topology.json
  3. Create a cardano node with:

    cardano-node run \
      --config xxx/cnode/config/mainnet-config.json \
      --topology xxx/cnode/config/mainnet-topology.json \
      --database-path xxx/cnode/db/ \
      --socket-path xxx/cnode/sockets/node.socket \
      --port 3001


**Unexpected behavior**
The node syncs properly until about slot `4492800` (start of the Shelley era?) and then starts to throw errors. I tried deleting the `ledger` and `volatile` folders in the database, but the errors still appear. Error log:

[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:35.02 UTC] Chain extended, new tip: c2df25e41eb7de23b702e2affc7c2d171b53dd9dc824b26a48890f365ab89f3d at slot 4483574
Event: LedgerUpdate (HardForkUpdateInEra Z (WrapLedgerUpdate {unwrapLedgerUpdate = ByronUpdatedProtocolUpdates [ProtocolUpdate {protocolUpdateVersion = 2.0.0, protocolUpdateState = UpdateStableCandidate (EpochNo 208)}]}))
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:35.02 UTC] Chain extended, new tip: 92f7452fb13b978483e8a737400c76e1460d65e8c03d3e60b9cdfec959e4c4cd at slot 4483575
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:36.27 UTC] Chain extended, new tip: 50be594a349edd8c3cf44bc5e43aac75e85404f1073d3e60cb00a96b3b3f3872 at slot 4484538
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:37.52 UTC] Chain extended, new tip: b994e91880f17824d36564155cc941803e213b29888851bdbc4a7dba07b654fc at slot 4485603
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:38.84 UTC] Chain extended, new tip: f930ab3caeac77f9dc597d07df48c86e28a020bb07937be606037ec9f3a2fc10 at slot 4486673
[DESKTOP-:cardano.node.ChainDB:Notice:34] [2021-05-21 11:11:40.09 UTC] Chain extended, new tip: 131df406e92d4658c08623a70b8f450830bf4ce3a114025b319a674d2f0fca97 at slot 4487780
[DESKTOP-:cardano.node.DnsSubscription:Error:78] [2021-05-21 11:11:41.02 UTC] Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 52.58.171.193:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488479) eea2bfa5527f752af311507d13dd03dc39eb0e87537955a6cc03dcfd358efde7 (BlockNo 4486190)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))
[DESKTOP-:cardano.node.ErrorPolicy:Warning:52] [2021-05-21 11:11:41.03 UTC] IP 52.58.171.193:3001 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488479) eea2bfa5527f752af311507d13dd03dc39eb0e87537955a6cc03dcfd358efde7 (BlockNo 4486190)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))))) 200s 200s
[DESKTOP-:cardano.node.DnsSubscription:Error:75] [2021-05-21 11:11:41.03 UTC] Domain: "relays.stakepool247.eu" Application Exception: 35.228.55.2:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4488480) 3fe1c2ae2065d4295691d6724a6b0558ad85e8abe53c9412c39295d208ebb9ef (BlockNo 4486191)) (Tip (SlotNo 30029101) 15c87c1604b4c1e7fb4f2038ea9ff9511669ce908f1825d07a75a7558640ac4e (BlockNo 5746606))



**System info (please complete the following information):**
- OS: Ubuntu
- Version: 20.04.2 LTS
- Node version: 1.27.0 (git rev 69c77dcc13d983a06ce42ab598dc3329762e3733)
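
On the question of whether slot 4492800 is where the Shelley era begins: the first log line above reports a protocol update to version 2.0.0 becoming stable for epoch 208, and a node startup log quoted later in this thread gives the Byron epoch length as 21600 slots. A quick check in Python, assuming every epoch before 208 was a full-length Byron epoch, lands exactly on the failing slot. The function name is made up for illustration.

```python
# Byron epoch length in slots, as printed in a node startup log in this thread
# (cardano.node.basicInfo.epochLengthByron).
BYRON_EPOCH_LENGTH = 21600


def first_slot_of_epoch(epoch: int, epoch_length: int = BYRON_EPOCH_LENGTH) -> int:
    """First slot of `epoch`, assuming all earlier epochs had `epoch_length` slots."""
    return epoch * epoch_length


# Epoch 208 is the epoch named in the ProtocolUpdate log line (EpochNo 208).
print(first_slot_of_epoch(208))  # 4492800
```

The same arithmetic applied to the testnet error reported further down (slot 1598400) gives 1598400 / 21600 = 74 exactly, which suggests that failure also sits on an epoch boundary.
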
ducknessman commented 3 years ago

I encountered the same problem. If it is solved, please let me know.

ducknessman commented 3 years ago

Is it possible to skip this fixed block?

erikd commented 3 years ago

@maierfelix please post the 4 config files you downloaded.

> Is it possible to skip this fixed block?

@ducknessman No, this is not possible.

ducknessman commented 3 years ago

[image]

I downloaded the files from https://docs.cardano.org/projects/cardano-node/en/latest/stake-pool-operations/getConfigFiles_AND_Connect.html:

    wget https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/mainnet-config.json
    wget https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/mainnet-byron-genesis.json
    wget https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/mainnet-shelley-genesis.json
    wget https://hydra.iohk.io/job/Cardano/cardano-node/cardano-deployment/latest-finished/download/1/mainnet-topology.json

ducknessman commented 3 years ago

> @maierfelix please post the 4 config files you downloaded.
>
> > Is it possible to skip this fixed block?
>
> @ducknessman No, this is not possible.

Which configuration file do you need to view?

erikd commented 3 years ago

Config files look fine.

Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

ducknessman commented 3 years ago

> Config files look fine.
>
> Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

No, the Chain extended, new tip messages no longer appear. The error log is:

Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 3.123.218.74:3001 HeaderError (At (Block {blockPointSlot = SlotNo 4492800, blockPointHash = aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 4492800) (Nonce "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "6\236Sx\209\245\EOT\SUBY\235\141\150\230\GS\233o\tP\251A\180\159\245\DC1\247\188\DEL\209\t\212\&8>\GS$\190p4\230t\156f\DC2p\r\213\206\176\198ew\184\138\EM\174(k\DC3!\209[\206\SUB\183\&6"}, certifiedProof = CertPraosVRF "@Z\163p\255\NUL\149D\162\190J\165\188R\196Ec3\244\249\182W\GSf\197Y\r.V)\208\139:6\t\244\189\ENQ\STX\237]\v\225\171\219\DEL*\183j\174\174G\254\DC1\ESC\ETX5\164\228\222\246F\147\SYN'\148\184\211\193\202qP\SI\SYN\177\226DrL\ETX"}))]})))) (Tip (SlotNo 4492798) 7e037dcb8995990d49d69ffc83327e79291e3e9c71fa46e0ddd48a1f6016f3a3 (BlockNo 4490509)) (Tip (SlotNo 30100085) 04f357385eab1fcaef35016a4f6b9ad262a100270eb9d322e98eb1b0b8b92608 (BlockNo 5750124))
erikd commented 3 years ago

And what happens after that if anything?

And what does cardano-node --version say?

And what are the machine specs?

ducknessman commented 3 years ago

> And what happens after that if anything?
>
> And what does cardano-node --version say?
>
> And what are the machine specs?

image

OS: Ubuntu 18.04; CPU: 8; memory: 32 GB; disk: 1 TB SSD

maierfelix commented 3 years ago

> Config files look fine.
>
> Do you still see the Chain extended, new tip messages? If so, it is still syncing and any other messages are probably irrelevant.

I don't get the Chain extended messages either once the errors start. I'm checking the current state with cardano-cli query tip --mainnet, and the block and slot properties no longer seem to increment.
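
The check described here (run cardano-cli query tip twice and see whether anything moved) can be sketched as a small Python helper. This is only an illustration: the `slot` and `block` field names are an assumption based on the wording of the comment, and the function name is made up.

```python
import json


def tip_advanced(earlier: str, later: str) -> bool:
    """Return True if the later tip is ahead of the earlier one.

    Both arguments are the JSON text printed by `cardano-cli query tip --mainnet`.
    The "slot" and "block" field names are an assumption taken from the wording
    of the comment above; adjust them to whatever your cardano-cli prints.
    """
    a = json.loads(earlier)
    b = json.loads(later)
    return (b["block"], b["slot"]) > (a["block"], a["slot"])


# Hypothetical samples built from tip values that appear in the logs above.
stuck = '{"slot": 4488479, "block": 4486190}'
moved = '{"slot": 4488480, "block": 4486191}'
print(tip_advanced(stuck, stuck))  # False
print(tip_advanced(stuck, moved))  # True
```

Taking the two samples a minute or so apart should be enough to tell a stalled node from a slow one during initial sync.
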

erikd commented 3 years ago

@ducknessman 32G is more than enough. 8G should be sufficient (at least for now).

@maierfelix Machine specs?

ducknessman commented 3 years ago

> @ducknessman 32G is more than enough. 8G should be sufficient (at least for now).
>
> @maierfelix Machine specs?

I am resynchronizing now. If I find a solution, I hope I can share it. I started the synchronization yesterday and did nothing else, but it suddenly stopped synchronizing at a fixed height.

ducknessman commented 3 years ago

I have a new question: does ada synchronize through snapshots?

erikd commented 3 years ago

No, snapshots are currently not available.

maierfelix commented 3 years ago

> @ducknessman 32G is more than enough. 8G should be sufficient (at least for now).
>
> @maierfelix Machine specs?

I'm on Windows 10 and just tried running with Docker: same chain errors. Before that, I used Ubuntu 20.04 on WSL2, both with the pre-built binaries and by building from source. Both times it failed with the same errors and stopped syncing.

Machine specs:

erikd commented 3 years ago

I have posted this ticket on the internal IOHK Slack. Hoping someone more knowledgeable than me can respond.

disassembler commented 3 years ago

Can you try with the config files here? https://hydra.iohk.io/build/6198010/download/1/index.html

A couple of changes were made recently for the upcoming Alonzo era, and I want to see if using the configs recommended for 1.27.0 resolves the issue for you.
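
For anyone scripting the download, the difference from the original report is that the build number is fixed rather than resolved from latest-finished. A minimal Python sketch follows, assuming the four files are served under the same path pattern as the URLs quoted elsewhere in this thread; the function and constant names are made up for illustration.

```python
from urllib.request import urlretrieve

# Hydra build number suggested in the comment above for node 1.27.0.
PINNED_BUILD = 6198010

# File names as used in the download commands of the original report.
CONFIG_FILES = [
    "mainnet-config.json",
    "mainnet-byron-genesis.json",
    "mainnet-shelley-genesis.json",
    "mainnet-topology.json",
]


def pinned_url(build: int, name: str) -> str:
    """URL of one config file for a fixed Hydra build (no 'latest-finished')."""
    return f"https://hydra.iohk.io/build/{build}/download/1/{name}"


def download_pinned_configs(build: int = PINNED_BUILD, dest_dir: str = ".") -> None:
    """Fetch all four files for the given build into dest_dir."""
    for name in CONFIG_FILES:
        urlretrieve(pinned_url(build, name), f"{dest_dir}/{name}")


# Call download_pinned_configs() to actually fetch the files.
print(pinned_url(PINNED_BUILD, "mainnet-config.json"))
```

Keeping the build number in one constant makes it obvious which config set a node was started with, which is the piece of information that was missing when this issue was first reported.
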

SteveDevDev commented 3 years ago

> Can you try with the config files here? https://hydra.iohk.io/build/6198010/download/1/index.html
>
> A couple of changes were made recently for the upcoming Alonzo era, and I want to see if using the configs recommended for 1.27.0 resolves the issue for you.

This fixed it for me. I put the JSON files on both of my nodes and restarted them. It's syncing again now. Thanks!

erikd commented 3 years ago

@maierfelix @ducknessman Any feedback on @disassembler's suggestion?

Would also be interested in seeing a diff between the files you retrieved and the one that works.

ducknessman commented 3 years ago

> @maierfelix @ducknessman Any feedback on @disassembler's suggestion?
>
> Would also be interested in seeing a diff between the files you retrieved and the one that works.

I just replaced all the config files, and now it has started to synchronize. I compared them before replacing and found that there are many differences. I hope the link in the documentation can be updated. [image] Thanks so much!

profd2004 commented 3 years ago

> > @maierfelix @ducknessman Any feedback on @disassembler's suggestion? Would also be interested in seeing a diff between the files you retrieved and the one that works.
>
> I just replaced all the config files, and now it has started to synchronize. I compared them before replacing and found that there are many differences. I hope the link in the documentation can be updated. [image] Thanks so much!

The shelley-genesis.json over at https://hydra.iohk.io/build/6198010/download/1/index.html does not reflect the diff from your screenshot. The last key:value pair in the json is still "securityParam": 2160.

@ducknessman where are you getting your config files?

ducknessman commented 3 years ago

> > > @maierfelix @ducknessman Any feedback on @disassembler's suggestion? Would also be interested in seeing a diff between the files you retrieved and the one that works.
> >
> > I just replaced all the config files, and now it has started to synchronize. I compared them before replacing and found that there are many differences. I hope the link in the documentation can be updated. [image] Thanks so much!
>
> The shelley-genesis.json over at https://hydra.iohk.io/build/6198010/download/1/index.html does not reflect the diff from your screenshot. The last key:value pair in the json is still "securityParam": 2160.
>
> @ducknessman where are you getting your config files?

The link is: https://hydra.iohk.io/build/6198010/download/1/index.html

profd2004 commented 3 years ago

Did the config file thing work for anyone else? I'm looking at the shelley-genesis files at the links in this thread, and neither the testnet nor the mainnet file (https://hydra.iohk.io/build/6198010/download/1/testnet-shelley-genesis.json, https://hydra.iohk.io/build/6198010/download/1/mainnet-shelley-genesis.json) looks like @ducknessman's screenshot.

Neither of those files has, for example, a costModel: example/shelley/alonzo/costmodel.json line. What am I missing?

rdlrt commented 3 years ago

@profd2004 That's because the genesis files referred to in the "latest" builds are in preparation for Alonzo, and the optional parameter ShelleyGenesisHash in your config.json will not match the newer genesis (even though the other parameters are skipped). You don't need to worry about the newer build config/genesis just yet.

sambor81 commented 3 years ago

It started syncing again, but I think it is stuck again at epoch 267 [51.4%], 25 minutes and counting. Unbelievable how buggy this code is. I even copied the blockchain from a running server and that didn't work either. Does anyone have any idea how to solve this issue? Thank you so much for the help.

erikd commented 3 years ago

> Unbelievable how buggy this code is.

What other bugs are you facing? Have you raised tickets for any of them?

Yes, we handled changes to the config file poorly. That is something we would like to improve.

mrbrinker commented 3 years ago

Are you sure it is stuck? We are about 51.6% into epoch 267.

sambor81 commented 3 years ago

> > Unbelievable how buggy this code is.
>
> What other bugs are you facing? Have you raised tickets for any of them?
>
> Yes, we handled changes to the config file poorly. That is something we would like to improve.

I had issues with the 1.27.0 node. After 3 days of fighting, it looks like everything works. So happy.

erikd commented 3 years ago

> I had issues with the 1.27.0 node.

What issues? Did you raise a ticket? We cannot fix what we do not know about.

sambor81 commented 3 years ago

> > I had issues with the 1.27.0 node.
>
> What issues? Did you raise a ticket? We cannot fix what we do not know about.

I didn't raise a ticket; I was looking for help and information on Google. The main issue was that the 1.27.0 node stopped syncing after 14.9%. After swapping the files and restarting the node, I could not switch the node on, and when it finally was running after a few attempts, it kept disappearing and reappearing for 2-3 seconds in the system monitor. After a completely fresh installation, cardano-node started working.

profd2004 commented 3 years ago

> > I had issues with the 1.27.0 node.
>
> What issues? Did you raise a ticket? We cannot fix what we do not know about.

@erikd Here is my specific issue that is oddly only happening with my test node. I'm using the same build for my test and mainnet node but only the test node is not syncing; hence I thought it was maybe something with the config.

I'm getting the exact issue on my remote cloud node as well as my local home node, at the same slotNo.

[2021-05-23 17:44:21.60 UTC] IP 195.154.69.26:3003 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (HeaderError (At (Block {blockPointSlot = SlotNo 1598400, blockPointHash = 02b1c561715da9e540411123a6135ee319b02f60b9a11a603d3305556c04329f})) (HeaderProtocolError (HardForkValidationErrFromEra S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (VRFKeyBadNonce (Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") (SlotNo 1598400) (Nonce "74a0665cf3990f72fea801a670dfddb9efa4815d13bc0779bab26f15e42fcf46") (CertifiedVRF {certifiedOutput = OutputVRF {getOutputVRFBytes = "t\231\145\196\165ZhA\137S\209{Z<1\194\225]Yq\235\&7#!\161:\147\129Q\236x\207\195z\170\155\182mw\141\182\135\249\209\178\134\&3_:\167b\135\204\&4\205Z\172\230\163\226\EM\DC2\226\182"}, certifiedProof = CertPraosVRF "!\202\180:L)*\DC2\250\SOH\141V \240Z\EOT\n\183\245\141\SUB\191\ETXQ\"\EOT\155A\SOH'\160LD\253\204Z\249\129/i\178\237p\155\140\240\142\183)LG\137q\248\DLE\DC1\130W\183\162\149\DEL6=[5\225/18\154\192=\242\255\181\f\190\190\t"}))]})))) (Tip (SlotNo 1598339) c24fa5ebf4d88a653a6762773278b05e68a9da08fdcabac0f58abafdeedd049d (BlockNo 1597072)) (Tip (SlotNo 27422626) fc0213330f879136967b77a56c2694810ec4cf58925e9e6cfafdd61aa53690c7 (BlockNo 2608001))))) 200s 200s

Should I create an issue somewhere else?
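
To compare reports like this one across nodes (same slot, same nonces?), the relevant fields can be pulled out of the log line with a regular expression. The Python sketch below makes no claim about what the two nonces mean; it only extracts them in the order they are printed. The class and function names are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class BadNonce(NamedTuple):
    first_nonce: str
    slot: int
    second_nonce: str


# Matches the VRFKeyBadNonce fragment exactly as it is printed in the logs above.
_PATTERN = re.compile(
    r'VRFKeyBadNonce \(Nonce "([0-9a-f]+)"\) \(SlotNo (\d+)\) \(Nonce "([0-9a-f]+)"\)'
)


def parse_bad_nonce(log_line: str) -> Optional[BadNonce]:
    """Extract (first nonce, slot, second nonce) from a log line, or None."""
    m = _PATTERN.search(log_line)
    if m is None:
        return None
    return BadNonce(m.group(1), int(m.group(2)), m.group(3))


# Excerpt from the testnet log line above.
line = (
    'OverlayFailure (VRFKeyBadNonce '
    '(Nonce "81e47a19e6b29b0a65b9591762ce5143ed30d0261e5d24a3201752506b20f15c") '
    '(SlotNo 1598400) '
    '(Nonce "74a0665cf3990f72fea801a670dfddb9efa4815d13bc0779bab26f15e42fcf46") '
    '(CertifiedVRF'
)
print(parse_bad_nonce(line).slot)  # 1598400
```

Run over the mainnet and testnet logs in this thread, it shows that the first nonce is identical in both, while the slot and the second nonce differ per network.
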

erikd commented 3 years ago

> I'm using the same build for my test and mainnet node but only the test node is not syncing; hence I thought it was maybe something with the config.

So this is a test node that runs on mainnet or testnet? If it runs on mainnet I would be very interested in a diff between the configs of the working and non-working nodes.

Can I assume that you are running the same git checkout versions?

jnardiello commented 3 years ago

Unfortunately I was in the same situation. Node sync stopped at 14.4%. Trying to restart the service after updating the config files returns this error:

May 23 23:17:43 blockproducer systemd[1]: Started Cardano node service.
May 23 23:17:44 blockproducer cardano-node[1384972]: Error decoding genesis at: /root/cardano-node/mainnet-shelley-genesis.json Error: Error in $: key "adaPerUTxOWord" not found
May 23 23:17:44 blockproducer systemd[1]: cardano-node.service: Main process exited, code=exited, status=1/FAILURE
May 23 23:17:44 blockproducer systemd[1]: cardano-node.service: Failed with result 'exit-code'.

Any idea? Doing a diff on the config files, it seems that adaPerUTxOWord was removed intentionally, but cardano-node is complaining. Any help would be much appreciated.

erikd commented 3 years ago

> Node sync stopped at 14.4%.

How are you measuring that?

jnardiello commented 3 years ago

> How are you measuring that?

gLiveView

After adding the missing key back to the config file, cardano-node fails with this new error:

May 23 23:41:46 blockproducer systemd[1]: Started Cardano node service.
May 23 23:41:47 blockproducer cardano-node[1426490]: Wrong Shelley genesis file: the actual hash is "67f10d861ce22c2afcdc48cb20dc1014e7b2ca97319ac945326ccf3425a5a2e7", but the expected Shelley genesis hash given in the node configuration file is "1a3be38bcbb7911969283716ad7aa550250226b76a61fc51cc9a9a35d9276d81"
May 23 23:41:47 blockproducer systemd[1]: cardano-node.service: Main process exited, code=exited, status=1/FAILURE
May 23 23:41:47 blockproducer systemd[1]: cardano-node.service: Failed with result 'exit-code'.

It seems like cardano-node is complaining the config file has changed?

Edit: Starting from scratch with the new config file works, you just can't continue syncing if the config file has changed (per my understanding, might be wrong)
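
The mismatch in this error can be checked before starting the node. The Python sketch below rests on two assumptions that are not stated in this thread: that the Shelley genesis hash is the Blake2b-256 digest of the raw genesis file bytes, and that the node config names the file in a `ShelleyGenesisFile` key relative to the config file. The function names are made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def shelley_genesis_hash(genesis_path) -> str:
    """Blake2b-256 hex digest of the raw genesis file bytes (assumed scheme)."""
    data = Path(genesis_path).read_bytes()
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def check_shelley_genesis(config_path):
    """Return (expected, actual, ok) for the Shelley genesis named in a node config.

    Assumes the config has "ShelleyGenesisFile" (path relative to the config
    file) and an optional "ShelleyGenesisHash", the parameter mentioned earlier
    in this thread.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    expected = config.get("ShelleyGenesisHash")
    actual = shelley_genesis_hash(config_path.parent / config["ShelleyGenesisFile"])
    # A missing hash is treated as "nothing to check", since the key is optional.
    return expected, actual, expected is None or expected == actual
```

Calling `check_shelley_genesis("mainnet-config.json")` on a mixed set of files should then report the mismatch without a node restart. Incidentally, the hash reported as actual in the error above (67f10d86...) is the same value that appears as the second Nonce in the mainnet VRFKeyBadNonce errors earlier in the thread, which would be consistent with those nodes having been started with the newer genesis file.
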

erikd commented 3 years ago

> It seems like cardano-node is complaining the config file has changed?

Yeah, config changed to support upcoming Alonzo features. We (IOHK/IOG) probably need to handle dissemination of these files in a better way. They should be locked to a release version instead of whatever is latest on master.

> Starting from scratch with the new config file works, you just can't continue syncing if the config file has changed

Some changes will not be a problem and others will. Resyncing is probably the best option.

jnardiello commented 3 years ago

@erikd I'm currently re-syncing with latest cardano-node and updated config files as mentioned in this issue. Let's see if this solves the issue 🤞

I totally agree that config files should be locked to release, especially if they are subject to change. Thank you a lot for your help!

disassembler commented 3 years ago

I'll get the release manager to add links in the release notes. We're a little spoiled with nix. It handles all the dependencies down to even the config file versions so we never run into the issue of mismatching config files in our deployments since we point to the tagged commit.

profd2004 commented 3 years ago

> So this is a test node that runs on mainnet or testnet? If it runs on mainnet I would be very interested in a diff between the configs of the working and non-working nodes. Can I assume that you are running the same git checkout versions?

Yes, same git version. Here is my setup: I build a Docker image in a CI/CD pipeline and push it to a test environment, where it runs on testnet with the test configs and db mounted in. Then I push the same image to a production environment, where I run it on mainnet with the mainnet config and db.

Locally I've always run and developed against the testnet, which was working until I rebooted for 1.26.1. Since then, only production/mainnet works.

erikd commented 3 years ago

We are moving fast and unfortunately we are breaking things. We need to keep moving fast, but make sure things don't get broken.

erikd commented 3 years ago

@maierfelix You say:

> Node version: 1.27.0 (git rev 69c77dc)

However, the 1.27.0 tag is not at commit 69c77dc (which is a commit on master).

If you are building a node to run on mainnet, you should never build from master. You should always build from the tag for the latest release (which is currently 1.27.0).
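
A quick way to catch this is to compare the commit a binary reports against the release commit. The Python sketch below takes the release commit from the startup log of the 1.27.0 Docker image quoted elsewhere in this thread; treating that value as the commit of the 1.27.0 tag is an assumption, and the function names are made up for illustration.

```python
import re
from typing import Optional

# Commit logged by the 1.27.0 Docker image in this thread
# (cardano.node.basicInfo.commit); assumed here to be the 1.27.0 release commit.
RELEASE_1_27_0 = "8fe46140a52810b6ca456be01d652ca08fe730bf"


def git_rev(version_text: str) -> Optional[str]:
    """Extract the commit hash that follows 'git rev' in version output."""
    m = re.search(r"git rev ([0-9a-f]{7,40})", version_text)
    return m.group(1) if m else None


def built_from(version_text: str, release_commit: str = RELEASE_1_27_0) -> bool:
    """True if the reported commit is the release commit (short hashes allowed)."""
    rev = git_rev(version_text)
    return rev is not None and release_commit.startswith(rev)


# The version string from the original report.
reported = "1.27.0 (git rev 69c77dcc13d983a06ce42ab598dc3329762e3733)"
print(built_from(reported))  # False
```

The version number alone is not enough here: both binaries report 1.27.0, and only the commit tells them apart.
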

profd2004 commented 3 years ago

> ...I would be very interested in a diff between the configs of the working and non-working nodes...

@erikd here is that diff:

config json-diff
erikd commented 3 years ago

@profd2004 One is a testnet config and the other is mainnet. They are different. They are not compatible.

profd2004 commented 3 years ago

> @profd2004 One is a testnet config and the other is mainnet. They are different. They are not compatible.

@erikd, yes, they are not compatible. The mainnet config is running on a mainnet node and it works just great.

The testnet config is running on a testnet node, where I am having issues. Both environments are running the same 1.27.0 build.

None of my testnet nodes are working.

erikd commented 3 years ago

@profd2004 Ok, testnet may be busted. Please raise a separate ticket about that.

bruceharrison1984 commented 3 years ago

I am seeing this same issue when running from the cardano-node:1.27.0 Docker image for a db-sync node.

erikd commented 3 years ago

@bruceharrison1984 is that testnet or mainnet?

bruceharrison1984 commented 3 years ago

My apologies, it is main-net but a different error. I had to wait for the node to sync again to get the error to appear. You can disregard my +1.

For the sake of completeness, this is the error I received:

[ef7402f1:cardano.node.DnsSubscription:Error:34372] [2021-05-31 03:02:07.56 UTC] Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 54.215.120.53:3001 InvalidBlock (At (Block {blockPointSlot = SlotNo 30847244, blockPointHash = 4638c1fcb92e1ec81fe95188dbc2dbdd35f37f7b6a2e214ef8304548b0c41159})) (InFutureExceedsClockSkew (RealPoint (SlotNo 30847244) 4638c1fcb92e1ec81fe95188dbc2dbdd35f37f7b6a2e214ef8304548b0c41159)

Which is totally unrelated to this topic.

rae89 commented 3 years ago

I may be having this same issue. I am trying to sync cardano-node using the Docker container. I can get up to the Allegra era, and then eventually the container exits. When I try starting the container again, node.socket is no longer found in the /ipc directory, where it was available before the crash. Here is what the output looks like when I restart the container; it just hangs here before the container exits again:

Starting: /nix/store/b4hj6i49x89762mllqlqznmsa6n12wsh-cardano-node-exe-cardano-node-1.27.0/bin/cardano-node run --config /nix/store/r6ygkc694c1vfhikdx4dhsqwkim7gds0-config-0.json --database-path /data/db --topology /nix/store/mb0zb61472xp1hgw3q9pz7m337rmfx7f-topology.yaml --host-addr 127.0.0.1 --port 3001 --socket-path /ipc/node.socket

+RTS -N2 -A16m -qg -qb --disable-delayed-os-memory-return -RTS
..or, once again, in a single line:
/nix/store/b4hj6i49x89762mllqlqznmsa6n12wsh-cardano-node-exe-cardano-node-1.27.0/bin/cardano-node run --config /nix/store/r6ygkc694c1vfhikdx4dhsqwkim7gds0-config-0.json --database-path /data/db --topology /nix/store/mb0zb61472xp1hgw3q9pz7m337rmfx7f-topology.yaml --host-addr 127.0.0.1 --port 3001 --socket-path /ipc/node.socket +RTS -N2 -A16m -qg -qb --disable-delayed-os-memory-return -RTS
Listening on http://127.0.0.1:12798
[61b4a33f:cardano.node.networkMagic:Notice:5] [2021-06-04 17:23:46.59 UTC] NetworkMagic 764824073
[61b4a33f:cardano.node.basicInfo.protocol:Notice:5] [2021-06-04 17:23:46.59 UTC] Byron; Shelley
[61b4a33f:cardano.node.basicInfo.version:Notice:5] [2021-06-04 17:23:46.59 UTC] 1.27.0
[61b4a33f:cardano.node.basicInfo.commit:Notice:5] [2021-06-04 17:23:46.59 UTC] 8fe46140a52810b6ca456be01d652ca08fe730bf
[61b4a33f:cardano.node.basicInfo.nodeStartTime:Notice:5] [2021-06-04 17:23:46.59 UTC] 2021-06-04 17:23:46.5987889 UTC
[61b4a33f:cardano.node.basicInfo.systemStartTime:Notice:5] [2021-06-04 17:23:46.59 UTC] 2017-09-23 21:44:51 UTC
[61b4a33f:cardano.node.basicInfo.slotLengthByron:Notice:5] [2021-06-04 17:23:46.59 UTC] 20s
[61b4a33f:cardano.node.basicInfo.epochLengthByron:Notice:5] [2021-06-04 17:23:46.59 UTC] 21600
[61b4a33f:cardano.node.basicInfo.slotLengthShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodShelley:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.basicInfo.slotLengthAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodAllegra:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.basicInfo.slotLengthMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 1s
[61b4a33f:cardano.node.basicInfo.epochLengthMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 432000
[61b4a33f:cardano.node.basicInfo.slotsPerKESPeriodMary:Notice:5] [2021-06-04 17:23:46.59 UTC] 129600
[61b4a33f:cardano.node.addresses:Notice:5] [2021-06-04 17:23:46.59 UTC] [SocketInfo 127.0.0.1:3001]
[61b4a33f:cardano.node.diffusion-mode:Notice:5] [2021-06-04 17:23:46.59 UTC] InitiatorAndResponderDiffusionMode
[61b4a33f:cardano.node.dns-producers:Notice:5] [2021-06-04 17:23:46.59 UTC] [DnsSubscriptionTarget {dstDomain = "relays-new.cardano-mainnet.iohk.io", dstPort = 3001, dstValency = 1}]
[61b4a33f:cardano.node.ip-producers:Notice:5] [2021-06-04 17:23:46.59 UTC] IPSubscriptionTarget {ispIps = [], ispValency = 0}

sloik commented 3 years ago

Is there an official solution? Is there a node version that works on mainnet? :)