Agoric / agoric-sdk

monorepo for the Agoric Javascript smart contract platform
Apache License 2.0

Bad address - cannot read snapshot for v10:zoe - KERNEL PANIC #3877

Closed dckc closed 2 years ago

dckc commented 2 years ago

reported in https://discord.com/channels/585576150827532298/819073555446759444/880169455412457602 and https://discord.com/channels/585576150827532298/819073555446759444/880176218673131570

Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:16.617Z launch-chain: Launching SwingSet kernel
Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614214]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-qSM9Ip.xss: Bad address
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:40.070Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: portHandler threw (ExitCode#1)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: ExitCode#1: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.emit (events.js:400:28)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 2.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Stopped Agoric Cosmos daemon.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Started Agoric Cosmos daemon.
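
The signature of this failure in the log above is the `cannot read snapshot ... Bad address` line. For anyone triaging their own journal output, here is a minimal sketch that pulls the snapshot hash and the temporary suffix out of such a line. The regular expression and the name `parse_snapshot_error` are illustrative assumptions inferred only from the log lines in this thread; they are not part of any Agoric tool.

```python
import re
from typing import Optional

# Pattern inferred from the log lines in this thread (an assumption):
#   .../xs-snapshots/<64 hex chars>-load-<suffix>.xss: <reason>
SNAPSHOT_ERROR = re.compile(
    r"cannot read snapshot "
    r"(?P<path>\S*/(?P<hash>[0-9a-f]{64})-load-(?P<suffix>\w+)\.xss): "
    r"(?P<reason>.+)$"
)

def parse_snapshot_error(line: str) -> Optional[dict]:
    """Return path, hash, suffix, and reason if the line matches, else None."""
    match = SNAPSHOT_ERROR.search(line)
    return match.groupdict() if match else None

# Sample line copied from the report above.
sample = (
    "Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614214]: "
    "cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/"
    "xs-snapshots/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c"
    "-load-qSM9Ip.xss: Bad address"
)
info = parse_snapshot_error(sample)
print(info["hash"], info["suffix"], info["reason"])
```

Lines that do not carry the signature return `None`, so the function can be mapped over a whole log to collect the affected snapshots.
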
dckc commented 2 years ago

I'm working on getting the relevant contents of xs-snapshots submitted for forensic analysis.

EmreNOP commented 2 years ago

Aug 26 06:06:48 agoric ag-chain-cosmos[16025]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/3c12d2556f426be51c154bd623dc8d4aeb5c56d9e80f254a632ff55f14bf7c26-load-qVodTc.xss: Bad address

I have the same issue.

kj89 commented 2 years ago

Same here:

Aug 26 08:13:58 agoric-validator ag-chain-cosmos[1097]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/0a5139af6d5c231df506b10f6c18e856caade55a2cbe4566d6a8822c22b00381-load-JDg594.xss: Bad address

humantraffic commented 2 years ago

I uploaded the slog file and the snapshots folder as requested:

https://www.dropbox.com/s/ycz0cbwe2f58yk0/humantraffic-agorictest17-chain.slog.gz?dl=0 https://www.dropbox.com/s/qwegikzcfo19l90/xs-snapshots.tar.gz?dl=0

alkadeta commented 2 years ago

I have the same issue after performing my latest restart task.

Aug 26 07:38:02 agoric ag-chain-cosmos[272306]: cannot read snapshot /home/***/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/0662072aa8268afa30d4805714dba5b02b52e542e15115b5bea678b4076f482d-load-g145I9.xss: Bad address

Slog file: https://drive.google.com/file/d/1XPOp4l8HzE5hKJxzipLEd-6URxzDZVOd/view
xs-snapshots folder: https://drive.google.com/file/d/1ex8HH8F4M51X2h-xeqRBaOeZxbQ3rBY4/view

MarryRSR commented 2 years ago

The same :(

Aug 26 10:46:57 Ubuntu-2004-focal-64-minimal systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 10:46:57 Ubuntu-2004-focal-64-minimal systemd[1]: Started Agoric Cosmos daemon.
Aug 26 10:47:00 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: 2021-08-26T08:47:00.399Z launch-chain: Launching SwingSet kernel
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247751]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/33faedb936ea43ccaa5fc4c84a1848f2bcdd5b953225a7274ea78027125a3259-load-XPsN2L.xss: Bad address
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: 2021-08-26T08:47:38.832Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: portHandler threw (ExitCode#1)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: ExitCode#1: v10:zoe exited: I/O error
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at ChildProcess.emit (events.js:400:28)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
absorberch commented 2 years ago

Same here, I have got it on my RPC node

Aug 26 10:47:28 Agoric-RPCnode ag-chain-cosmos[1037588]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/2b3a7bd6674097027e4094b27e52a01382161337c8ee7a968dda4b48feb4ef9f-load-8WdaNm.xss: Bad address

https://www.dropbox.com/s/7uu886yow5e1ds5/nataagoricRPC-xs-snapshots.tar.gz?dl=0 https://www.dropbox.com/s/ng35gtkcs66s643/nataagoricRPC-agorictest17-chain.gz?dl=0

Caneryy commented 2 years ago

I have same issue.

smilby commented 2 years ago

Hello, after a restart I have this error too.

Here are the full logs: https://docs.google.com/spreadsheets/d/1H2QPSKDmtv2b5wUn8EVz2CI4S8Lo2cDQq-XABAlzy8A/edit?usp=sharing

Aug 26 03:15:27 ubuntu-8gb-hel1-agoric ag-chain-cosmos[799734]: 2021-08-26T01:15:27.277Z block-manager: block 71157 begin
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Stopping Agoric Cosmos daemon...
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=98/n/a
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Started Agoric Cosmos daemon.
Aug 26 03:15:48 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 3:15AM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SD>
Aug 26 03:15:50 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 2021-08-26T01:15:50.960Z launch-chain: Launching SwingSet kernel
Aug 26 03:15:51 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Logging sent error stack (RemoteError(error:liveSlots:v14#70257)#769)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769: already have remote (a string)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: already have remote (a string)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at makeError ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at fullRevive ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at unserialize ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at notifyOnePromise ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at notify ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at dispatchToUserspace ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at runWithoutMetering ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Rejection from: (Error#770) : 2147 . 0
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Rejection from: (Error#771) : 2146 . 1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Sent as error:liveSlots:v8#70257
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#770: Event: 2146.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: Event: 2146.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at trackTurns ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at handle ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at pleaseProvision ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at pleaseProvision ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at win ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#770 ERROR_NOTE: Caused by: (Error#771)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#771: Event: 2145.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: Event: 2145.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at trackTurns ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at handle ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at deliver ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at dispatchToUserspace ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at runWithoutMetering ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400659]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/52ab799b661522074f91c8ea3d6bf2282ff8e8c0e818db5ed720b8d88ca947f3-load-9bRFzP.xss: Bad a>
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 2021-08-26T01:16:28.203Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: portHandler threw (ExitCode#1)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: ExitCode#1: v10:zoe exited: I/O error
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at ChildProcess.emit (events.js:400:28)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 1.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: Started Agoric Cosmos daemon.
Aug 26 03:16:32 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400670]: 3:16AM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SD>
alipostaci2001 commented 2 years ago

I have this error! Here is the link to my xs-snapshots files: https://disk.yandex.com.tr/d/l70acR2IuO2ENw

donperenjon87 commented 2 years ago

I have the Bad address error. On the advice of an admin on Discord, I am posting the files from my xs-snapshots folder here: https://drive.google.com/drive/folders/1XDXGJ_8iMqi8kq6MX_lhK0D7fRtHkE3c?usp=sharing

jjangg96 commented 2 years ago
1|ag-chain-cosmos  | 12:08PM INF starting ABCI with Tendermint
1|ag-chain-cosmos  | 12:08PM INF Starting multiAppConn service impl=multiAppConn module=proxy
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=query impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=mempool impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=consensus impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting EventBus service impl=EventBus module=events
1|ag-chain-cosmos  | 12:08PM INF Starting PubSub service impl=PubSub module=pubsub
1|ag-chain-cosmos  | 12:08PM INF Starting IndexerService service impl=IndexerService module=txindex
1|ag-chain-cosmos  | 12:08PM INF ABCI Handshake App Info hash="\x13\x04��S�N�Tt�\n\x1f+/�*\x10��}�u���\t�\x06���" height=74589 module=consensus protocol-version=0 software-version=0.26.15
1|ag-chain-cosmos  | 12:08PM INF ABCI Replay Blocks appHeight=74589 module=consensus stateHeight=74589 storeHeight=74590
1|ag-chain-cosmos  | 12:08PM INF Replay last block using real app module=consensus
1|ag-chain-cosmos  | 12:08PM INF minted coins from module account amount=387656ubld from=mint module=x/bank
1|ag-chain-cosmos  | 2021-08-26T12:08:29.898Z launch-chain: Launching SwingSet kernel
1|ag-chain-cosmos  | cannot read snapshot /home/ubuntu/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/1a5809881eb945b953d1b8a3325b2a4032302143205e368e0505877bcd9eca9b-load-QOByIT.xss: Bad address
1|ag-chain-cosmos  | 2021-08-26T12:09:10.893Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
1|ag-chain-cosmos  | portHandler threw (ExitCode#1)
1|ag-chain-cosmos  | ExitCode#1: v10:zoe exited: I/O error
1|ag-chain-cosmos  |   at new ErrorCode (packages/xsnap/api.js:49:5)
1|ag-chain-cosmos  |   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
1|ag-chain-cosmos  |   at ChildProcess.emit (events.js:400:28)
1|ag-chain-cosmos  | Cannot initialize Controller ExitCode: v10:zoe exited: I/O error

https://drive.google.com/file/d/1PA9iun7nPk11EuaMsRIiAHXfYEnHBhpa/view?usp=sharing

Syd-ai commented 2 years ago

Hello,

I also had the same issue commented here.

1) My node was doing fine, but I did a restart and then the issue happened. It was stuck in the restart loop mentioned above, with the Kernel Panic error.
2) Resetting and re-syncing from scratch took a long time but fixed the issue.
3) Then I restarted again to do my last restart task and the same issue happened again.

You can find here my xs-snapshots files : https://www.dropbox.com/s/qyz5clq5osc2wa2/xs-snapshots.zip?dl=0

bakarapara commented 2 years ago

Got the same error. Node failed to restart.

full log - https://pastebin.com/7Y3J6tnN

krisboit commented 2 years ago

Hello, same issue here: after a restart I got the Panic error message.

my xs-snapshots files are here: https://drive.google.com/file/d/1y76aeF4C_29wMkCmCkhqNkYgilA0YKPz/view?usp=sharing

dckc commented 2 years ago

For each task where you are struggling due to an issue beyond your control (such as this one), go ahead and fill out the task in the knack portal before the deadline, and include the URL of this issue https://github.com/Agoric/testnet-notes/issues/33 to explain why you're having trouble.

If you later accomplish the task, just submit again.

dckc commented 2 years ago

Thank you, @humantraffic , @krisboit, @alipostaci2001 ; I managed to download your xs-snapshots directories.

I probably don't need any more, but thanks, everybody!

mrixl commented 2 years ago

I restarted my node and this error appeared:

ag-chain-cosmos[11381]: cannot read snapshot /home/mirxl/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/b382c9e68a0e18d73280298d1aeda03ec76346b43cbc8146e5f940a568cf8062-load-cWHBIP.xss: Bad address

dckc commented 2 years ago

Thanks, @mrixl ... but for others coming here, I don't think we need more logs that look pretty much the same. Feel free to just :+1: the issue or something, and as I say in https://github.com/Agoric/testnet-notes/issues/33#issuecomment-906471131 , cite the URL of this issue in knack portal submissions.

dckc commented 2 years ago

@warner do you want the whole ag-chain-cosmos state directory?

sshamanov commented 2 years ago

Same issue

Aug 26 15:55:26 agoric ag-chain-cosmos[217104]: cannot read snapshot /home/agoric/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/80b929bd4566ec950ee6db3dbb77bf8a6e8cf950285b4bb74928f6e92599b0a7-load-u0gfub.xss: Bad address
Aug 26 15:55:26 agoric ag-chain-cosmos[217041]: 2021-08-26T12:55:26.978Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####

https://disk.yandex.ru/d/Y34U2pWR9F3IOg https://disk.yandex.ru/d/fg6799Jdt2JsOA

ColinkaMir commented 2 years ago

I have the same problem

Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]: portHandler threw (ExitCode#1)
Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]: ExitCode#1: v10:zoe exited: I/O error
Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]:   at ChildProcess.emit (events.js:400:28)
Aug 26 19:45:42 Agoric ag-chain-cosmos[1934360]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 26 19:45:42 Agoric systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 19:45:42 Agoric systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 26 19:45:46 Agoric systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 10.
Aug 26 19:45:46 Agoric systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 19:45:46 Agoric systemd[1]: Started Agoric Cosmos daemon.
Aug 26 19:45:49 Agoric ag-chain-cosmos[1934608]: 2021-08-26T17:45:49.224Z launch-chain: Launching SwingSet kernel
Aug 26 19:45:49 Agoric ag-chain-cosmos[1934608]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 26 19:46:13 Agoric ag-chain-cosmos[1934821]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/f8ca9d82a597e2a732797fd5f93a6787b86623ac002e0f850465a712c29756ee-load-je50GL.xss: Bad address
Aug 26 19:46:13 Agoric ag-chain-cosmos[1934608]: 2021-08-26T17:46:13.574Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
dckc commented 2 years ago

Would a few of you please share your whole .ag-chain-cosmos directory?

It would probably save us time in reproducing the problem.

Sorry I didn't ask for it in the first place.

@humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai
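
For those packaging a state directory to share, a minimal sketch along these lines may help. It writes the directory into a gzip-compressed tarball and leaves out `priv_validator_key.json` and `node_key.json`. Skipping those two files is an assumption based on the usual Tendermint node layout, and the function name `archive_state_dir` is hypothetical; check the archive contents yourself before uploading.

```python
import tarfile
from pathlib import Path
from typing import Optional

# Key files that normally should not be shared (assumed Tendermint layout).
SENSITIVE_NAMES = {"priv_validator_key.json", "node_key.json"}

def _skip_keys(member: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
    """Drop archive members whose file name is in SENSITIVE_NAMES."""
    return None if Path(member.name).name in SENSITIVE_NAMES else member

def archive_state_dir(state_dir: Path, out_file: Path) -> Path:
    """Write state_dir into a gzip-compressed tarball at out_file."""
    state_dir = state_dir.expanduser()
    with tarfile.open(out_file, "w:gz") as tar:
        # arcname keeps archive paths relative to the directory name.
        tar.add(state_dir, arcname=state_dir.name, filter=_skip_keys)
    return out_file

# Example call (paths are illustrative; write the archive outside state_dir):
# archive_state_dir(Path("~/.ag-chain-cosmos"), Path("/tmp/ag-chain-cosmos.tar.gz"))
```
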

asifhj commented 2 years ago

Could not complete the restart task due to the Bad address issue.

edwardmorra-btc commented 2 years ago

Hey! I have the same issue. After the restart task, my node got stuck with the Kernel panic error discussed here and in our Discord. Below you can find my logs attached; I hope that helps. Thank you for your time!

https://www.dropbox.com/s/er4lmj3jlfc8fo7/log.txt?dl=0

UPD: https://www.dropbox.com/s/5cj1sq33g5nde0p/xs-snapshots.zip?dl=0

aditya-manit commented 2 years ago

+1, Looks like a popular issue 😛 😛

humantraffic commented 2 years ago

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

yeah, np. https://drive.google.com/file/d/1n_EnE9Juhxq30MLIKpwNd3MENw6uM6CE/view?usp=sharing

kalpatech-team commented 2 years ago

xs-snapshots from validator : https://drive.google.com/file/d/1nXYdzru5Eq_dzyi5FFOhL0TPtWU5yZ-Z/view?usp=sharing

Syd-ai commented 2 years ago

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

Here you go

https://drive.google.com/file/d/1QoiLuAvlh9x5prb01KJ6Lk3ARNvJ7lRF/view?usp=sharing

Happy investigation 🙏

sshamanov commented 2 years ago

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

https://disk.yandex.ru/d/7CChawS92qeyVw

HaneTrudie commented 2 years ago

xs-snapshots from validator https://drive.google.com/file/d/1dJebr26uj4_pOvaYBIKvG5ZhXFGj21MZ/view?usp=sharing https://drive.google.com/file/d/15qPuzs3qFAk5ZfZ_V1JD6s0SQHzzja3g/view?usp=sharing

niocris commented 2 years ago

Hello, exactly the same problem, my post https://github.com/Agoric/testnet-notes/issues/38

dckc commented 2 years ago

Thanks. It looks like I have a couple full node state backups now.

jupyter@slog45nb:~$ ls -lR dx-collect/33-panic/
dx-collect/33-panic/:
total 8
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 20:00 Syd-ai
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 18:29 humantraffic

dx-collect/33-panic/Syd-ai:
total 13578524
-rw-r--r-- 1 jupyter jupyter 13904400441 Aug 27 19:58 ag-chain-cosmos-SYD.zip

dx-collect/33-panic/humantraffic:
total 12596552
-rw-r--r-- 1 jupyter jupyter 12898861668 Aug 27 18:15 ag-chain-cosmos.tar.gz

p.s. I think object storage is a better fit for .tar.gz files...

jupyter@slog45nb:~$ gsutil -m rsync -r dx-collect/ gs://slogfile-upload-5/dx-collect/

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
Copying file://dx-collect/33-panic/Syd-ai/ag-chain-cosmos-SYD.zip [Content-Type=application/zip]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

Copying file://dx-collect/33-panic/humantraffic/ag-chain-cosmos.tar.gz [Content-Type=application/x-tar]...
| [2/2 files][ 25.0 GiB/ 25.0 GiB] 100% Done  81.8 MiB/s ETA 00:00:00           
Operation completed over 2 objects/25.0 GiB. 
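
As a quick sanity check on the transfer, the two file sizes from the `ls -lR` listing can be summed and converted to GiB. This is only arithmetic on the byte counts shown above; the variable names are illustrative.

```python
# Byte counts copied from the ls -lR listing above.
sizes = {
    "ag-chain-cosmos-SYD.zip": 13904400441,
    "ag-chain-cosmos.tar.gz": 12898861668,
}

total_bytes = sum(sizes.values())
total_gib = total_bytes / 2**30  # 1 GiB = 2**30 bytes

print(f"{total_bytes} bytes = {total_gib:.1f} GiB")
```

The result rounds to 25.0 GiB, which agrees with the total that gsutil reports.
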
jennyhys commented 2 years ago

same problem happened to our node as well

jennyhys commented 2 years ago

@dckc any idea what might have gone wrong? Should I share my xs-snapshots here as well?

dckc commented 2 years ago

I can reproduce the symptoms by trying to load the snapshot into one of our tools:

connolly@jambox:~/projects/agoric/agoric-sdk/packages/xsnap$ ./moddable/build/bin/lin/release/xsnap -r ~/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss 
cannot read snapshot /home/connolly/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss: Bad address

I'm struggling to come up with a more detailed diagnosis. I have reached out to our collaborators at Moddable for help.

p.s. @warner it does not look like a case of deleting a snapshot too early. Both the compressed snapshot and the uncompressed snapshot are present in the contributed diagnostic materials.

It's a little interesting that we don't delete the uncompressed snapshot in this error case. I don't think that was by design, but it's somewhat fortunate in this case.
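
One way to check whether a leftover uncompressed snapshot is intact is to hash it and compare the result with its file name. The sketch below assumes that the 64 hex characters at the start of the file name are the SHA-256 of the uncompressed snapshot contents. That is an assumption about how the snapshot store names files, not something confirmed in this thread, and the function name is hypothetical.

```python
import hashlib
from pathlib import Path

def snapshot_matches_name(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Compare the file's SHA-256 with the 64-hex prefix of its file name.

    Assumes names of the form <sha256 hex>-load-<suffix>.xss.
    """
    expected = path.name[:64]
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large snapshots do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Under that assumption, a `False` result would point to a truncated or corrupted file on disk, while `True` would suggest the problem lies in how the snapshot is read rather than in its bytes.
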

dckc commented 2 years ago

Using the swingset-tools branch (https://github.com/Agoric/agoric-sdk/commit/7f7fb5125) I was able to replay the first few deliveries:

jupyter@slog45nb:~/agoric-sdk$ git describe --tags --always
agorictest-17-101-g7f7fb5125
jupyter@slog45nb:~/agoric-sdk$ git branch
  master
* swingset-tools

jupyter@slog45nb:~/33-panic$ wc transcript-v10.sst 
   255844  32108212 433442037 transcript-v10.sst

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js transcript-v10.sst 
argv [ 'transcript-v10.sst' ]
using transcript transcript-v10.sst
creating xsnap helper bundles
xs bundles written
xsnap helper bundles created
manager created
delivery 3: ["message","o+0",{"method":"buildZoe","args":{"body":"[{\"@qclass\":\"slot\",\"iface\":\"Alleged: vatAdminService\",\"index\":0},{\"assetKind\":\"nat\",\"displayInfo\":{\"assetKind\":\"nat\",\"decimal
delivery 4: ["notify",[["p-60",false,{"body":"{\"@qclass\":\"slot\",\"iface\":\"Alleged: timerService\",\"index\":0}","slots":["o-51"]}]]]
...
delivery 23: ["dropExports",["o+20"]]
anachrophobia strikes vat v10
delivery completed with 3 expected syscalls remaining
expected: {"0":"dropImports","1":{"0":"o-63","length":1},"length":2}
expected: {"0":"retireImports","1":{"0":"o-63","length":1},"length":2}
expected: {"0":"retireExports","1":{"0":"o+20","length":1},"length":2}
RUN ERR (Error#1)
Error#1: historical inaccuracy in replay of v10
  at Object.finishReplayDelivery (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/transcript.js:91:23)
  at Object.replayOneDelivery (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/manager-helper.js:176:23)
  at processTicksAndRejections (node:internal/process/task_queues:96:5)
  at async replay (file:///home/jupyter/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js:171:7)
  at async run (file:///home/jupyter/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js:191:3)
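
To make the error above easier to read: the replay tool re-runs each recorded delivery and expects the vat to issue the same syscalls that the transcript recorded. The following is a toy model of that check, not the SwingSet implementation; `replay_delivery` and `ReplayMismatch` are hypothetical names used only for illustration.

```python
from typing import Callable, List

class ReplayMismatch(Exception):
    """Raised when a replayed delivery diverges from the transcript."""

def replay_delivery(
    expected_syscalls: List[list],
    run_delivery: Callable[[Callable[[list], None]], None],
) -> None:
    """Re-run one delivery and check its syscalls against the recorded ones."""
    remaining = list(expected_syscalls)

    def syscall(actual: list) -> None:
        # Each syscall made during replay must equal the next recorded one.
        if not remaining or remaining[0] != actual:
            raise ReplayMismatch(f"unexpected syscall: {actual}")
        remaining.pop(0)

    run_delivery(syscall)
    if remaining:
        # Mirrors "delivery completed with N expected syscalls remaining".
        raise ReplayMismatch(
            f"delivery completed with {len(remaining)} expected syscalls remaining"
        )

# The three syscalls the transcript expected for delivery 23.
expected = [
    ["dropImports", ["o-63"]],
    ["retireImports", ["o-63"]],
    ["retireExports", ["o+20"]],
]
```

In this model, replaying delivery 23 with a vat that makes none of the three recorded syscalls raises `ReplayMismatch` with three syscalls remaining, which has the same shape as the failure in the output above.
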

earlier episode:

replay tool crashed: Cannot read property 'unmetered' of undefined

Ouch... now what? hm.

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/bin/replay-transcript.js transcript-v10.sst 
argv [ 'transcript-v10.sst' ]
replay-one-vat.js transcript.sst
using transcript transcript-v10.sst
RUN ERR (TypeError#1)
TypeError#1: Cannot read property 'unmetered' of undefined
  at build (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/liveSlots.js:416:45)
  at makeLiveSlots (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/liveSlots.js:1173:13)
  at Object.createFromBundle (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/manager-local.js:108:16)
  at replay (file:///home/jupyter/agoric-sdk/packages/SwingSet/bin/replay-transcript.js:132:31)
  at processTicksAndRejections (node:internal/process/task_queues:96:5)
  at async run (file:///home/jupyter/agoric-sdk/packages/SwingSet/bin/replay-transcript.js:170:3)

version info

jupyter@slog45nb:~/33-panic$ node --version
v16.6.1
jupyter@slog45nb:~/agoric-sdk$ git describe --tags --always
@agoric/access-token@0.4.13-27-g44cd72f8e

How the log file was extracted

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/bin/extract-transcript-from-slogfile.js humantraffic-agorictest17-chain.slog.gz v10 > ,out 2> ,err

jupyter@slog45nb:~/33-panic$ wc transcript-v10.sst 
   255844  32108212 433442037 transcript-v10.sst
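
For readers without the SDK tools at hand, a rough first look at one vat's activity can be had by filtering the slog directly. The sketch below assumes the slog file is gzip-compressed JSON lines and that per-vat entries carry a `vatID` field. Both are assumptions about the file format, and this is not a substitute for `extract-transcript-from-slogfile.js`.

```python
import gzip
import json
from typing import Iterator

def vat_entries(slog_path: str, vat_id: str) -> Iterator[dict]:
    """Yield slog entries whose vatID field equals vat_id."""
    with gzip.open(slog_path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            entry = json.loads(line)
            if entry.get("vatID") == vat_id:
                yield entry

# Example call (file name taken from the command above):
# count = sum(1 for _ in vat_entries("humantraffic-agorictest17-chain.slog.gz", "v10"))
```
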

jupyter@slog45nb:~/33-panic$ ls ~/agoric-sdk/packages/SwingSet/bin/
extract-transcript-from-kerneldb.js  extract-transcript-from-slogfile.js  rekernelize  replay-transcript.js  vat