Concordium / Testnet3-Challenges

This repo is dedicated to Concordium Incentivized Testnet3.
https://developers.concordium.com/
Apache License 2.0

Node Reset #684

Closed — ZaferGraph closed this issue 3 years ago

ZaferGraph commented 3 years ago

Hello, my B1 server stopped today, after running for 3 days. This is what I saw:

[image]

As you can see, there is no error. I checked kern.log and there is nothing there either:

(I sent you the file)
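
For reference, one quick way to check the kernel log for OOM-killer activity is a simple grep; the path below is the Ubuntu default and may differ on other distributions:

```
# Search the kernel log for out-of-memory killer messages
# (/var/log/kern.log is the Ubuntu default location)
grep -iE 'out of memory|killed process|oom' /var/log/kern.log
```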

Then I tried concordium-node-retrieve-logs, but it gave me an empty log file, as you can see:

[image]

I tried to retrieve the log two times, but got the same result:

[image]

Finally, I restarted the node to see at which block it had stopped, but this is what I saw:

[image]

The node has been completely reset.

I'm sorry I can't give you the log, because it's empty.

But I'll send kern.log to testnet@concordium.com if you want to see it.

concordium-cl commented 3 years ago

Thanks for the detailed info. I have received your log. We will look into it.

ab-concordium commented 3 years ago

Can you provide a bit more description of your system?

One thing that might be relevant is if you run docker container list and post the output of that. It might tell us what happened to the node and why the logs are not accessible.
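
For reference, these are standard Docker CLI commands; adding --all also shows containers that have exited, which is useful if the node was killed:

```
# Running containers only
docker container list

# Include stopped/exited containers as well
docker container list --all
```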

ZaferGraph commented 3 years ago

[image]

[image]

[image]

ZaferGraph commented 3 years ago

I can PM you the IP and password if you want; I'm already thinking of destroying the server anyway, so no problem.

ab-concordium commented 3 years ago

Interesting. Based on the limited amount of allocated RAM, I suspect your node was killed for trying to allocate more than 512 MB.

Is 3 days ago also when you started the node the first time? Can you run docker logs 24e96815a720?
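
For reference, roughly what those checks look like; the container ID is the one shown by docker container list, and the OOMKilled inspect is an extra, optional check rather than something requested here:

```
# Dump the node container's logs
docker logs 24e96815a720

# Only the most recent lines (200 is an arbitrary example)
docker logs --tail 200 24e96815a720

# Optional: check whether Docker recorded an out-of-memory kill for this container
docker inspect --format '{{.State.OOMKilled}}' 24e96815a720
```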

ZaferGraph commented 3 years ago

Yes, I can run it; here is the output: https://paste.ubuntu.com/p/MtQZKj4Tn3/

Yeah, there it is. The RAM was probably not enough, although when I checked RAM and disk after the kill they were both under full usage. Yes, the system is not powerful enough, but the reaction shouldn't be like this; at the very least the node should stay stopped, not be purged.
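
For reference, a minimal way to snapshot memory and disk usage after such a kill, using standard Linux tools (nothing Concordium-specific):

```
free -h   # memory and swap usage
df -h     # disk usage per mounted filesystem
```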

Also, I think my B2 was killed for the same reason; it has the same system specs. I just missed one step: this happened on both nodes when I ran the start command.

[image]

I just started a new server and node with these specs yesterday; let's see what happens.

[image]

ab-concordium commented 3 years ago

Those specs should be sufficient now.

From the logs you just attached it looks like the cause of the failure was actually a lack of disk space, which led to partial database corruption, and thus your node could not restart. I can't see precisely what happened to the container you were running, but given that Docker containers have some space overhead, it is likely that it was either partially corrupted or deleted, which caused the behaviour you see.
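
For reference, a quick way to see how much disk space Docker itself is consuming; these are standard Docker CLI commands and a general check, not a Concordium-specific one:

```
# Space used by Docker images, containers, local volumes and build cache
docker system df

# List volumes, e.g. the node's database volume
docker volume ls
```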

iekmuby commented 3 years ago

@ZaferGraph what is your Vultr zone for these nodes? Silicon Valley?

ZaferGraph commented 3 years ago

> @ZaferGraph what is your Vultr zone for these nodes? Silicon Valley?

It was NY; currently I'm in the Netherlands.

iekmuby commented 3 years ago

The Silicon Valley area is very unstable. One of my nodes was stopped and erased, so I lost all of my logs. I suppose this may have happened with NY too. By the way, Amsterdam has had one incident as well since I started using it for Concordium.