100+ Consecutive fails on some nodes

sadgb commented 5 years ago

After investigating those nodes I was unable to find any clues. CPU usage is low, memory usage is below 50%, and the configuration is the same as on the other nodes. Running make restart didn't help, and a server restart didn't help either.

The only thing I was able to find in the logs is:

chainpoint-node | 2019-01-20T12:39:21.196654386Z WARN : Calendar : Could not retrieve block range 25735 (blocks 2573500 to 2573599)
......
chainpoint-node | 2019-03-07T12:04:20.243424875Z WARN : Calendar : Could not retrieve block range 28231 (blocks 2823100 to 2823199)
chainpoint-node | 2019-03-13T06:44:41.190138103Z WARN : Calendar : Could not retrieve block range 28511 (blocks 2851100 to 2851199)
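For anyone comparing warnings like these across nodes, here is a minimal Python sketch that pulls the block ranges out of such lines. It assumes every warning has exactly the shape quoted above. The regular expression and the `failed_ranges` helper are illustrative names, not part of the Chainpoint Node software, and the "index times 100" relationship is only inferred from the three quoted lines.

```python
import re

# Matches warnings of the shape quoted in this issue; other message
# formats are NOT handled (assumption).
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2}T[\d:.]+Z) WARN : Calendar : "
    r"Could not retrieve block range (\d+) \(blocks (\d+) to (\d+)\)"
)

def failed_ranges(log_text):
    """Return (timestamp, range_index, first_block, last_block) tuples."""
    return [
        (ts, int(idx), int(first), int(last))
        for ts, idx, first, last in PATTERN.findall(log_text)
    ]

sample = """\
chainpoint-node | 2019-01-20T12:39:21.196654386Z WARN : Calendar : Could not retrieve block range 25735 (blocks 2573500 to 2573599)
chainpoint-node | 2019-03-07T12:04:20.243424875Z WARN : Calendar : Could not retrieve block range 28231 (blocks 2823100 to 2823199)
chainpoint-node | 2019-03-13T06:44:41.190138103Z WARN : Calendar : Could not retrieve block range 28511 (blocks 2851100 to 2851199)
"""

ranges = failed_ranges(sample)
for ts, idx, first, last in ranges:
    # In the quoted lines, range index N covers blocks N*100 .. N*100+99.
    consistent = first == idx * 100 and last == idx * 100 + 99
    print(ts[:10], idx, first, last, "consistent" if consistent else "unexpected")
```

Sorting the extracted range indices per node makes it easier to see whether the failures cluster around the same block ranges on every affected node.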

The problem starts in batches. For example, I have some nodes with 320-350 consecutive failures and another batch with 738-800 consecutive failures.

Please tell me how to find more info or how to fix this.

jacohend commented 5 years ago

Hi @sadgb, thanks for reaching out. Could you send us your node IP and node version? This will help us debug the issue. You can send them to jacob@tierion.com if you don't want your information public on GitHub.

sadgb commented 5 years ago

All of them are 1.5.4. I've sent you an email with the details.

michael-iglesias commented 5 years ago

Hello @sadgb,

Can you please provide us with a more complete account of what is being logged on the Nodes suffering from consecutive failures? We've noted in the log output pasted above that there is a considerable block range gap between the first and last reported failures: block range 28231 and block range 28511, respectively.

A more verbose snapshot of the log output and the steps that you've taken to try to remedy the issue will give us a bit more insight into what is going on.

Feel free to post the requested info here in this thread or email either jacob@tierion.com or miglesias@tierion.com.

Thanks, Michael I.

sadgb commented 5 years ago

So I ran

docker-compose logs > 1-1.txt

on one of my nodes, 80.211.216.147.

File attached. If you can give me instructions on how to gather more data, please share them.
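One way to condense a saved log file such as 1-1.txt before sharing it is to count warnings per day, which shows when each batch of failures began. The Python sketch below is only a rough aid under stated assumptions: it expects lines shaped like the ones quoted earlier in this thread (`service | timestamp LEVEL : Component : message`), the `ERROR` level is assumed to exist alongside `WARN`, and `summarize` and `summarize_file` are hypothetical helper names.

```python
import re
from collections import Counter

# service | 2019-01-20T12:39:21.196654386Z WARN : Calendar : message
LINE = re.compile(
    r"^\S+\s*\|\s*(\d{4}-\d{2}-\d{2})T\S+\s+(\w+)\s*:\s*(\w+)\s*:"
)

def summarize(lines, levels=("WARN", "ERROR")):
    """Count log lines per (date, level, component) for the given levels."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(2) in levels:
            counts[match.groups()] += 1
    return counts

def summarize_file(path):
    """Same summary for a file written by `docker-compose logs > file`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize(handle)

# Demo on lines quoted in this issue; for a real file, call
# summarize_file("1-1.txt") instead.
demo = [
    "chainpoint-node | 2019-01-20T12:39:21.196654386Z WARN : Calendar : Could not retrieve block range 25735 (blocks 2573500 to 2573599)",
    "chainpoint-node | 2019-03-07T12:04:20.243424875Z WARN : Calendar : Could not retrieve block range 28231 (blocks 2823100 to 2823199)",
    "chainpoint-node | 2019-03-13T06:44:41.190138103Z WARN : Calendar : Could not retrieve block range 28511 (blocks 2851100 to 2851199)",
]
summary = summarize(demo)
for (date, level, component), count in sorted(summary.items()):
    print(date, level, component, count)
```

Lines that do not match the assumed shape are silently skipped, so a summary with very few entries may simply mean the log format differs from the one assumed here.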

sadgb commented 5 years ago

Also, I would like to mention that the number of problem nodes has increased a bit.

michael-iglesias commented 5 years ago

@sadgb I think you may have forgotten to attach the file.

chainpoint / chainpoint-gateway

100+ Consecutive fails on some nodes #29