StorjOld / dataserv-client

Client for storing and auditing data.
http://storj.io
MIT License
56 stars 24 forks source link

Invalid authentication headers - Synchronize your clock #178

Closed vindicatorr closed 8 years ago

vindicatorr commented 8 years ago
dataserv-client[5456]: 2015-11-26 05:14:49,786 WARNING dataserv_client.messaging 61: Invalid authentication headers. Synchronize your clock with an internet-based time server.
dataserv-client[5456]: 2015-11-26 05:14:49,790 INFO dataserv_client.messaging 75: Query retry in 30 seconds.
dataserv-client[5456]: 2015-11-26 05:14:51,149 INFO storjnode.network.protocol 46: finding neighbors of

My server's clock is correct. I even compared it to time.gov.

littleskunk commented 8 years ago

Please connect to my testserver and contact me in slack (@ skunk). Logfile will tell us whats going on.

vindicatorr commented 8 years ago

Ugh, ok I was having other storage issues. I set --max_size 2T and at some point was getting storage-full notifications in my logs (from various processes I think) when I was at 1.7T for storj. I think there was still 500G left though, maybe it was MB. I don't remember. Anyway, I DO remember seeing the clock warning when there was still storage remaining. Then I had changed my config to --min_free_size 100G and restarted the service. When I woke up, I saw there was literally NO storage left and df still showed 1.7T for the storj folder. I would have expected storj to trim/purge, but that did not happen.

When I rebooted my server and the service ran starting at 00:44:23 with "Log opened". ntpdate adjusted time at 00:44:26 The clock warning occurred at 00:44:45

I manually stopped the service and deleted 112G of storj store files starting with the oldest. Then when I restarted the storj service, it was starting at Height 1. In that case, I stopped the service, deleted all files in the data folder and am starting fresh.

I created my Slack account and am currently running both my storj service and another instance to your url and so far no clock warnings in either my nor your process. Since I ssh to my server and don't want your process to be dropped, I'm trying to run the command within "(... 2>&1 | tee skunk.log) &". After exiting ssh, I see in my other ssh session that your process is still running. A query to your url occurred at 01:56:(17 & 24) CST.

I don't know if I should open another Issue regarding file manipulation or corruption, but you should probably take into account what happens when files in the "store" are deleted/renamed/edited... as well as keeping an eye on storage used (because of other processes) anytime a new store file is going to be created. For example, my TV can record a program and the storage drops below the min_free_size. Not only should storj not create a file, but it should trim/purge.

littleskunk commented 8 years ago

I close this bug because your problem with the invalid authentication header is solved. I can see your instance on my logfile. I will keep my server running for ~10h but i dont think the messages comes again.

Next time dont delete the old shards. You have to delete the new shards. If something goes wrong you can use the --repair argument. It will rebuild missing or corrupted shards. (Missing in the README at the moment. I will fix that later.)

The free space calculation should work if other programs writes files in the background. I think the 1.7 TB are the real problem. But that looks like a OS problem. I found this: http://askubuntu.com/questions/79981/df-h-shows-incorrect-free-space

If you have still problems with free space open a new issure please.

vindicatorr commented 8 years ago

Ah, wonderful. I'm seeing that WARNING in my log again.

Nov 27 19:30:47 <server> dataserv-client[1850]: 2015-11-27 19:30:47,593 INFO storjnode.network.protocol 110: got response from <sanitized>, adding to router
Nov 27 19:30:49 <server> dataserv-client[1850]: 2015-11-27 19:30:49,093 WARNING dataserv_client.messaging 61: Invalid authentication headers. Synchronize your clock with an internet-based time server.
Nov 27 19:30:49 <server> dataserv-client[1850]: 2015-11-27 19:30:49,107 INFO dataserv_client.messaging 75: Query retry in 30 seconds.
Nov 27 19:30:52 <server> dataserv-client[1850]: 2015-11-27 19:30:52,508 CRITICAL twisted 154: 'Did not received reply for msg id <sanitized> within 5 seconds'

And this is perhaps 6 hours after I rebooted my server. I went ahead and restarted my test client to url again, with the first query going out at 22:44:49 CST, though it looks like you have stopped that service (BadStatusLine). I'll keep it running though.

littleskunk commented 8 years ago

I can see the invalid authentication headers in the logfile. Only a few are coming. I gues your clock is 14 sec out of sync. If the connection is to slow you will get the magic message. At the moment it is no problem for you. After 30 sec dataserv-client will try again. You have 5 minutes to send one ping. I did not see that much Invalid authentication headers in the logfile.

For some reason i did not see the time differenz on the logfile. I will add that later and write you a slack message.

vindicatorr commented 8 years ago

I had just rebooted my server again (trying to work with openssl 1.1 and bitcoin compilation). grepping "your clock" in syslog gave me 119 entries starting from 04:41:47 to 03:58:10 (about 23 hours). Again, I don't see how my server's time is out of sync at all. I look at time.gov and then run "date" on the server and it isn't off 1 second. And with a ping of 27ms, speeds of 24mbps/2mbps, I don't see that affecting times. EDIT:I'm also getting a 500 server error from your qnap now, and then dataserv borks.

littleskunk commented 8 years ago

Bug reopen. I added some log outputs and set the timeout to 0 to see the timedifference every ping. We both checked the system time with time.gov. We have no time difference but the server log shows 7 sec difference. Here is the result.

2015-11-28 22:41:05,896 WARNING main 84: Invalid header date 0:00:14.896466 >= 0:00:00! 2015-11-28 22:41:42,931 WARNING main 84: Invalid header date 0:00:07.931242 >= 0:00:00! 2015-11-28 22:42:19,755 WARNING main 84: Invalid header date 0:00:06.755609 >= 0:00:00! 2015-11-28 22:42:56,715 WARNING main 84: Invalid header date 0:00:07.715423 >= 0:00:00! 2015-11-28 22:43:41,286 WARNING main 84: Invalid header date 0:00:15.286848 >= 0:00:00! 2015-11-28 22:44:18,695 WARNING main 84: Invalid header date 0:00:07.695172 >= 0:00:00! 2015-11-28 22:44:55,561 WARNING main 84: Invalid header date 0:00:07.561789 >= 0:00:00! 2015-11-28 22:45:40,586 WARNING main 84: Invalid header date 0:00:15.586437 >= 0:00:00! 2015-11-28 22:46:17,459 WARNING main 84: Invalid header date 0:00:07.459179 >= 0:00:00! 2015-11-28 22:47:02,276 WARNING main 84: Invalid header date 0:00:15.276647 >= 0:00:00! 2015-11-28 22:47:39,553 WARNING main 84: Invalid header date 0:00:07.553481 >= 0:00:00!

The log prints out 7-14 sec difference. I gues it is the cpu or network connection that needs ~7 sec.

vindicator [16:24] Sure. And if you're curious of the server hardware (processor) I use: http://www.hardkernel.com/main/products/prdt_info.php?g_code=G143452239825 I'm only using 4 of the LITTLE cores (and not the 4 big) since I built a newer kernel which can't make use of the HMP or whatever. I learned about that issue in a forum thread I opened. Work is being done to better handle varying core sizes.

I will add some more log output for the client. I would like to see how long the clients needs to generate the authentication.

littleskunk commented 8 years ago

Some time information from the client log. As you can see the cpu power is the problem. With no other tasks running on that system it needs 6-7 sec to generate the authentication header. With normal usage (running some other tasks) it sometimes needs more than 20 sec to generate the authentication header.

Nov 28 11:01:54 Query: http://status.driveshare.org/api/address generated in 0:00:00.000080 Nov 28 11:02:18 Query: http://status.driveshare.org/api/register/ generated in 0:00:24.059545 Nov 28 11:02:54 Query: http://status.driveshare.org/api/register/ generated in 0:00:06.380941 Nov 28 11:03:12 Query: http://status.driveshare.org/api/height/ generated in 0:00:13.871135 Nov 28 11:03:31 Query: http://status.driveshare.org/api/ping/ generated in 0:00:17.074207 Nov 28 11:04:39 Query: http://status.driveshare.org/api/ping/ generated in 0:00:07.522602 Nov 28 11:05:59 Query: http://status.driveshare.org/api/ping/ generated in 0:00:16.61092

@F483 and @super3 Higher timeout (30sec) or is there a way we can speed up the authentication?

vindicatorr commented 8 years ago

Not sure if this is related to timing/performance:

Nov 28 11:39:41 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:41,073 ERROR twisted 455: 'Did not received reply for msg id <sanitized a> within 5 seconds'
Nov 28 11:39:41 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:41,074 ERROR twisted 455: 'Did not received reply for msg id <sanitized b> within 5 seconds'
Nov 28 11:39:41 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:41,084 ERROR twisted 455: 'Did not received reply for msg id <sanitized c> within 5 seconds'
Nov 28 11:39:41 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:41,085 ERROR twisted 455: 'Did not received reply for msg id <sanitized d (repeat c)> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,093 ERROR twisted 455: 'Did not received reply for msg id <sanitized e> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,105 ERROR twisted 455: 'Did not received reply for msg id <sanitized f (repeat e)> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,106 ERROR twisted 455: 'Did not received reply for msg id <sanitized g> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,105 ERROR twisted 455: 'Did not received reply for msg id <sanitized h (repeat g)> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,120 ERROR twisted 455: 'Did not received reply for msg id <sanitized i> within 5 seconds'
Nov 28 11:39:46 <sanitized> dataserv-client[3876]: 2015-11-28 11:39:46,123 ERROR twisted 455: 'Did not received reply for msg id <sanitized j (repeat i)> within 5 seconds'

(and to knit-pick, "Did not received reply" should be "Did not receive reply" or "Reply not received for msg")

littleskunk commented 8 years ago

f483 found a faster signature library. It has low priority. For the moment you have to live with some invalid authentication header messages. I will ask for a higher timeout.

littleskunk commented 8 years ago

I changed the timout from 20 sec to 30 sec as workaround but you have to wait for the next dataserv (server) version.

vindicatorr commented 8 years ago

No worries. I'm in the process of updating my kernel (native build on target) and couldn't use fakeroot because I didn't have SYSV IPC selected when I originally cross-compiled the kernel. So, temporarily, I installed the 3.10 kernel which didn't have the HMP core limit and now all 8 cores are being used and each is hovering ~30% with bitcoind the primary consumer. I'm still going to upgrade to 4.4 though which I'm sure will cause the loss of those 4 big cores again.

littleskunk commented 8 years ago

Performance enhancement will come with https://github.com/F483/btctxstore/issues/12