ethereum-mining / ethminer

Ethereum miner with OpenCL, CUDA and stratum support
GNU General Public License v3.0

the pool is regularly disconnected #1539

Closed Zaqsnaider closed 5 years ago

Zaqsnaider commented 6 years ago

Hello. I'm using ethminer-0.16.0rc1 on Windows 10, and I have the same problem of regular disconnects and rejects on the ETP coin with dodopool or metaverse.farm. I run the following bat file:

@echo off
timeout /t 45
:bg
ethminer -U -P stratum://WALLET.NAME:x@nl.metaverse.farm:3002 --exit --tstop 72 --tstart 45 --noeval
timeout /t 10
goto :bg


Thanks.

AndreaLanfranchi commented 6 years ago

Actually this problem has nothing to do with ethminer. Do you see "Connection remotely closed by ..."?

This can be caused by any of the following:

  1. Poor internet connection
  2. Absence of static public IP address on your internet connection (your IP gets changed)
  3. Aggressive behavior of your router/firewall which prematurely detects idle connections
  4. Problem on pool's side which abruptly closes connections

Zaqsnaider commented 6 years ago

Thank you for your prompt reply. I have been mining for a long time. I like ethminer very much and would like to keep using it. I would not ask such a question otherwise: another miner gets only about 4 rejects per 2000 solutions, and the hashrate reported on the pool matches. Ethminer, meanwhile, gets regular disconnects and dozens of rejects: about 20 rejects per 2000 solutions. Regarding your answer:

  1. The internet connection was not lost.
  2. I have a static internet address, so that is fine too.
  3. The behavior of the other miner refutes this.
  4. I tried three pools (dodo, sand and farm) and saw the same behavior everywhere, while on ETC and ETH this did not happen. I started mining ETP only recently.

Later I would like to send a screenshot of another miner on the same machine, where everything is in order. What do you advise to do to narrow down the cause of the problem? I could make a debug log if you explain how. Thanks.

AndreaLanfranchi commented 6 years ago

The cause of the rejects has to be investigated. I see you have set the CLI argument --noeval, which prevents the CPU from re-validating solutions found by the GPU. I'd suggest removing it temporarily and running a batch of 4~6 hours. If you see several messages about a GPU giving incorrect results, then it's likely you're overclocking too much. Please note that OC values are not universal: what is fine for another miner might be too much for ethminer or vice versa (all miners implement very different ways to invoke GPU work).

Zaqsnaider commented 6 years ago

Here is a screenshot of the other miner: for a few hours everything is fine. (screenshot: phoenix)

Then I immediately launched ethminer, and the problems were visible right away. Overclocking and --noeval were removed completely. (screenshots: ethmine_1, ethmine_2, ethmine_3, ethmine_4)

Is it possible that the pool treats ethminer, or something else, as an attack and disconnects?

AndreaLanfranchi commented 6 years ago

Is it possible that the pool treats ethminer, or something else, as an attack and disconnects?

Highly unlikely. I will try to run a batch now.

AndreaLanfranchi commented 6 years ago

I've run a batch of 30 minutes and got several disconnections too. Specifically, every time a stale solution gets submitted, the remote end (the pool) drops the connection. Not to mention that the submission time varies greatly, in a range from 60 ms to 800 ms.

I'd suggest getting in touch with the pool devs to inspect the problem. Ethminer works smoothly on other pools.

Zaqsnaider commented 6 years ago

This is the first such strange case I have seen in a long time. I am very grateful for your help. I'll contact pool support.

Zaqsnaider commented 6 years ago

Another farm (4588+1578) shows record stability, with ethminer 0.16.0rc1 on Windows 10:

ethminer -G -P stratum://WALLET.Name:x@nl.metaverse.farm:3004 --exit --tstop 72 --tstart 40

(screenshot: extra)

Zaqsnaider commented 6 years ago

Sorry - 4rx588+1rx578

Zaqsnaider commented 6 years ago

I should also mention that the other farms with this problem have only Nvidia cards.

ddobreff commented 6 years ago

It happens when the pool doesn't send jobs often enough; usually small pools cause these disconnects. I switched to a smaller pool on purpose and experienced the same behaviour. What Andrea Lanfranchi mentioned about "--response-timeout" did the trick: depending on how often the pool sends jobs, you need to increase the value from 2 s up to 30 s, or even more.

[2018-09-05 11:10:10][info][miner]:  X 11:10:08 stratum  No response received in 10 seconds.
[2018-09-05 11:10:10][info][miner]:  i 11:10:08 main     Disconnected from eu-eth.hiveon.net [3.120.72.4:4444]
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 main     Suspend mining due connection change...
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 main     No more connections to try. Exiting...
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 main     Shutting down miners...
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-1   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-7   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-5   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-4   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-3   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-6   No work. Pause for 3 s.
[2018-09-05 11:10:10][info][miner]:  i 11:10:09 cuda-2   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-7   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-5   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-4   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-3   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-6   No work. Pause for 3 s.
[2018-09-05 11:10:13][info][miner]:  i 11:10:12 cuda-2   No work. Pause for 3 s.
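
To check how often a rig hits this case, the response-timeout lines can be pulled out of a saved log. The following is only a minimal sketch in Python: it assumes the log format shown above, and the helper name `find_response_timeouts` is made up for illustration, not part of ethminer.

```python
import re

# Matches the message seen in the log above, e.g.
# "... X 11:10:08 stratum  No response received in 10 seconds."
NO_RESPONSE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\s+stratum\s+No response received in (\d+) seconds"
)

def find_response_timeouts(log_text):
    """Return (time, seconds) for every response-timeout line in the log."""
    return [(m.group(1), int(m.group(2))) for m in NO_RESPONSE.finditer(log_text)]

# Two lines copied from the log above
sample = """\
[2018-09-05 11:10:10][info][miner]:  X 11:10:08 stratum  No response received in 10 seconds.
[2018-09-05 11:10:10][info][miner]:  i 11:10:08 main     Disconnected from eu-eth.hiveon.net [3.120.72.4:4444]
"""

print(find_response_timeouts(sample))
```

If such lines appear regularly, the configured --response-timeout is probably shorter than the interval at which the pool answers or sends jobs.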

Zaqsnaider commented 6 years ago

With this, the disconnects from the pool are gone:

ethminer -U -P stratum://Wallet.Name@nl.metaverse.farm:3002 --noeval --report-hashrate --response-timeout 60 --work-timeout 300

chfast commented 6 years ago

Two comments:

  1. If we drop the connection because of inactivity, maybe the log message should be something other than "Connection remotely closed".
  2. Can we increase the default value for --work-timeout?

jean-m-cyr commented 6 years ago

Why would changing --response-timeout or --work-timeout cause a "connection remotely closed" to disappear? I think that error message comes from the stack, so I don't see how client-side timeouts would be the cause!

Or does boost::asio::error::eof not necessarily mean "connection remotely closed"? It seems it does: "An error code of boost::asio::error::eof indicates that the connection was closed by the peer."

Just checked: either of the --response-timeout or --work-timeout timers would have issued a specific error log on expiring.

chfast commented 6 years ago

@jean-m-cyr The issue has been resolved by increasing some timeouts, see https://github.com/ethereum-mining/ethminer/issues/1539#issuecomment-419228315.

I'm guessing that when one of the timeouts is hit, ethminer disconnects from the pool. If I'm right, then the log message "connection remotely closed" is incorrect and users don't have a clue how to solve the problem. It would be much easier if the log said "disconnected from pool because of inactivity (response timeout)".

I also wonder whether increasing the default values has any impact on the overall performance. If not, we could increase them to make ethminer work with default values in this case.

AndreaLanfranchi commented 6 years ago

I do not agree. Actually I see in the thread a couple of misunderstandings.

  1. Connection remotely closed by ... has nothing to do with any timeout value we can set. It is caught by a very specific condition, if (ec == boost::asio::error::eof), which clearly indicates that the remote party has sent an "End of transmission".

  2. Every time we hit a timeout, we output the proper log message, like No response in ...
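
The distinction between these two cases can be illustrated outside ethminer. The sketch below uses Python's standard sockets rather than boost::asio, so it is only an analogy: a read that returns an empty result means the peer closed the connection (the EOF case), while a read that raises a timeout means the local timer expired. The helper names `classify_read` and `run_case` are hypothetical.

```python
import socket
import threading

def classify_read(sock, timeout_s):
    """Classify a single read as remote close, local timeout, or data."""
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(1024)
    except socket.timeout:
        return "no response (local timeout)"
    return "closed by peer (EOF)" if data == b"" else "data"

def run_case(server_action, timeout_s=0.5):
    """Start a throwaway local server and report what the client observes."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    release = threading.Event()

    def serve():
        conn, _ = listener.accept()
        if server_action != "close":
            release.wait()        # peer stays silent until the client is done
        conn.close()              # on "close" this happens immediately

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    client = socket.create_connection(listener.getsockname())
    result = classify_read(client, timeout_s)
    release.set()
    client.close()
    worker.join()
    listener.close()
    return result

print(run_case("close"))
print(run_case("silent"))
```

The first call reports a remote close and the second a local timeout, which is why the two situations can be logged with different messages.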

I strongly believe the problem described in this thread is strictly related to a weakness of the pool, which (I am only guessing) has suboptimal load-balancing techniques or a faulty pool implementation.

Tests on A-rated pools have never shown such a situation.

AndreaLanfranchi commented 6 years ago

The increase of the timeouts, IMHO, has had an effect only coincidentally.

AndreaLanfranchi commented 6 years ago

As @jean-m-cyr correctly pointed out, if the disconnection had been on our side, boost would have returned the "Operation Aborted" error code, which is also trapped with a different output message.

AndreaLanfranchi commented 6 years ago

I also wonder whether increasing the default values has any impact on the overall performance. If not, we could increase them to make ethminer work with default values in this case.

No, it does not affect ethminer performance in any way. The only problem is that you may stay connected longer to a non-responsive pool.

urpils commented 5 years ago

If you look at the timestamps, you will see that the disconnects happen exactly 60 seconds after the last job was sent AND no solution was found. So it is the pool's decision to disconnect "inactive" clients. The difficulty on port 3004 is too high for the miner; try port 3002. (screenshot: bildschirmfoto 2018-09-22 um 16 23 04)
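
The same check can be done on a list of timestamped events instead of by eye. This is a minimal Python sketch; the event list is hypothetical (it is not transcribed from the screenshot) and `idle_before_disconnect` is an illustrative helper, not an ethminer feature.

```python
from datetime import datetime

def idle_before_disconnect(events):
    """For each disconnect, return the seconds elapsed since the latest job."""
    gaps, last_job = [], None
    for stamp, kind in events:
        t = datetime.strptime(stamp, "%H:%M:%S")
        if kind == "job":
            last_job = t
        elif kind == "disconnect" and last_job is not None:
            gaps.append(int((t - last_job).total_seconds()))
    return gaps

# Hypothetical (time, event) pairs for illustration only
events = [
    ("16:20:00", "job"),
    ("16:20:24", "job"),
    ("16:21:24", "disconnect"),
    ("16:21:30", "job"),
    ("16:22:30", "disconnect"),
]

print(idle_before_disconnect(events))  # [60, 60]
```

A constant gap across many disconnects would point to a fixed idle limit on the pool side rather than to random network trouble.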

AndreaLanfranchi commented 5 years ago

@urpils and others.

The opening post of this thread shows a connection on port 3002, and that is the one causing problems.

That said, if this statement of yours is true

the disconnects happen exactly 60 seconds after the last job was sent AND no solution was found. So it is the pool's decision to disconnect "inactive" clients

then the pool is behaving pretty badly. As ETP has a block time of 24 seconds (avg), in 60 seconds the pool should send at least 2~3 jobs. If it doesn't, and it counts the missing jobs as idle time ... well, blame the pool maintainers, not ethminer.
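
The arithmetic behind the 2~3 jobs figure can be spelled out. The sketch below is in Python and rests on two assumptions that are not stated in this thread: that the pool sends one job per new block, and that block arrivals follow a Poisson process with the 24-second average quoted above.

```python
import math

block_time_s = 24.0   # average block time quoted in the comment above
window_s = 60.0       # idle window after which the pool reportedly disconnects

# Average number of new blocks (and so jobs, under the one-job-per-block
# assumption) within the window
expected_jobs = window_s / block_time_s

# Under the assumed Poisson model, probability that no block at all
# arrives during the window
p_no_block = math.exp(-window_s / block_time_s)

print(f"expected jobs in {window_s:.0f} s: {expected_jobs:.1f}")
print(f"chance of no new block in {window_s:.0f} s: {p_no_block:.1%}")
```

Under these assumptions the average is 2.5 jobs per 60 seconds, yet roughly 8% of 60-second windows would see no new block at all, so a pool that only sends jobs on new blocks and also enforces a 60-second idle limit would drop miners fairly regularly.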

AndreaLanfranchi commented 5 years ago

Closing