bropat / eufy-security-client

This shared library allows to control Eufy security devices by connecting to the Eufy cloud servers and local/remote stations over p2p.
MIT License
467 stars 67 forks source link

Sometimes some commands fail with ERROR_COMMAND_TIMEOUT #51

Closed fuatakgun closed 2 years ago

fuatakgun commented 2 years ago

Hi @bropat ,

Do you think, we can add a simple or exponential retrial mechanism into eufy-security-client? I have seen many warnings in docker output as below and retrial might help us to resolve this.

2021-10-12 14:09:22.993  WARN Station T8010NXXX - Result data for command not received 
{
  message: {
    sequence: 43,
    command_type: 1145,
    nested_command_type: undefined,
    channel: 1,
    data: <Buffer d1 00 00 2b 58 5a 59 48 79 04 88 00 00 00 01 00 01 00 00 00 01 00 00 00 00 00 00 00 64 30 64 61 38 35 64 35 62 61 31 38 33 66 32 37 37 63 33 64 37 65 ... 106 more bytes>,
    retries: 1,
    acknowledged: true,
    return_code: -133,
    timeout: Timeout {
      _idleTimeout: 20000,
      _idlePrev: null,
      _idleNext: null,
      _idleStart: 584022,
      _onTimeout: [Function (anonymous)],
      _timerArgs: undefined,
      _repeat: null,
      _destroyed: false,
      [Symbol(refed)]: true,
      [Symbol(kHasPrimitive)]: false,
      [Symbol(asyncId)]: 11886,
      [Symbol(triggerId)]: 112
    }
  }
}
bropat commented 2 years ago

@fuatakgun Is already present (look above):

retries: 1,      <==== this is a counter; max retries = 10

MAX_RETRIES

fuatakgun commented 2 years ago

So, it is retrying up to 10 and sending back the response whenever it can get a response, correct?

bropat commented 2 years ago

@fuatakgun Exactly.

bropat commented 2 years ago

@fuatakgun

I have now analysed the above error and the problem in this case is as follows:

  1. we send a P2P command with the protocol sequence 43.
  2. we receive an OK from the station that it has received the command with the sequence 43 (acknowledged: true).
  3. we wait for the result of the command with sequence 43. here we should receive an answer after a certain time with the return code of the command (whether it was executed without error or not). If we do not receive an answer within 20 seconds, the above error occurs.

The problem here could be that we get the answer after the above timeout or that the UDP packets get lost on the way. I can increase the timeout, but the higher the timeout, the longer we wait for an "internal" response of the command.

This timeout error also occurs from time to time with @rpitera, now that I am running tests with his environment. I am trying to understand where exactly the problem lies. I'm sure I'll have to do some network traces...

bropat commented 2 years ago

I am currently trying to replicate the problem. It does not always occur... It could be some race condition in the code...

bropat commented 2 years ago

@fuatakgun

One difference between the official app and my integration is that the app only sends one command at a time to the station, not several at once.

Example:

Perhaps the solution to the problem could be to run a send queue that is processed sequentially and only sends 1 command after the other, once the result (or the command times out) of the previous command has been received.

rpitera commented 2 years ago

Perhaps the solution to the problem could be to run a send queue that is processed sequentially and only sends 1 command after the other, once the result (or the command times out) of the previous command has been received.

This makes a lot of sense to me. And btw, welcome back - hope you enjoyed your holiday!

bropat commented 2 years ago

@rpitera

And btw, welcome back - hope you enjoyed your holiday!

I enjoyed it very much. Maybe a little short, but better than nothing ;)

With the next version (1.2.0) that I would like to release today, I should have this problem under control :)

rpitera commented 2 years ago

So have you determined the issue with having a number of cams connected to one homebase? Cause my older Arlo hub is starting to fail so I was considering replacing them with a couple of smaller Eufycams and that's a concern for me.

bropat commented 2 years ago

@rpitera

So have you determined the issue with having a number of cams connected to one homebase? Cause my older Arlo hub is starting to fail so I was considering replacing them with a couple of smaller Eufycams and that's a concern for me.

I have implemented the following which seems to solve the problem:

Perhaps the solution to the problem could be to run a send queue that is processed sequentially and only sends 1 command after the other

Please also check it yourself. If the problem occurs again, just give feedback here. If I don't hear anything after a few days, I will close this issue.

rpitera commented 2 years ago

Thanks for taking the time to explain it; also thanks so much for the support and fix!

I guess I have to wait for @fuatakgun to implement the new code downstream, but as soon as he does I will test it and report back to you one way or the other. Again, my thanks.

fuatakgun commented 2 years ago

i am not seeing this anymore, but sometimes p2p streaming just stuck and i am not able to start it without restarting docker instance. i call start p2p livestream and ws throws an error saying that 5 seconds left and nothing arrived. Not sure if we should investigate this separately or together with timeouts

bropat commented 2 years ago

i am not seeing this anymore, but sometimes p2p streaming just stuck and i am not able to start it without restarting docker instance. i call start p2p livestream and ws throws an error saying that 5 seconds left and nothing arrived. Not sure if we should investigate this separately or together with timeouts

@fuatakgun

This is a timeout I introduced if for some reason no more livestream data arrives to stop the stream internally. Without this timeout it can happen in some cases that the library still thinks that a stream is running even if it is not.

In short, this timeout has nothing to do with the other P2P timeout error ;)

fuatakgun commented 2 years ago

it is clear @bropat , thanks. i will create a separate issue to discuss stuck p2p case.