SmartThingsCommunity / SmartThingsEdgeDrivers

Apache License 2.0
268 stars 452 forks source link

Aqara FP2: Retry to get token, better logging #1699

Closed ldeora closed 1 week ago

ldeora commented 1 week ago

This is how the log looks like when the device isn't ready yet to answer REST API requests:

2024-09-27T22:47:45.440063966Z ERROR Aqara Presence Sensor  get_credential : ip = 192.168.1.121, failed to get token, error = wantread
2024-09-27T22:47:45.445295882Z WARN Aqara Presence Sensor  credential is nil
2024-09-27T22:47:45.450356841Z ERROR Aqara Presence Sensor  failed to get credential. dni= 54:EF:44:5C:4C:98, ip= 192.168.1.121

We don't know how the firmware works but maybe it just needs a bit more time? Here's how the driver gets the token:

https://github.com/SmartThingsCommunity/SmartThingsEdgeDrivers/blob/ea0625ddf6adceb26799dfc5cf05aa2ad4b9a5ec/drivers/Aqara/aqara-presence-sensor/src/fp2/api.lua#L70-L79

Can we change this to retry a few times? Maybe like this?

local socket = require "cosock.socket"  -- Required to use socket.sleep()

function fp2_api.get_credential(device_ip, socket_builder)
  local retry = retry_fn(5)
  local response, error, status

  repeat
    -- Attempt to get the credential (auth code)
    response, error, status = process_rest_response(
      RestClient.one_shot_get(get_base_url(device_ip) .. "/authcode", ADDITIONAL_HEADERS, socket_builder)
    )

    -- If the request was successful, return the token
    if not error and status == 200 then
      return response  -- Token returned on successful request
    end

    -- Log the failed attempt
    log.warn(string.format("get_credential : ip = %s, failed to get token, error = %s, retrying...", device_ip, error))

    -- Sleep before retrying, if more retries are available
    if retry() then
      socket.sleep(1)  -- Delay between retries
    else
      -- Final failure: all retries exhausted
      log.error(string.format("get_credential : ip = %s, failed to get token after retries, error = %s", device_ip, error))
      break
    end
  until not retry()

  return nil  -- Return nil if all retries failed
end

With this change the driver should be more resilient against timing issues while retrieving the auth token from the device.

What do you think, @seojune79 ?

ldeora commented 1 week ago

We could also do something like this:

local function retry_get_request(url, headers, socket_builder, max_retries, delay)
  local retries = 0
  local response, error, status
  repeat
    response, error, status = RestClient.one_shot_get(url, headers, socket_builder)
    if error == "wantread" then
      log.warn("Retrying due to 'wantread' error")
      socket.sleep(delay)
      retries = retries + 1
    elseif error then
      log.error(string.format("Error: %s", error))
      break
    end
  until not error or retries >= max_retries
  return response, error, status
end

function fp2_api.get_credential(device_ip, socket_builder)
  local url = get_base_url(device_ip) .. "/authcode"
  local response, error, status = retry_get_request(url, ADDITIONAL_HEADERS, socket_builder, 5, 2)  -- Retry 5 times with a 2-second delay
  if not error and status == 200 then
    return response
  else
    log.error(string.format("get_credential: ip = %s, failed to get token, error = %s", device_ip, error))
  end
end

This way, we could use the new function retry_get_request() in all cases where RestClient.one_shot_get() is used. The name one_shot_get implies that it's really only one try.

ldeora commented 1 week ago

The retry logic is in the lunchbox...