esphome / issues

Issue Tracker for ESPHome
https://esphome.io/
290 stars 36 forks source link

PN532 Library Does not handle safe reboot correctly (i2c) #3823

Closed 0x90ChuckTessela closed 1 year ago

0x90ChuckTessela commented 1 year ago

The problem

When running a PN532 over I2C, the device functions after power loss/restoration, and every other soft reboot.

When it is provided power, and every second soft reboot, it provides the following during setup:

[C][pn532:018]: Setting up PN532...
[VV][i2c.arduino:140]: 0x24 TX 0000FF02FED4022A00
[V][pn532:246]: Reading ACK...
[VV][i2c.arduino:116]: 0x24 RX 01
[VV][i2c.arduino:116]: 0x24 RX 010000FF00FF00
[V][pn532:257]: ACK valid: YES
[V][pn532_i2c:043]: Reading response
[VV][i2c.arduino:116]: 0x24 RX 01
[VV][i2c.arduino:116]: 0x24 RX 010000FF06FAD5
[V][pn532:262]: Sending NACK for retransmit
[VV][i2c.arduino:140]: 0x24 TX 0000FFFF0000
[V][pn532_i2c:049]: Reading response of length 5
[VV][i2c.arduino:116]: 0x24 RX 01
[VV][i2c.arduino:116]: 0x24 RX 010000FF06FAD50332010607E800
[D][pn532:033]: Found chip PN532
[D][pn532:034]: Firmware ver. 1.6
[VV][i2c.arduino:140]: 0x24 TX 0000FF05FBD4140114010200

However, if rebooting the host device via the "restart" platform, or if something causes the device to quickly reboot without loss of power, the card reader fails to setup. In these cases, it provides the following:

[C][pn532:018]: Setting up PN532...
[VV][i2c.arduino:140]: 0x24 TX 0000FF02FED4022A00
[V][pn532:246]: Reading ACK...
[VV][i2c.arduino:116]: 0x24 RX 01
[VV][i2c.arduino:116]: 0x24 RX 010000FF02FED5
[V][pn532:257]: ACK valid: NO
[E][pn532:022]: Error sending version command
[E][component:113]: Component pn532 was marked as failed.

The primary source of issue appears to be during the PN 532 I2C setup routine, when it requests the device's firmware. When it succeeds, it get a "valid" ack, 010000FF00FF00. When it fails, it receives an "incorrect" ack from the device: 010000FF02FED5

However, this is the way the device appears to perform after a safe reboot of the host. I believe the library may need to be modified to handle the above response and continue on, instead of only allowing the one valid ack when starting up to handle safe reboots where the I2C device has not lost power.

I have tested this on an ESP32-POE-ISO, with a bare-bones I2C install, and nothing else interfacing with the board.

Which version of ESPHome has the issue?

2022.11.1

What type of installation are you using?

Home Assistant Add-on

Which version of Home Assistant has the issue?

2022.11.4

What platform are you using?

ESP32

Board

esp32-poe-iso

Component causing the issue

pn532_i2c

Example YAML snippet

pn532_i2c:
  update_interval: 0.7s
  id: "${node_id}_nfc_rfid_unit"

Anything in the logs that might be useful for us?

No response

Additional information

No response

May be indirectly related to #https://github.com/esphome/issues/issues/3281 ?

0x90ChuckTessela commented 1 year ago

Just to clarify, adding an on-device automation to safe reboot the host when the above issue is encountered causes the device to reboot once again, and then the device is appropriately setup and working.

It really seems to be a problem with safe reboots when the I2C device does not lose power.

For reference, the following will reboot the host once and then the i2c device will work

esphome:
  on_boot:
    priority: 550
    then:
      - lambda: |-
          if(id(${node_id}_nfc_rfid_unit).is_failed()){
            ESP_LOGE("PN532_i2c", "Module did not respond as expected. Restarting.");
            delay(100);  // NOLINT
            App.safe_reboot();
          }

pn532_i2c:
  update_interval: 0.7s
  id: "${node_id}_nfc_rfid_unit"

this is obviously not a solution, but does demonstrate the issue.

gmcmicken commented 1 year ago

Thanks for the insight, I'm having this issue as well and I'm hoping to have success with your workaround.

tracestep commented 1 year ago

I am also getting this issue. I tried changing the code for the pn532 component to accept the 010000FF02FED5 ACK, but it failed later while trying to getting the version data. I then commented out the part that marks the component failed if it did not get the version data, and it worked. It seems the chip or the component is able to recover from that afterwards.

I am reluctant to make a pull request since it's kind of a kludge, but if you want to make the change (just use a local external component - https://esphome.io/components/external_components.html), you have to edit components/pn532/pn532.cpp:

void PN532::setup() {
  ...
  std::vector<uint8_t> version_data;
  if (!this->read_response(PN532_COMMAND_VERSION_DATA, version_data)) {
    ESP_LOGE(TAG, "Error getting version");
    //this->mark_failed();  // it can recover from that, no need to give up
    return;
  }

  ...

bool PN532::read_ack_() {
  ESP_LOGV(TAG, "Reading ACK...");

  std::vector<uint8_t> data;
  if (!this->read_data(data, 6)) {
    return false;
  }

  bool matches = (data[1] == 0x00 &&                     // preamble
                  data[2] == 0x00 &&                     // start of packet
                  data[3] == 0xFF && data[4] == 0x00 &&  // ACK packet code
                  data[5] == 0xFF && data[6] == 0x00);   // postamble
  bool matches_reset = (data[1] == 0x00 &&                     // preamble
                        data[2] == 0x00 &&                     // start of packet
                        data[3] == 0xFF && data[4] == 0x02 &&  // ACK packet code
                        data[5] == 0xFE && data[6] == 0xD5);   // postamble
  ESP_LOGV(TAG, "ACK valid: %s", YESNO(matches || matches_reset));
  if (matches_reset) ESP_LOGD(TAG, "Got reset ACK");
  return matches || matches_reset;
}

It might be a good idea to comment out a few other this->mark_failed() as well, I don't think a single wrong ACK or unexpected response is reason to give up and never try again, potentially locking someone out if the reader is used in some access control device.

EDIT: I turns out that if you simply comment-out the first two this->mark_failed() in the PN532::Setup() function (lines 23 and 30), there is no need to mess with the read_ack_() function. It will throw an error, but the reader will work.

tracestep commented 1 year ago

I found a more robust solution: when setting up the component, if PN532 responds to PN_COMMAND_VERSION_DATA with an invalid ack, just try one more time. It will then respond with a valid ack and be able to get the version and initialize correctly:

void PN532::setup() {
  ESP_LOGCONFIG(TAG, "Setting up PN532...");

  // Get version data
  if (!this->write_command_({PN532_COMMAND_VERSION_DATA})) {
    ESP_LOGW(TAG, "Error sending version command, trying again...");
    if (!this->write_command_({PN532_COMMAND_VERSION_DATA})) {
      ESP_LOGE(TAG, "Error sending version command");
      this->mark_failed();
      return;
    }
  }

  ...

I will make a pull request once I get some other problems with this component sorted out.

gmcmicken commented 1 year ago

I found a more robust solution: when setting up the component, if PN532 responds to PN_COMMAND_VERSION_DATA with an invalid ack, just try one more time. It will then respond with a valid ack and be able to get the version and initialize correctly:

void PN532::setup() {
  ESP_LOGCONFIG(TAG, "Setting up PN532...");

  // Get version data
  if (!this->write_command_({PN532_COMMAND_VERSION_DATA})) {
    ESP_LOGW(TAG, "Error sending version command, trying again...");
    if (!this->write_command_({PN532_COMMAND_VERSION_DATA})) {
      ESP_LOGE(TAG, "Error sending version command");
      this->mark_failed();
      return;
    }
  }

  ...

I will make a pull request once I get some other problems with this component sorted out.

Maybe wrap it in a loop and define the number of retries?

tracestep commented 1 year ago

There is no need. For this issue ("incorrect" ACK every other hot-reboot), one retry is enough, at least on my setup.