espressif / esp-protocols

Collection of ESP-IDF components related to networking protocols

How to retry PPP on failure? (IDFGH-7471) #44

Closed bobobo1618 closed 2 years ago

bobobo1618 commented 2 years ago

I'm using a SIM7600 modem and I'm largely following the pattern for setup and retry set by the AP to PPPoS example (note: the pppos_client example has no retry logic at all that I can see).

When the modem disconnects, I see:

And then nothing. No disconnect bit, no connect bit, no IP events, nothing.

If it helps, I can try to make a minimal repro example, but I wonder if some documentation could be added on how this is meant to work?

bobobo1618 commented 2 years ago

I used the AP to PPPoS example to demonstrate that reconnect doesn't work. Here's my branch of this repository with a few changes I made to make the example run on my hardware:

Here's log output from the device a few minutes later. You can see the initial connection failure and after that, silence.

david-cermak commented 2 years ago

Hi @bobobo1618

Thanks for reporting this problem. I will take a look at it and try to debug this.

My guess is that the recovery "sequence":

https://github.com/espressif/esp-protocols/blob/7346ed9765b9ceafc0cbcf5c70a80fee4fd6a21b/components/esp_modem/examples/ap_to_pppos/main/ap_to_pppos.c#L156-L159

fails at the first step, i.e. exiting data mode (for some reason, possibly because most devices simply fall back to command mode on disconnection). We never check the return value, so the next step (resuming data mode) fails its precondition (the state machine still considers us to be in that mode) and we just keep waiting for another flag.

If you'd like to test this, you can try to modify the state machine in esp_modem_dce.cpp:

 if (!device->set_mode(modem_mode::COMMAND_MODE)) { 
+     mode = modem_mode::UNDEF;
     return false; 
 }

to mark the current state as undefined, if the transition fails.
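
To illustrate why marking the state as undefined matters, here is a minimal, self-contained sketch. It is not the esp_modem implementation: `Mode`, `ModeTracker`, and the injected `transport` callback are hypothetical names chosen for this example, and the "already in that mode" precondition is modeled the way it is described above.

```cpp
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for a modem mode state machine.
// None of these names come from esp_modem; they only illustrate the idea.
enum class Mode { Undef, Command, Data };

class ModeTracker {
public:
    // `transport` performs the real transition and reports success.
    explicit ModeTracker(std::function<bool(Mode)> transport)
        : transport_(std::move(transport)) {}

    bool set_mode(Mode target) {
        if (target == Mode::Undef) {
            return false;        // not a mode a caller can request
        }
        if (mode_ == target) {
            return false;        // precondition: already in that mode
        }
        if (!transport_(target)) {
            mode_ = Mode::Undef; // device state is unknown after a failure
            return false;
        }
        mode_ = target;
        return true;
    }

    Mode mode() const { return mode_; }

private:
    std::function<bool(Mode)> transport_;
    Mode mode_ = Mode::Command;  // assumed starting mode for this sketch
};
```

With this shape, a failed switch to command mode leaves the tracker in `Mode::Undef`, so a following request for data mode is attempted instead of being rejected by the precondition.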

bobobo1618 commented 2 years ago

I managed to get this working in the end. I'm not 100% sure which of the things I did helped, but the code below, along with removing these lines, results in a working modem.

I think one important detail is that, judging from the debug output, my modem doesn't return OK to the +++ command-mode command. At the very least, dce.set_mode fails. Even with the (working) code below, I still see dce.set_mode failures for command mode.

void reconnect_modem(esp_modem::DCE &dce) {
  if (!dce.set_mode(esp_modem::modem_mode::COMMAND_MODE)) {
    ESP_LOGE(TAG, "Failed to set command mode");
    vTaskDelay(pdMS_TO_TICKS(500));
  }
  if (!dce.set_mode(esp_modem::modem_mode::DATA_MODE)) {
    ESP_LOGE(TAG, "Failed to set data mode");
    vTaskDelay(pdMS_TO_TICKS(500));
  }
}

void connect_modem(esp_modem::DCE &dce) {
  while (true) {
    const EventBits_t bits =
        xEventGroupWaitBits(event_group, (CONNECT_BIT | DISCONNECT_BIT), pdTRUE,
                            pdFALSE, pdMS_TO_TICKS(30000));
    if (bits & CONNECT_BIT) {
      ESP_LOGI(TAG, "Modem connected");
      break;
    }
    if (bits & DISCONNECT_BIT) {
      ESP_LOGW(TAG, "Modem disconnected, attempting to reconnect");
      reconnect_modem(dce);
      continue;
    }
    ESP_LOGI(TAG, "Waited 30s for IP address, attempting reconnect...");
    reconnect_modem(dce);
    continue;
  }
}

void run_modem(void *pvParameters) {
  setup_modem_initial();

  auto dce_bundle = BuildDce();
  auto &dce = *dce_bundle.dce;
  wait_for_modem_signal(dce);

  bool pin_ok;
  const esp_modem::command_result pin_result = dce.read_pin(pin_ok);
  if (pin_result != esp_modem::command_result::OK) {
    ESP_LOGE(TAG, "Could not read PIN status");
  } else if (pin_ok) {
    ESP_LOGI(TAG, "PIN is okay");
  } else {
    ESP_LOGI(TAG, "Need PIN");
  }

  ESP_ERROR_CHECK(esp_event_handler_register(IP_EVENT, ESP_EVENT_ANY_ID,
                                             &on_ip_event, NULL));
  ESP_ERROR_CHECK(esp_event_handler_register(NETIF_PPP_STATUS, ESP_EVENT_ANY_ID,
                                             &on_ppp_changed, NULL));

  while (!dce.set_mode(esp_modem::modem_mode::DATA_MODE)) {
    ESP_LOGE(TAG, "Failed to set data mode");
    vTaskDelay(pdMS_TO_TICKS(500));
  }

  ESP_LOGI(TAG, "Waiting for IP address...");
  connect_modem(dce);

  post_connected_setup();

  while (true) {
    const EventBits_t bits = xEventGroupWaitBits(event_group, DISCONNECT_BIT,
                                                 pdTRUE, pdTRUE, portMAX_DELAY);
    if (!(bits & DISCONNECT_BIT)) {
      continue;
    }
    ESP_LOGW(TAG, "Modem disconnected, attempting to reconnect");
    connect_modem(dce);
  }
}
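
The `while (!dce.set_mode(...))` loop above retries forever. If a bounded retry is preferred, a helper along the following lines could be used. This is only a sketch: `retry_bounded` is a hypothetical name, not part of esp_modem or ESP-IDF, and the pause is injected as a callback so the helper can also be exercised off-target.

```cpp
#include <functional>

// Hypothetical helper, not part of esp_modem: calls `attempt` until it
// succeeds or `max_attempts` tries have been made, invoking `pause`
// between tries. Returns the 1-based number of the successful attempt,
// or 0 if every attempt failed.
inline int retry_bounded(const std::function<bool()> &attempt,
                         int max_attempts,
                         const std::function<void()> &pause) {
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt()) {
            return i;
        }
        if (i < max_attempts) {
            pause(); // e.g. vTaskDelay(pdMS_TO_TICKS(500)) on the target
        }
    }
    return 0;
}
```

On the device, `attempt` could wrap `dce.set_mode(esp_modem::modem_mode::DATA_MODE)` and `pause` could wrap `vTaskDelay(pdMS_TO_TICKS(500))`, mirroring the loop in `run_modem`.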

david-cermak commented 2 years ago

@bobobo1618 Thanks for sharing the working code!

> my modem doesn't return OK to the +++

This is weird: I was testing this with the same device (SIM7600) and it did work. This modem doesn't reply with OK but with NO CARRIER, which is still accepted as success.

Anyway, I will test it again with that device to see if a regression has been merged recently. But clearly, the implementation of mode switching needs to be improved.
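
As a rough illustration of accepting either OK or NO CARRIER as a successful reply to the escape sequence, consider the sketch below. It is not the esp_modem parser: `EscapeReply` and `classify_escape_reply` are hypothetical names, and treating ERROR as a failure is an assumption made for this example.

```cpp
#include <string>

// Hypothetical classification of a modem reply to the +++ escape sequence.
enum class EscapeReply { Success, Failure, Pending };

// Hypothetical helper, not the esp_modem implementation.
inline EscapeReply classify_escape_reply(const std::string &reply) {
    // Some modems answer NO CARRIER instead of OK when leaving data mode;
    // both are treated as success here.
    if (reply.find("NO CARRIER") != std::string::npos) {
        return EscapeReply::Success;
    }
    if (reply.find("ERROR") != std::string::npos) {
        return EscapeReply::Failure; // assumption: ERROR means the escape failed
    }
    if (reply.find("OK") != std::string::npos) {
        return EscapeReply::Success;
    }
    return EscapeReply::Pending;     // nothing conclusive yet, keep waiting
}
```

A caller would typically keep feeding received lines to such a function until it returns something other than `Pending` or a timeout expires.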

david-cermak commented 2 years ago

Note that mode switching doesn't work reliably, because we exit the PPP connection and (at the same time) send the +++ escape command. I've swapped the sequence in https://github.com/espressif/esp-protocols/pull/52 and this seems to help (at least for the SIM7600; I still need to test with other devices).

bobobo1618 commented 2 years ago

To be clear, the result of #52 should be that this if statement no longer fails and just maybe takes a bit longer, right?

if (!dce.set_mode(esp_modem::modem_mode::COMMAND_MODE)) {
  ESP_LOGE(TAG, "Failed to set command mode");
  vTaskDelay(pdMS_TO_TICKS(500));
}

david-cermak commented 2 years ago

@bobobo1618 Sorry for the delay. I have updated the recovery sequence in https://github.com/espressif/esp-protocols/pull/70