Pyrrha-Platform / Pyrrha-Z

Next-generation firmware built on the Zephyr RTOS.
Apache License 2.0

Implement data collection, encoding, and recording process #8

Closed bpbradley closed 2 years ago

bpbradley commented 2 years ago

To do before merge:

bpbradley commented 2 years ago

I added a json encoder for the data. Right now it uses about 150 bytes per message, up to a rough maximum of 256 bytes.

With 128 Mb of memory, this leaves us with enough space for roughly 60k messages, minus some filesystem overhead that I'm not accounting for. This should obviously be much more than enough, considering we should only be holding onto unsent messages.

Something to note is that I am trying to avoid floating point math, so I am storing everything in fixed-point representation. That is reflected in the json data: each sensor reading is broken up into its integer and fractional portions rather than combined into a floating point value.

For instance, here is a sample json message

{"timestamp":0,"err":2,"gas":{"co":{"i":5,"f":815689},"no2":{"i":2,"f":280136}},"rht":{"temp":{"i":0,"f":0},"humidity":{"i":0,"f":0}}}

In this case, the CO measurement would be 5.815689. (Also note this is a testing setup; I don't have 5 ppm CO in my office :eyes:)

This is also adding an err value. In this case, it is reporting an error on the rht (I don't have it connected), so the host would need to parse this and realize that the rht data is to be thrown out.

Next is to queue this data for sending / storing.

bpbradley commented 2 years ago

I have added a storage handler that implements a non-volatile FIFO buffer for incoming and outgoing data.

This is a nontrivial problem, and I think this implementation should be re-evaluated as it carries a number of drawbacks.

The implementation (on the application layer) is as follows

  1. When new sensor data is available, it first checks for stored records.
    • If records are found, it will walk through all of them and attempt to send each record one at a time.
    • If there are any errors sending any records, this will be marked as an error and all records will need to be resent later.
    • Only once all records are sent successfully will the FCB be rotated, deleting the old records.

    if (stored_record_is_available()) {
        ret = record_walk(send_record_handler);
        if (ret >= 0) {
            ret = rotate_record_buffer();
        }
    }

In this case, record_walk loops through all records and applies send_record_handler to each of them, which looks like this:


int send_record_handler(struct pyrrha_data * record){
    char buffer[CONFIG_PYRRHA_MESSAGE_BUFF_SIZE] = {0};
    int ret = sensor_data_encode(record, buffer, sizeof(buffer));
    if (ret == 0){
        ret = notify_client(buffer, strlen(buffer));
    }
    return ret;
}
  2. New data is attempted to be sent.
  3. If new data can't be sent, it will be appended to the flash circular buffer.
    
    /* Attempt to send the latest data */
    ret = send_record_handler(&msg);

    if (ret != 0) {
        /* We have encountered an error trying to send data.
         * This message should be stored for later
         */
        ret = store_new_record(&msg);
        if (ret == -ENOMEM) {
            /* Not enough memory to store this record.
             * Rotate the buffer and try again
             */
            rotate_record_buffer();
            store_new_record(&msg);
        }
    }


So while this is functional and space efficient (I am storing sensor data as binary until it needs to be sent to a host, requiring only about 30% of the storage that encoded json would), it is not very forgiving of errors from a host which is not reliably accepting notifications. This may be a non-issue in practice, but it will need to be reviewed.

Further, it does cause some conflict on the host end when parsing the data. Since the host may be unaware that a notification of old data failed, it wouldn't realize that the next time it connects it will be receiving duplicate records that weren't rotated out. This means we may need to add an additional field with a unique id for each sample, or we need to be extremely confident in the timestamp (which I don't think we can be without an onboard RTC).

Unfortunately, there is no good way to delete individual entries after each is sent without significant wear on the flash. The only way to erase an arbitrary (n) number of entries is to copy the entire sector to a new sector, with the first n entries removed. This will result in a large strain on the flash memory and force it to endure far too many erase / program cycles.

It is possible there are some better ways to handle this, but they will be a bit hacky. For now we can go with this, but we should review the requirements for host / client communication to try and find a better way to manage this.

One alternative solution would be to drop the requirement that all new data is sent through the same notification, and instead allow the host to directly access logs through a different characteristic / profile. This way, the data can always just be appended to a file, and the host can simply download that file when convenient, and then delete it. We would still need to rotate files as they get too large, but we can simply follow a format like `/log.0001` `/log.0002` etc until the host consumes all files.
bpbradley commented 2 years ago

This is ready for merge, however we will likely want to revisit this soon to improve the issues I brought up earlier.

krook commented 2 years ago

Excellent, thank you @bpbradley for the detailed thought process and documentation.

krook commented 2 years ago

Is this ready to flip the repo public with this merge?

bpbradley commented 2 years ago

> Is this ready to flip the repo public with this merge?

Yes, it is ready.