avaldebe / PyPMS

Data acquisition and logging for Air Quality Sensors with UART interface
https://avaldebe.github.io/PyPMS/
MIT License

List all possible reader attributes in the README #13

Closed stantond closed 4 years ago

stantond commented 4 years ago

It'd be helpful to have a list of all possible attributes a reading could have, so that I can make sure I have a database column for each one and can fully support all sensors in this package, particularly now that the scope includes a variety of sensor types. I couldn't find this in a single file anywhere, so it might fit best in the README.

avaldebe commented 4 years ago

Yes, like #14 this is something to put in the docs.

To answer your question, obs objects are dataclass instances, which you can inspect with dataclasses.fields(obs). This returns a tuple of dataclasses.Field objects, one for each data field in obs. The dataclasses.Field objects contain the field's name and type, among other info such as metadata.

I think what you need is something like

from dataclasses import fields
from typing import Dict
from pms.sensor import base

def types(obs: base.ObsData) -> Dict[str, str]:
    """return a dictionary containing the type of each data field"""""
    return {field.name: field.type for field in fields(obs)}

If this is the case, I'll add it as a method on the observations base class (base.ObsData).
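
For example, with types defined as above, a quick usage sketch (assuming a connected MCU680 sensor; not tested):

from pms.sensor import SensorReader

# print the field name and type of one live MCU680 observation
with SensorReader("MCU680", samples=1) as reader:
    for obs in reader():
        print(types(obs))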

The influxdb and mqtt commands use the output from dataclasses.fields(obs) to only publish data fields with metadata:

https://github.com/avaldebe/PyPMS/blob/0f2a4fc591ac6f1b98fcdec387e84ae628917479/src/pms/service/influxdb.py#L73-L76

https://github.com/avaldebe/PyPMS/blob/0f2a4fc591ac6f1b98fcdec387e84ae628917479/src/pms/service/mqtt.py#L139-L149

The Python docs on this subject are a bit dry, so maybe have a look at this tutorial for an example usage.

avaldebe commented 4 years ago

Another option would be to add an info command, so pms -m MCU680 info would return some basic info about the sensor: a list of data fields and supported formats.

stantond commented 4 years ago

The problems I'd like solved are:

  1. As a developer using this package, I want to know all possible attributes of obs upfront, so that I can make sure my database that stores all observations can accommodate all possible observations.
  2. As a developer using this package, I want to use obs consistently across all possible sensors, so that I don't need to be concerned with mapping each individual sensor to specific database columns (i.e. if an attribute doesn't exist for a given obs, I can easily turn this into a NULL in my database rather than an error I need to handle).

I think story 1 can be solved easily with documentation in the README, and story 2 could be solved by the option to get all obs attributes as a dictionary, so obsDict.get(attribute) would safely return None.

The solutions/snippets above look like they'd be helpful in general, but as they deal with individual sensor models, they won't solve these problems.

avaldebe commented 4 years ago
  1. As a developer using this package, I want to know all possible attributes of obs upfront, so that I can make sure my database that stores all observations can accommodate all possible observations.

Do you want this for all supported sensors as one long table, like here? Or just for one sensor at a time, like here?

... and story 2 could be solved by the option to get all obs attributes as a dictionary, so obsDict.get(attribute) would safely return None.

This is already implemented as part of the standard library, in two different ways.
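
Presumably something like this (a sketch with a dummy dataclass standing in for an ObsData instance; the field names are purely illustrative):

from dataclasses import asdict, dataclass

# dummy observation standing in for any ObsData instance
@dataclass
class Obs:
    time: int
    pm25: float

obs = Obs(time=1601219850, pm25=6.0)
asdict(obs).get("pm25")  # 6.0
asdict(obs).get("rhum")  # None instead of a KeyError, nothing to handle
vars(obs).get("pm25")    # same idea, but shallow (no recursion into nested dataclasses)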

The solutions/snippets above look like they'd be helpful in general, but as they deal with individual sensor models, they won't solve these problems.

I'm very thankful for your insights. For me this is a CLI tool, so it is very hard for me to imagine the usage without a clear application in mind. Documentation aside, using it as a library will demand extra work from the downstream developer, compared to a "true" library.

I have been looking at your project, and it looks like the OctoPrint plugin framework requires you to create separate objects for the sensors and the database. Is your plugin limited to PMSx003 sensors? Or do you plan to support all PyPMS-supported sensors? If so, will it be possible to mix sensor types?

If you only plan to support PMSx003 sensors, then your readings table is fine. If you plan to support mixed sensor types, I can see why you would want to know all the possible data fields.

In this case, maybe you would find it useful to have an observation class method that returns a dictionary with all possible data fields:

# pms/sensors/base.py

@dataclass  # type: ignore
class ObsData(metaclass=ABCMeta):
    ...

    def superset(self) -> Dict[str, Union[int, float, None]]:
        d = dict.fromkeys(list_of_all_possible_fields)
        d.update(asdict(self))
        return d
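
The list_of_all_possible_fields placeholder would have to be collected from all supported sensors. A possible sketch; note that sensor.Data as the observation dataclass is an assumption about the package internals, not a confirmed attribute:

from dataclasses import fields
from pms.sensor import Sensor

# assumption: each Sensor enum member exposes its observation dataclass as .Data
list_of_all_possible_fields = sorted(
    {field.name for sensor in Sensor for field in fields(sensor.Data)}
)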

Given that sqlite3 is part of the standard library, I'm tempted to implement a command following your db schema, so I can better understand your use case. But I think that I would store the raw messages as blobs (blob = reader(raw=True)) and decode the stored messages on read (Sensor[sensor_name].decode(blob, time=timestamp)). That way I do not need to deal with the heterogeneity of data fields at the database level.

stantond commented 4 years ago

Do you want this for all supported sensors as one long table, like here?

One long table in the docs would work best. I think the table you already have could just be extended with a column per attribute, with the internal attribute name shown, similar to the way you show the friendly sensor model and the code sensor model in the first two columns, as long as you don't think this would make the table too big.

Or just for one sensor at a time, like here?

Can I suggest having links in the readme to any sub-pages of documentation to make it easy to find more detail? I hadn't spotted that page, good to know that information is available.

This is already implemented as part of the standard library, in two different ways.

Perfect, I'll probably pass obs through asdict(obs) so I can consistently use .get.

compared to a "true" library.

I haven't found anything else that does what PyPMS does - one Python package to abstract away the complexities of supporting many serial sensors.

I have been looking at your project, and it looks like the OctoPrint plugin framework requires you to create separate objects for the sensors and the database.

If a plugin needs a database, it needs to handle that. If there are any devices (other than the printer), then it needs to handle those too. That's a good thing really: I want to be as independent from OctoPrint as possible, because it reduces the risk that a plugin accidentally affects the core functionality of actually running the 3D printer. It also makes it easier to convert this project into something standalone in the future, if that makes sense for it.

Is your plugin limited to PMSx003 sensors? Or do you plan to support all PyPMS-supported sensors? If so, will it be possible to mix sensor types?

I'd like to support all PyPMS sensors. As these sensors are cheap and usually not factory calibrated, I'd also like to average readings by room, so you could potentially combine different types of sensors to get readings that are more likely to be accurate.

In this case, maybe you would find it useful to have an observation class method that returns a dictionary with all possible data fields:

# pms/sensors/base.py

@dataclass  # type: ignore
class ObsData(metaclass=ABCMeta):
    ...

    def superset(self) -> Dict[str, Union[int, float, None]]:
        d = dict.fromkeys(list_of_all_possible_fields)
        d.update(asdict(self))
        return d

That would definitely be helpful, especially when working out if anything needs to change between updates. But I'd need to be careful using it to dynamically build the database itself, to avoid issues during updates, so I'd need to manually check your documentation table too, and implement something smart using this dictionary in the future.

Given that sqlite3 is part of the standard library, I'm tempted to implement a command following your db schema, so I can better understand your use case. But I think that I would store the raw messages as blobs (blob = reader(raw=True)) and decode the stored messages on read (Sensor[sensor_name].decode(blob, time=timestamp)). That way I do not need to deal with the heterogeneity of data fields at the database level.

To be honest, I'm not sure what this bit means!

avaldebe commented 4 years ago

Given that sqlite3 is part of the standard library, I'm tempted to implement a command following your db schema, so I can better understand your use case.

I'm considering adding a new command (pms sqlite3) which would create/update/query a db with a schema similar to the one in your plugin. This extra functionality would require no additional dependencies, as sqlite3 is part of the standard library.

The usage would be something like

# associate port/sensor to location
pms -s /dev/ttyUSB0 -m PMSx003 sqlite3 pypms.db --location "Enclosure (Internal)"
pms -s /dev/ttyUSB1 -m PMSx003 sqlite3 pypms.db --location "Enclosure (External)"
pms -s /dev/ttyUSB2 -m PMSx003 sqlite3 pypms.db --location "Office"
pms -s /dev/ttyUSB3 -m MCU680  sqlite3 pypms.db --location "Office"

# sample every 15 sec in the background, 20 samples for each sensor
pms -s /dev/ttyUSB0 -m PMSx003 -i 15 -n 20 sqlite3 pypms.db &
pms -s /dev/ttyUSB1 -m PMSx003 -i 15 -n 20 sqlite3 pypms.db &
pms -s /dev/ttyUSB2 -m PMSx003 -i 15 -n 20 sqlite3 pypms.db &
pms -s /dev/ttyUSB3 -m MCU680  -i 15 -n 20 sqlite3 pypms.db &

# wait until sampling is done, about 5 min
wait

# decode all samples for PMSx003@/dev/ttyUSB2 and MCU680@/dev/ttyUSB3
pms sqlite3 pypms.db --decode "Office" --format csv
Office, PMSx003, /dev/ttyUSB2, 1601219850, 0, 6, 6, 0.0, 6.0, 6.0, 1.59, 0.53, 0.26, 0.18, 0.00, 0.00
...
Office, MCU680, /dev/ttyUSB3, 1601220150, 24.2, 27.6, 1012.53, 1, 25, 42.1, 5

But I think that I would store the raw messages as blobs (blob = reader(raw=True)) and decode the stored messages on read (Sensor[sensor_name].decode(blob, time=timestamp)). That way I do not need to deal with the heterogeneity of data fields at the database level.

Since 0.4.0 it is possible to capture raw messages and decode them at a later stage. For example:

"""capture raw messages to a csv file and decode them afterwards"""

from csv import DictReader
from pathlib import Path
from pms.sensor import Sensor, SensorReader

path = Path("raw_messages.csv")

# write 10 MCU680 messages to the csv file
with SensorReader("MCU680", samples=10) as reader, path.open("a") as csv:
   if path.stat().st_size == 0:
      csv.write("time,sensor,hex\n")
   for raw in reader(raw=True):
      csv.write(f"{raw.time},MCU680,{raw.hex}\n")

# read the MCU680 raw messages from the csv file
with path.open() as csv:
   reader = DictReader(csv)
   for row in reader():
       sensor = Sensor[row["sensor"]]
       message = bytes.fromhex(row["hex"])
       time = int(row["time"])
       obs = sensor.decode(message, time=time)
       print(f"{obs:csv}")

For the pms sqlite3 implementation, instead of creating a table with all the data fields from all supported sensors, I would take advantage of this new feature and create a simple table that records the raw messages as binary blobs. I would also roll your locations and devices into one table, so location is just shorthand for sensor@port:

CREATE TABLE IF NOT EXISTS sensor_location(
    id          INTEGER PRIMARY KEY,
    name        TEXT,
    model       TEXT,
    port        TEXT DEFAULT NULL,
    created     DATETIME,
    modified    DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS readings(
    id          INTEGER PRIMARY KEY,
    sensor_id   INTEGER,
    timestamp   DATETIME DEFAULT CURRENT_TIMESTAMP,
    message     BLOB,
    FOREIGN KEY(sensor_id) REFERENCES sensor_location(id)
);

Then, when I get a --decode command, I decode each one of the messages with the corresponding Sensor[model]. This way the db does not have to care about the data fields, and there is no need to add columns when a new sensor adds new data fields.
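
A minimal sketch of that --decode path, assuming the schema above; strftime('%s', ...) turns the stored DATETIME into the epoch seconds that decode expects:

import sqlite3
from pms.sensor import Sensor

db = sqlite3.connect("pypms.db")
rows = db.execute(
    """
    SELECT s.model, CAST(strftime('%s', r.timestamp) AS INTEGER), r.message
    FROM readings r JOIN sensor_location s ON r.sensor_id = s.id
    WHERE s.name = ?
    """,
    ("Office",),
)
# decode each stored blob with the sensor model it was recorded with
for model, time, message in rows:
    obs = Sensor[model].decode(message, time=time)
    print(f"{obs:csv}")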

stantond commented 4 years ago

I need to keep my database as generic/library-agnostic as possible, so that it can be extended with other sensor libraries in the future. I'll likely add Tasmota post-MVP to support wireless sensors, for example.

Location will make more sense when I'm further along building the plugin.