FireDrunk / node-kninflux

A fairly simple application to write your KNX Sensor data to Influx
GNU General Public License v3.0

Separating different data types #4

Open hyperbart opened 7 years ago

hyperbart commented 7 years ago

More of a thought exercise: how should we distinguish between data types in InfluxDB/Grafana? Temp vs lux vs energy vs humidity, etc.

snowdd1 commented 7 years ago

Hello @hyperbart, nice to "see" you again! Actually, I started a similar project some days ago and now I found yours. I was planning to use a MariaDB database simply because I already have one on my NAS, but found that Grafana does not yet support time series data from MySQL/MariaDB, so I started writing my own Ajax for Chart.js...

But here is my comment: I would store the DPT in the measurements table, it's not redundant, it's a key. Otherwise you cannot convert % cleanly from the byte value sent over the bus, etc.

hyperbart commented 7 years ago

Aha, likewise! Just to set the record straight: all the work and effort goes to Firedrunk, I am just the client 😄.

Back on topic: I have had the same discussion with FireDrunk, and one of his arguments was that the data stored in InfluxDB has already been processed and converted by a component before it reaches InfluxDB. I don't know the details, and I tend to take your side on this issue, but if the data is already converted it does seem a little bit redundant. I know a little about KNX, but not (yet) the whole technical story about DPTs and conversions.

snowdd1 commented 7 years ago

Thing is, you simply can't distinguish between types that have the same digital signature in the DPT; all 5.xxx types, for instance, carry 1 byte of payload. The interpretation "that's a percentage", "that's a small integer" or "that's an arc degree" is up to the client. Can you provide the GA list to KNX.js to do the interpretation? I have never used that package, as I run a knxd for hardware access.
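That ambiguity is exactly what a tag can capture: if each point carries its DPT as an InfluxDB tag, same-payload types stay distinguishable at query time. A minimal sketch of the idea in line protocol; the measurement name "knx" and the tag keys are my own invention, not from this repo:

```javascript
// Sketch: encode one KNX reading as an InfluxDB line-protocol string, with
// the group address and DPT as tags, so 1-byte types that share a payload
// signature (percentage, small int, arc degree) stay distinguishable.
// The measurement name "knx" and the tag keys are illustrative assumptions.
function toLineProtocol(ga, dpt, value, timestampNs) {
  // "1/3/1"-style GAs and "9.001"-style DPT ids contain no characters that
  // need escaping in line protocol, so plain interpolation is fine here.
  // timestampNs is passed as a string to avoid JS number-precision issues.
  return `knx,ga=${ga},dpt=${dpt} value=${value} ${timestampNs}`;
}

console.log(toLineProtocol('1/3/1', '9.001', 21.5, '1500000000000000000'));
// knx,ga=1/3/1,dpt=9.001 value=21.5 1500000000000000000
```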

hyperbart commented 7 years ago

That's the thing: the interpretation has already been done before the value gets stored in the InfluxDB.

In the environments JSON file you can specify which GA is which DPT.

Example:

https://github.com/FireDrunk/node-kninflux/blob/develop/example.environment.json
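For readers who don't want to click through: the idea is a per-GA DPT mapping. A hypothetical sketch of such a file (all field names here are invented for illustration; see the linked example.environment.json for the actual format):

```json
{
  "influx": { "host": "localhost", "database": "knx" },
  "addresses": [
    { "ga": "1/3/1", "dpt": "9.001", "name": "temperature_bedroom" },
    { "ga": "2/4/3", "dpt": "9.001", "name": "temperature_living_room" }
  ]
}
```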

snowdd1 commented 7 years ago

Ah, that's cool. And da*n, I was just working on that too. So I don't need that anymore, I guess. Maybe I'll start a fork to make that package work with knxd too; I don't want to have two multicast services on one device. Or I would need an InfluxDB on my Synology box. Mmmmh.
Btw, what happens to DPT16 and other character telegrams?

snowdd1 commented 7 years ago

Just looked at the environment file. The only thing I didn't find is an automatic converter from an ETS5 GA export to the environment file, right?

I think that part I did yesterday for my own approach, with little changes it should be able to produce a file in that format, too.

hyperbart commented 7 years ago

Haha, no; that's something you would have to do manually. But by all means go ahead 😄.

snowdd1 commented 7 years ago

I hate "manually". I always thought of getting it more automated for homebridge-knx as well, but apart from reading the GAs, the exports were no good (especially if the schema is messy). With ETS5 you can get a clean list of GAs with DPTs as XML, which didn't work well with ETS4 for me (might have been my fault, after all).
I'll upload it to GitHub in the next few days for inspection!

snowdd1 commented 7 years ago

I am still wondering how it tries to cast from DPST-9-1 to DPT9.001, because the .match(regexp) results are quite different. And

var m = 'DPST-9-1'.toUpperCase().match(/(\d+)(\.(\d+))?/)

has no m[3], which is required some lines later for the subtype ID, as the string doesn't contain the literal dot \. but a dash (two of them, in fact). That should effectively prevent all subtypes other than the default from being recognized when given in ETS format.
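A possible fix, sketched here rather than taken from the library, is to accept either separator so both the ETS `DPST-9-1` notation and `DPT9.001` yield the same main/sub pair:

```javascript
// Sketch: normalize "DPT9.001" and the ETS "DPST-9-1" notation to one
// { main, sub } pair. The original /(\d+)(\.(\d+))?/ only captures the
// subtype when the separator is a literal dot, so ETS-style ids lose it.
function parseDptId(id) {
  const m = id.toUpperCase().match(/(\d+)(?:[.\-](\d+))?/);
  if (!m) return null;
  return {
    main: parseInt(m[1], 10),
    sub: m[2] !== undefined ? parseInt(m[2], 10) : null, // null = default subtype
  };
}

console.log(parseDptId('DPST-9-1')); // { main: 9, sub: 1 }
console.log(parseDptId('DPT9.001')); // { main: 9, sub: 1 }
```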

==> @FireDrunk ?!

Are you sending READ telegrams at the interval for ALL addresses in the environment file? Wouldn't it be better to specify in the file (or somewhere else) which ones to read? Frankly, I don't dare to test; I don't have a test house available...

hyperbart commented 7 years ago

I always thought of getting it more automated for homebridge-knx as well, but apart from reading the GAs, the exports were no good (especially if the schema is messy).

I think the guys from Thinka facilitate that by allowing an upload of the ETS Proj File and then filtering out the stuff or renaming it in a Web GUI and further specify the kind of device which is behind a certain GA in the WebGUI.

Are you sending READ telegrams at the interval for ALL addresses in the environment file? Wouldn't it be better to specify in the file (or somewhere else) which ones to read? Frankly, I don't dare to test; I don't have a test house available...

We do have some problems with that: since the whole KNX traffic thing is async, it's not like TCP where you know that fragment 3 is a response to fragment 2. So, yeah, basically we are now flooding the bus in one big spike by sending out requests for all the addresses listed in the JSON file. The timeout kninflux waits might be too short, so we miss some responses (I have seen this in ETS, but if wanted I can test some things here, and I am pretty sure some guys at the Dutch tech site Tweakers.net would like to test this too).

snowdd1 commented 7 years ago

I wouldn't recommend waiting for a specific response. If you need to, send out the request, and the answer will just happen. And try not to flood the bus with too many telegrams:

I recommend

Just things I have learned in six years of tweaking my own house and trying to keep the WAF above ground zero 😄.

BTW: I have pushed the stub at https://github.com/snowdd1/knx-ets5-ga-reader

hyperbart commented 7 years ago

@snowdd1: Yeah, I know this isn't best practice, but @FireDrunk had his reasons, which I am not able to explain here, and I came up with the same arguments.

Off-topic: damn, what? 600 addresses? 😮 EDIT: just checked mine, around 300, which puts the 600 in perspective, since there are a lot of "to do's" here that will probably increase it too ;).

I think your third bullet point is the most important one: just listen on the bus when a value passes by and put it in the DB. But we have arrived at the point where @FireDrunk should jump in to give his view on things.

Aha, the magical all deciding WAF 😉

snowdd1 commented 7 years ago

Off-topic: damn, what? 600 addresses? 😮

Well, there goes lighting, roller shutters, floor heating, window contacts, glass-break alarm, gas boiler, solar thermal buffer, fireplace w/ boiler, garage doors, motion detectors, and: I love to know the details, so my thermostats send a lot of statistics (target values, current values, valve percentages, etc.), as do the boilers etc.
I even built a small one-wire-to-KNX Raspberry to get more data from the water pipes of the heating ecosystem, and I am thinking of doing something similar for garden (soil) temperature and humidity.

FireDrunk commented 7 years ago

Hi @snowdd1, nice to see someone interested!

But here is my comment: I would store the DPT in the measurements table, it's not redundant, it's a key. Otherwise you cannot convert % cleanly from the byte value sent over the bus, etc.

Not entirely true: the value is stored as the interpreted value, not the byte value, so you don't need the DPT type anymore for value interpretation. You only need it for visualization (like knowing whether a number represents a temperature or, in the case of humidity, a percentage).

Thing is, you simply can't distinguish between types that have the same digital signature in the DPT; all 5.xxx types, for instance, carry 1 byte of payload. The interpretation "that's a percentage", "that's a small integer" or "that's an arc degree" is up to the client. Can you provide the GA list to KNX.js to do the interpretation? I have never used that package, as I run a knxd for hardware access.

From my understanding (which is limited), you should be able to know what type of integer comes from a device, given the DPT type. So in my honest opinion it's NOT up to the client...

I am still wondering how it tries to cast from DPST-9-1 to DPT9.001, because the .match(regexp) results are quite different. And

I honestly don't know; I'm not familiar with the internals of the KNX DPT lib.

Flooding it will also prevent the answers from coming through! My bus has ~900 addresses; if I were to flood it (it would probably take around 15 seconds at about 60 telegrams per second), I would probably not get any answers in those first 15 seconds, and would probably lose READ telegrams as well once their repetition counter has expired, resulting in more loss of (expected) answers.

You are quite correct, but there's a catch: NodeJS and the KNX module are fully asynchronous, so reading responses from the first reads happens in parallel with sending new requests. So as long as the bus can keep up, it's not an issue.

To further explore the subject, I've been thinking about a 'load balancing' algorithm to spread out the readings over the specified interval. The basic interval of 30 seconds is almost pointless, but it's good for testing; I think something like 5-10 minutes makes more sense. The current flow, however, still creates all the read requests at the same time. It might be good to spread the creation of those requests out over the total interval time, but it's a tough nut to crack programming-wise.
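One way to crack that nut, sketched under the assumption that a read boils down to one callback per GA (`sendRead` is a hypothetical stand-in for whatever the KNX library uses to send a read telegram), is to stagger the requests evenly across the interval instead of issuing them all at once:

```javascript
// Sketch: spread READ requests for all group addresses evenly over the
// polling interval instead of firing them in one spike. `sendRead` is a
// hypothetical placeholder for the actual KNX read-telegram call; the
// timer function is injectable so the schedule is easy to test.
function scheduleSpreadReads(addresses, intervalMs, sendRead, setTimer = setTimeout) {
  const step = intervalMs / addresses.length; // gap between consecutive reads
  return addresses.map((ga, i) => setTimer(() => sendRead(ga), Math.round(i * step)));
}

// With a 5-minute interval and 900 addresses, reads go out roughly one
// every 333 ms instead of 900 at once.
```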

snowdd1 commented 7 years ago

That's the thing: the interpretation has already been done before the value gets stored in the InfluxDB. In the environments JSON file you can specify which GA is which DPT. Example: https://github.com/FireDrunk/node-kninflux/blob/develop/example.environment.json

So I've done it. You can now create a JSON file from an ETS5 XML export. The format is not exactly the one you'd need here, but the code should be easily adaptable to the KNX.js environment format.

https://github.com/snowdd1/knx-ets5-ga-reader

I've also worked on a generic DPT parser that can read ANY DPT around (it uses templates from the ETS). The results are a bit different from a pure numeric parser's, as it also transports semantic meaning, which, of course, you do not need to put in a database. If you have the original telegram (the bytes) and know the DPT, you can get any information later at runtime.

And now it is published: https://www.npmjs.com/package/knx-dpt-parser
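For the DPT 9.xxx family discussed above (9.001 being temperature), the KNX 2-byte float layout is a sign bit, a 4-bit exponent, and an 11-bit mantissa. A minimal decoder sketch based on that spec, not taken from knx-dpt-parser:

```javascript
// Sketch: decode a KNX DPT 9.xxx 2-byte float payload. Per the KNX spec the
// 16 bits are SEEEEMMM MMMMMMMM: the sign bit plus the 11 mantissa bits
// form a 12-bit two's-complement mantissa M, and value = 0.01 * M * 2^E.
function decodeDpt9(hi, lo) {
  const raw = (hi << 8) | lo;
  const exp = (raw >> 11) & 0x0f;
  let mant = raw & 0x07ff;
  if (raw & 0x8000) mant -= 2048; // sign bit set: two's complement
  return 0.01 * mant * Math.pow(2, exp);
}

console.log(decodeDpt9(0x0c, 0x1a)); // 21   (e.g. 21 °C for DPT 9.001)
console.log(decodeDpt9(0x86, 0x0c)); // -5
```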

snowdd1 commented 7 years ago

Back to the original issue: it would be sufficient to store the group addresses in the database. I do not know enough about InfluxDB yet, but it should be feasible in any database system to create a view or query that combines the data table with the metadata table.
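If the metadata travels with the points as tags rather than living in a separate table, no join or view is needed at all; an illustrative InfluxQL query (the measurement and tag names are assumptions, not from this repo):

```sql
-- All temperature-type readings of the last day, one series per group
-- address, averaged into 10-minute buckets (names are illustrative):
SELECT mean("value") FROM "knx"
WHERE "dpt" = '9.001' AND time > now() - 24h
GROUP BY time(10m), "ga"
```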

FireDrunk commented 7 years ago

Theoretically yes, but I don't know how good it is to make Influx dynamically 'add' the type to all results via relational logic; that's inefficient. It also amounts to storing data that aids the visual representation inside the data itself, which is (a common) bad practice.

It's usually better to let the application that visualizes the data represent it in a manner which is properly understood and easily readable by the end user.

Perhaps we can introduce some form of 'grouping' sensors, so that we can query all the data more easily by selecting an entire group. If the group contains only sensors with the same DPT Types, it's easier to visualize the data all at once.

If you guys [@snowdd1] and [@hyperbart] still think this is a good idea, can you please come up with a (detailed) real-world example of how you would like to visualize the data?

hyperbart commented 7 years ago

@FireDrunk: I could "live" with just having to write a query that contains all the GAs I want visualized. I am now completely with you on the point that storing data to aid visual representation is bad practice.

Furthermore: since each and every KNX project can and will be very different, there's no other way than to pick each individual GA and put it in the right graph/visualization by adding it to the WHERE clause of a query, I think...

@snowdd1 : what do you think?

@FireDrunk: however, that does not mean I wouldn't like to have a copy of the raw data: basically the ingress table plus another table with the processed data, just for troubleshooting purposes sometime. It would also be a good "logbook" of EVERYTHING that passes on the bus...

snowdd1 commented 7 years ago

I haven't had the time to look into Grafana yet. How do I pull meta information from the query to accompany the value data? I am with you that storing metadata with each record is a waste of storage.

hyperbart commented 7 years ago

You don't... You open your ETS file and go "aha, right, GA 1/3/1 is the bedroom temperature, let me add that to my query". Next! "Aha, right, GA 2/4/3 is the living room temperature, let me add that to the query".

I have a select * right now, since the only GAs I am storing in Influx are my temperature values...
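Concretely, that hand-picked approach might look like this in InfluxQL (the measurement and tag names are assumptions, and the GAs are the illustrative ones from above):

```sql
-- Hand-picked GAs, looked up manually in ETS (names are illustrative):
SELECT "value" FROM "knx"
WHERE ("ga" = '1/3/1' OR "ga" = '2/4/3') AND time > now() - 7d
```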

snowdd1 commented 7 years ago

Buh, I thought one could just pump all values in there... Probably FireDrunk is right then, and a separate table or measurement is required per type. Which, from what I've now read about InfluxDB, shouldn't be a problem nor a waste of storage, as InfluxDB is a schemaless database that does not use a fixed-column datastore.

Edit: wrote "grafana" meant "influxdb"

FireDrunk commented 7 years ago

Grafana is not a database, it's just visualization. It can use multiple datasources and has some nice logic to incorporate multiple queries into a single graph.

To be precise, my 'test' setup (which is basically @hyperbart's home :) ) has an installation of Grafana linked to InfluxDB. Influx takes care of all the storage, while Grafana queries Influx in real time.

I haven't played around with a 'type' table, but theoretically you could create a query in Grafana which fetches all the values of a specific type. This means you could create a "Temperature" graph collection and select all values (with the associated addresses) for a specific DPT value.