Best practices for aggregation? (introducing Aggregation class)

jbkoh commented 2 years ago

We defined brick:aggregate and I'd like to also define best practices for the usage for its consistent implementation across users (including myself).

I think aggregation needs to have a separate class under the point tree other than instantiating actual sensors, statuses, etc., because they are aggregation "of" those points. In this example, the aggregation point is defined as an instance of brick:Power_Sensor but it will be confusing with an actual power sensor producing raw data to users (app queries).

My proposal is to introduce brick:Aggregation as a class and then relate the instance of aggregation to related points. An example is like this:

# schema
brick:Aggregation rdfs:subClassOf brick:Point.
brick:aggregateTarget a owl:ObjectProperty.

# instances
ex:power_sensor a brick:Power_Sensor.
ex:hourly_peak_real_power_sensor a brick:Aggregation;
    brick:aggregateTarget ex:power_sensor;  # if the target is already instantiated.
    brick:aggregateTarget brick:Power_Sensor # if the target is not instantiated.

There may be some discussions that I missed related to this one, and I'd be happy to discuss this further with any other proposals.

epaulson commented 2 years ago

I'm keeping an open mind, but I'm not sure I get it just yet. In the example from docs.brickschema.org that you link to, the power meter - which is the actual sensor - is doing the aggregation internally before it sends out any data, so we need some way to describe to anyone who is looking at the output stream how the aggregation is occurring. Why is that property not enough?

jbkoh commented 2 years ago

Maybe Aggregating_Sensor is a better term. My argument is that aggregated data even though they are originated from a sensor is different from just raw observations. I have a device that produces three data streams; the current current, the maximum current in the last minute, and the minimum current in the last minute. If I model them just as brick:Current_Sensor, an app query will be confused with three different values at a given time. Aggregated values represent different meanings than just raw observations.

epaulson commented 2 years ago

OK, that's an interesting twist. In the example from docs.brickschema.org, the "raw observations" were already aggregated before they leave the meter (like, you set physical DIP switches on the meter to say what if it's spitting out maximum draw in the past 1 minute or 5 minute or 15 minute window) - so the "aggregated data" wasn't any different from raw observations.

gtfierro commented 2 years ago

Couldn't you model your maximum/minimum current in last minute using the existing aggregation properties?:

:meter a brick:Electric_Meter ;
  brick:hasPoint :cur, :max_cur, :min_cur .

:cur a brick:Current_Sensor .

:max_cur a brick:Current_Sensor .
  brick:aggregate [
    brick:aggregationFunction "max" ;
    brick:aggregationWindow "RPT1M" ;
] .

:min_cur a brick:Current_Sensor .
  brick:aggregate [
    brick:aggregationFunction "min" ;
    brick:aggregationWindow "RPT1M" ;
] .

The only thing missing from this model is any explicit demarcation that min/max_cur are aggregates of the other current sensor, but that is a reasonable logical jump to make.

jbkoh commented 2 years ago

@gtfierro Yes we can model it in that way as in the documentation and that's Erik's first argument, but my point is their types should be different. Consider a typical query like this:

select ?current where {
  ?current a brick:Current_Sensor.
}

which is translated into "please give me current sensors", which usually represents "present current" values other than aggregated values. Three things will be returned with different values in the same timestamps, and users/apps would be confused. A user may use negate queries to filter out sensors with aggregation properties, but I'd argue that that's not a good type system design because any properties can be attached to the type at any time (in the RDF/OWL practice.)

So my argument is that the type of :max_cur should not be Current_Sensor because it is not just observation of current but aggregation of current values (whether it's aggregated at a device side or a database side, distinguishing those two would be a topic for another day...)

Maybe our type system can be more generic by saying Current_Sensor is anything related to Current, but I'd argue our type system should be more constrained in usage for interoperability/usability.

gtfierro commented 2 years ago

It's one extra line to get non-aggregated sensors:

select ?current where {
  ?current a brick:Current_Sensor.
  FILTER NOT EXISTS { ?current brick:aggregate ?agg }
}

We aren't really in an open-world setting because we are treating the Brick graph as authoritative, so I don't agree with your point "properties can be attached to the type at any time" being a problem. I can see how it could be confusing if you get multiple current sensors back for the query and you don't know to look for the aggregation property. However, I don't think removing the Current Sensor annotation from the aggregated sensors is the solution --- any sensor is sampling or aggregating the underlying physical phenomena.

Maybe we differentiate some of these with a parallel Virtual Sensor class, which can be automatically inferred to be anything that has a brick:aggregate (or other appropriate properties) --- then if you don't want the Virtual Sensors you can use a FILTER NOT EXISTS {?sen a brick:Virtual_Sensor} which is more generalizable than filtering for the non-existence of certain properties.

jbkoh commented 2 years ago

any sensor is sampling or aggregating the underlying physical phenomena.

This is an interesting argument indeed but I'm not sure if all the users share the same assumption that any sensor can be a sampled observation or an aggregated one. My bet is most users would assume it's a sampled one. An example is many data is collected from "presentValue" in case of BACnet.

If we allow any points to be an aggregated one or a sampled one, we have to properly document that if users assume the sensors to be the present values, they should modify their queries. Yes, the negation is a line of code, but now it should be everywhere per instance. For example,

SELECT * WHERE {
 ?sen rdf:type brick:Zone_Air_Temperature_Sensor .
  FILTER NOT EXISTS { ?sen brick:aggregate ?agg }
 ?sp rdf:type brick:Zone_Air_Temperature_Setpoint .
  FILTER NOT EXISTS { ?sp brick:aggregate ?agg }
 ?ahu rdf:type brick:AHU .
 ?ahu brick:feeds ?thing .
 ?thing brick:hasPoint ?sen .
 ?thing brick:hasPoint ?sp .
}

I don't think it's a good developer experience. If this is really what we want, we have to put it in the documentation everywhere before people getting too far from this practice.

The goal of Brick in my mind is always application-level interoperability and it's crucial to enforce common understanding across modelers and queriers.

jbkoh commented 2 years ago

@epaulson not sure if I was clear, but I do have a device producing those three data streams through its API.

epaulson commented 2 years ago

Yes, I understood that your scenario was different and you had a device producing simultaneous data streams.

jbkoh commented 2 years ago

Yes, I understood that your scenario was different and you had a device producing simultaneous data streams.

Though the problem gets more obvious in my scenario, I think the fundamental problem remains the same: would (or should) common users assume any Point can be aggregating or sampling?

gtfierro commented 2 years ago

Yes, the negation is a line of code, but now it should be everywhere per instance.

We already require these kinds of extra annotations to retrieve units, timeseries database pointers, etc. How else is a user supposed to retrieve this metadata?

Under your proposed Aggregation sensor, I still need to query for the properties saying how that data is being aggregated. I also have the additional complexity that the units/etc must be fetched from the sensor I am aggregating.

SELECT * WHERE {
  ?agg a brick:Aggregation ;
           brick:aggregateTarget ex:power_sensor ;
           brick:aggregateTarget/brick:hasUnit ?unit ;
           brick:aggregate [
               brick:aggregationFunction "min" ;
                brick:aggregationWindow "RPT1M" ;
           ] ;
}

I'm very cautious of the design where aggregateTarget can have either an instance or a class as the object...feels like that would cause more confusion. I like the idea of adding an aggregationTarget property to the object of brick:aggregate, though I am hesitant about mandating that all aggregations be removed to their own isolated subtree of brick:Point. We should be cautious about overfitting to the particular scenario you've described

gtfierro commented 2 years ago

Though the problem gets more obvious in my scenario, I think the fundamental problem remains the same: would (or should) common users assume any Point can be aggregating or sampling?

A Point is a digital I/O point. If it's not aggregating or sampling, what is it doing? There are innumerable cases where the discrete values a sensor/etc consumes are fundamentally an aggregated property.

jbkoh commented 2 years ago

A Point is a digital I/O point. If it's not aggregating or sampling, what is it doing? There are innumerable cases where the discrete values a sensor/etc consumes are fundamentally an aggregated property.

My hypothesis is that users are more towards the concept of sampling than aggregation when they see Points. It's ungrounded, so if both Gabe and Erik (ideally more people) think that normal users would assume either way is possible, I'd say we just need to document it better.

We should be cautious about overfitting to the particular scenario you've described

As I said multiple times, it's not particular to my scenario but I'm generally concerned about how normal users would perceive brick:Point. In the current scheme, we have to educate all the users that if they want the current temperature of rooms in the buildings, they always have to put the negation line. Otherwise, the query will work for some buildings as they intended but not for other buildings with aggregating sensors. It's not documented anywhere in our guideline/documents/examples.

jbkoh commented 2 years ago

Or we can be more explicit about summarization methods whether those sensors are sampling or aggregating. Currently, "sampling" is implicit while aggregating is explicit. Something like

:curr brick:summarizationMethod brick:Sampling
:max_curr brick:summarizationMethod brick:Aggregation
brick:Sampling a brick:Tag # brick:Tag is a placeholder.
brick:Aggregation a brick:Tag.

epaulson commented 2 years ago

I think the most common case is whoever is adding a power meter to the Brick is not going to know exactly what the meter is doing under the covers and when they add it, they're just going to treat each read of the sensor as the instantaneous value at that sample.

But, I also think that people will expect to be able to say "my meter is capturing max/min/ave/etc values over some time window", and worse, I think that window might be fixed or sliding, so we need to be able to say that too.

I think the least surprising definition is "assume it's a sample of the instantaneous value" but I agree that we need to be very, very clear in our documentation about these definitions.

epaulson commented 2 years ago

I think Schneider's manual is a good list of different ways meters actually aggregate data in the field (moving averages, peaks over intervals, etc)

https://www.productinfo.schneider-electric.com/pm8000/594169a446e0fb0001570843/PM8000%20user%20manual/English/BM_PM8000_UM_0000264423.ditamap/$/Chapter_Measurements_0000055263

jbkoh commented 2 years ago

I do agree that we need to be able to express aggregation details. My point at the very high level is that we should not tolerate possibilities for frictions in understanding and using Brick. I'm not tied to any specific implementation of it. It can be more specific type hierarchy (aggregating points vs sampling points), more properties (is it sampling or aggregating?), new tools (raises warning if the query doesn't specify sampling vs aggregating), better documentation (this should be done anyway actually), etc.

Especially for meters/electricity, users may be alert to aggregation as it's common in the domain, but our definition of aggregation is generic to any Point. Thus, users for any concepts should be aware of the fact that a Point may or may not be a sampled point but an aggregating point.

Erik, you said "assume it's a sample of the instantaneous value" but does it imply that "it may not be a sample of the instantaneous value"? Then, the definition should be more precisely "it's a sample of the instantaneous value unless it's specified to be aggregating", which means that any queries should be clear on whether it's querying for an aggregation point or a sample one by the point's aggregating property. Even a simple query asking for the current temperature of a room should consider the possibilities of Temperature Sensors can be aggregating in the past days or not.

I personally do not think that's an ideal developer experience, and I'm open to any implementations to address the problem (several options are at the top.) Maybe it's only me picky to developer experiences. If the majority of the community think it's not an issue, let's do better documentation at the least.

BrickSchema / Brick

Best practices for aggregation? (introducing Aggregation class) #368