Suggestions for the specification

bgusach commented 6 years ago

Hi there,

I think there are a few corners of the specification that could be improved or explained better. Maybe some of them have a documented explanation somewhere, but I could not find it. Well, there we go:

Naming should be consistent. Some parts refer to machine messages as Machine Message while others do as Machine Message Payload or Message Payload (this last one is especially confusing, since "message" is used everywhere). Same goes for Measurement and Process messages.
There are two classes called Measurement, one part of the MeasurementPayload and the other part of ProcessPayload. Naming is confusing, and the classes are very overlapping. Maybe they could be combined and reused?
Related to previous point, there are two Series classes, one belonging to ProcessPayload.Measurement and the other one to MeasurementPayload.Measurement. One requires offsets while the other one does not. Is this meant to be so? if so, could anybody give an example? I can't think of a meaningful series without a time offset...
How are the two possibilities of ProcessPayload.Measurement.Limits to be understood? How can a physic dimension have multiple limits at the same point in time?
The ProcessPayload class is too thin, and (at least in my limited understanding) conceptually pretty much the same thing as the ProcessPayload.Process. Maybe they could be conflated?

ameinhardt commented 6 years ago

Hi @bgusach , I agree with you. Furthermore, I think the idea of PPMP, to me a one-way communication stream and to be simple, should remain. I see PPMP as format structure in between transport protocol and further semantic. As discussed with @fpatz , we want to

clean up the syntax like you suggest
add reasonable lengths to all fields
add minor improvements like splitting "operationalStatus" into "deviceStatus" and "operationalMode"
add the possibility to link PPMP to further semantic description that seldomly or never change (e.g. the value of measurement point "temperature" indicates heat in Celsius and should be between 36° and 42°). See also #35 A context sections that has the same intention should not be included in every payload
add examples / recommendations / tutorials / howto use the format, including architecture blueprint or ways to shrink it (protobuf)

What do you think?

bgusach commented 6 years ago

@ameinhardt sounds pretty good.

How do you want to proceed? are those points already discussed and agreed to a certain level of detail? or they rather need to be further discussed?

ameinhardt commented 6 years ago

Since the unide version 0.2.0 was finalized, we could use the current 0.3.0-SNAPSHOT to create a new ppmp-schema/src/main/resources/org/eclipse/iot/unide/ppmp/v3 with improvements.

Cleanup and operationalStatus could be rather straightforward,
length of fields might be subject of discussion,
context (unit etc.) link is controversion and needs design (redundancy for simplicity vs. avoiding that via link or allow both). See also #35
architecture, blueprint, tutorials are to dos

@bgusach: where could you support?

bf-bryants commented 6 years ago

Hi,

I'd like to suggest a field length change to code - see $.messages[*].code in machine messages. It's currently limited to 36 characters.

We use the machine messages to send events, with structured codes in the form com/domain/app-name/event-name, so 36 ends up being really tight. A max length of 128 would alleviate the situation.

If this field is altered, the field $.part.code (in both measurement messages and process messages) should be altered to match.

ameinhardt commented 6 years ago

@bgusach: I didn't understand your last point. Did you mean "program" vs "process"? limits can vary during the run of a process (in the process of driving from Bremen to Berlin, there are different speed limits along the way). Regarding time in measurements, we wanted to give the option to express non-time relations like pressure over distance. Maybe that's only a view aspect, though. What do you think, @bf-bryants, is there always a time component in measurement data?

bf-bryants commented 6 years ago

Having a primary axis other than time is quite a conceptual shift from the current view of what measurement data is - as it is currently gathered in real time. We would be moving away from data gathering to a more general data transport format. The pressure/distance example sounds more like a rule set if there's no time component. Should we be looking at a new PPMP message type for such things?

How would we deal with data split over multiple messages? At the moment, the time component is mandatory. Everything can be ordered correctly. Moving to an alternate (and therefore non-mandatory) axis could be a source of problems.

Note that a system that measures both pressure and distance over time will use the current model; the receiving end can choose to ignore the time component. Do we have an explicit need to transport measurement data without a time component?

Our use cases for collecting sensor data all require timestamps as part of the data qualification, but I would be interested to hear about other use cases.

Steve

ameinhardt commented 6 years ago

I have created draft schemas for v3. You can find them at ppmp-schema.../v3

Note that

there is a file for common definitions
a context in measurement.series including limit and (data-) type
id/Id/ID corrections
renaming of redundant names ("part.partId" -> "part.id")
mode / state instead of operationalStatus
no complete length definitions yet

The website and diagrams are also updated, but only reachable via direct links:

bgusach commented 6 years ago

Hi @ameinhardt

@bgusach: I didn't understand your last point. Did you mean "program" vs "process"?

No, I meant ProcessPayload. I have the feeling that the info contained in the classes ProcessPayload and Process semantically belongs together and could be merged into one. (In general I think there are too many classes with similar names, and some of them could be merged, like these two).

limits can vary during the run of a process (in the process of driving from Bremen to Berlin, there are different speed limits along the way).

That is true, but still feels unclear to me. Let's use the Bremen-Berlin example. Say:

There are two speed limits (lim1, lim2) in the route
You produce a single Measurement for the whole trip
This Measurement has a Series with hundreds of velocity values: $_time=[0, 100, 200, ...], speed=[80, 100, 120, ...]

How do we know that at the time of the 507th element, the limit lim1 was active? I think we can't.

I would find it more elegant if each Measurement had 0..1 Limits, and if the car moves from lim1 to lim2, the measurement has to be stopped, and a new one created. This way you would be univocally defining a limit for a given time point.

Regarding time in measurements, we wanted to give the option to express non-time relations like pressure over distance. Maybe that's only a view aspect, though.

I agree with @bf-bryants , moving from time to another unit is a huge shift in the semantics. And the pressure-distance distribution still needs IMHO a time component in the context of machine performance management: that happened at a given moment and you most probably want to know when.

On top of that, these distributions must be discrete, right? for instance you have 100 pressure sensors over your 100 meter pipe, you could still have 100 dimensions in the Series for each offset. Something like: $_time=[0, ...], press_0=[80, ...], press_1=[40, ...], ... press_99=[120, ...] and thus you could somehow represent pressure over space.

Does it make sense? :)

muelsen commented 6 years ago

Hi @ameinhardt , in the proposal $_time is replaced by simply time. Is that right? For me $_time was a specific time value that indicates a timestamp by adding an offset to a start timestamp.

bgusach commented 6 years ago

Hi @muelsen

in the proposal $_time is replaced by simply time. Is that right?

Looks like $_time has been renamed to time but the meaning is the same: milliseconds on top of ts.

ameinhardt commented 6 years ago

Thanks for the input & active discussion, @bgusach, @bf-bryants, @muelsen!

In that first v3 draft/proposal, I included the feedback of #17: there's no specific need for '$'. Before, '$' should have indicated an array of 'long' instead of 'float'. Yet, now we have the schema which clearly indicates the data type. A parser can treat the keyword 'time' in the same way he would deal with '$_time'.

@bgusach: interesting point to merge classes with 1:1 relation! I always thought it's a good idea to keep the "shell" (*Payload) as universal as possible. An implementation could still merge all payload types of one device into one. Maybe "series" would be a candidate for merging into measurement instead?!

measurements[].context.temperature.limits[507] applies to measurements[].series.temperature[507] in the same way as measurements[].series.time[507] applies to that as well. If for all context* changes a new measurement has to be created, representation of reference curves (continous increase of temperature in warm-up phase) would be much more redundant.

You guys have a point: time would always be there with a measurement, although the later display of the correlations might not make use of it. @muelsen: do you agree?

bgusach commented 6 years ago

Hi @ameinhardt

A parser can treat the keyword 'time' in the same way he would deal with '$_time'.

I think that's good. And thus we can remove some exceptional name handling for the $_time field from the Python code :)

interesting point to merge classes with 1:1 relation! I always thought it's a good idea to keep the "shell" (*Payload) as universal as possible.

What do you mean by "universal" in this context? I find it hard to think about some reuse cases, if that is what you mean. There is this rule of thumb that says that if A needs B, and B needs A, then A and B are probably the same thing, and I have the feeling it is right here.

An implementation could still merge all payload types of one device into one.

Could you provide an example of merging payload types? do you mean reusing the same code classes or something like that?

Maybe "series" would be a candidate for merging into measurement instead?!

Uhmmm not sure about that one. The series entry has the advantage of grouping a set of dimensions, whose name is not known in advance (like pressure or temperature). If you move everything within series into measurement, it would be harder to find out what is a dimension, and what is an attribute of measurement (like ts), although still possible.

measurements[].context.temperature.limits[507] applies to measurements[].series.temperature[507] in the same way as measurements[].series.time[507] applies to that as well. If for all context* changes a new measurement has to be created, representation of reference curves (continous increase of temperature in warm-up phase) would be much more redundant.

Ah ok, I think I got it. If you don't use exactly zero or one limit, you need one limit object for each entry in the series right?

If that is the case, uhm... it depends on the case, but in general looks to me a tad inefficient. Coming back to the Bremen-Berlin trip, if we measure the speed 10.000 times, but the speed limit changes only 10 times, we are sending something like {"speed": {"upperError": 120}}, 10.000 times but 9.990 of them were actually irrelevant.

As you say, If those limits change a lot, like in every single datapoint (or just a few of them), it is true that creating a measurement object would be less efficient than your approach, but I'd think that is not the most common case... let's think about it, probably we can come up with a good solution :)

bf-bryants commented 6 years ago

Hi,

I'm missing the location for custom data. In V2, we had metaData fields, but I'm not seeing any in V3. We were using these in the machine messages to pass event specific data. Losing these kills V3 for us.

Does anybody know why they are gone?

Steve

ameinhardt commented 6 years ago

Hi @bf-bryants: I removed the dedicated metaData, but additionalProperties: false as well. It means you can add additional properties everywhere now as indicated in the examples. You could even still use the metaData tag, if you want. Parsers just use the known properties and ignore the others. Does that work for you?

bf-bryants commented 6 years ago

Ooops.. I missed that, sorry. I assume that will work. Thanks.

muelsen commented 6 years ago

Hi everyone, we discussed internally the new interface. One major point are the missing metaData. Currently it is very simple for use to find out wether a parameter is part of PPMP or not. If not we take the whole metaData object and store it in our database or use it for further processing. When there is no more metaData-object it is more complicated to find those information in the PPMP - we always have to make a diff of the spec and the actual message to find additional information. So our suggestion would be to keep the metaData-objects in PPMP and set additionalProperties to false.

Another suggestion is to change the name of the time-series to ts_offset in order to have a dedicated coupling with the ts of the measurements-object and to know directly that this means an absolute timestamp. "time" would be to not specific enough.

bgusach commented 6 years ago

I agree with @muelsen in both points. I'd just say that a key name like additionalData would be more descriptive than v2's metaData.

bgusach commented 6 years ago

Hi @ameinhardt, I was thinking about what we discussed on this:

@bgusach: interesting point to merge classes with 1:1 relation! I always thought it's a good idea to keep the "shell" (*Payload) as universal as possible. An implementation could still merge all payload types of one device into one. Maybe "series" would be a candidate for merging into measurement instead?!

Just an idea here. I realized that MeasurementPayload has 1..N Measurements and MachineMessagePayload has 1..N Messages as well. So maybe the ProcessPayload should have 1..N Processes too, and not the strange 1:1 relship (or my prior suggestion to merge them). Does it make sense to send several process data at once? I guess it does, as with Messages...

muelsen commented 6 years ago

Hi all, my final comment on the current proposal for v3:

time rename to time_offset
All additional properties shall be in an additional field (e.g. additionalData as @bgusach mentioned)
Remove shutoffPhase (can also be part of additionalData) The rest suits us.

ameinhardt commented 5 years ago

@muelsen: with the current version , that should be addressed

eclipse-archived / unide

Suggestions for the specification #33