COVESA / commercial-vehicles

Apache License 2.0

When to send data to the cloud #14

Closed eriksven closed 4 months ago

eriksven commented 6 months ago

For the guidelines, I am wondering which data is actually sent from the vehicle to a backend, when it is sent, and how much delay we allow between measuring a sample and transferring it to the backend. What do you think?

For the curve algorithm, which is used for many data points like the GPS location, I see different options: "slice" the data, either with or without overlap, and apply curve to each slice, or apply curve after each new sample.

For reference, let us assume the following samples: rawSamples

When we apply the curve algorithm over all data we could get the following result. curveApplied

However, we are dealing with a stream of data and do not have an infinite buffer size to fit all potential samples and also need to make a decision when to send the data to the backend (before infinity). One approach would be to slice the data set and then apply curve on each slice. We can then send the result per slice directly to the cloud. This would look like this: curveSlice

In this approach, we treat each buffer independently and will always send at least the first and last sample of each slice. A specialization of this would be to let the slices overlap each other so that the last point of one slice is the first point of the next slice. This would look like this: curveOverlappingSlices
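The two slicing variants can be sketched as follows. This is only an illustration: the thread does not specify the curve algorithm itself, so `simplify` below is a Douglas-Peucker-style stand-in that measures vertical error on a single (timestamp, value) signal. The names `simplify`, `curve_in_slices`, `slice_size`, and `max_error` are hypothetical, and timestamps are assumed to be strictly increasing.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp, value); timestamps strictly increasing


def simplify(points: Sequence[Sample], max_error: float) -> List[Sample]:
    """Douglas-Peucker-style reduction, used here as a stand-in for curve."""
    if len(points) <= 2:
        return list(points)
    (t0, v0), (t1, v1) = points[0], points[-1]
    worst_index, worst_error = 0, 0.0
    for i in range(1, len(points) - 1):
        t, v = points[i]
        # Vertical distance to the straight line from the first to the last point.
        expected = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        error = abs(v - expected)
        if error > worst_error:
            worst_index, worst_error = i, error
    if worst_error <= max_error:
        return [points[0], points[-1]]
    left = simplify(points[: worst_index + 1], max_error)
    right = simplify(points[worst_index:], max_error)
    return left[:-1] + right  # the split point appears in both halves


def curve_in_slices(
    samples: Sequence[Sample],
    slice_size: int,
    max_error: float,
    overlap: bool = False,
) -> List[Sample]:
    """Reduce each slice independently; with overlap, slices share one boundary point."""
    if slice_size < 2:
        raise ValueError("slice_size must be at least 2")
    step = slice_size - 1 if overlap else slice_size
    kept: List[Sample] = []
    for start in range(0, len(samples), step):
        chunk = samples[start : start + slice_size]
        if overlap and start > 0 and len(chunk) < 2:
            break  # only the shared boundary point is left, and it was already kept
        reduced = simplify(chunk, max_error)
        if overlap and kept:
            reduced = reduced[1:]  # drop the boundary point kept by the previous slice
        kept.extend(reduced)
    return kept
```

On a perfectly linear signal, the non-overlapping variant keeps two points per slice, while the overlapping variant keeps one point per slice boundary, which is where the saving in transmitted points comes from.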

Another approach is to apply curve whenever a new sample gets added to the buffer. Here we could decide for each sample whether we keep it, and therefore send it to the cloud, once curve has been applied for the subsequent sample. onlineCurve

To keep the buffer from growing too large, we could work with a sliding window and drop the oldest samples from the buffer after pushing them to the cloud. onlineCurveEnding

A subsequent question is whether to always send a sample the moment we decide to keep it, or to collect a number of samples before sending them. For example, one could send the relevant samples only once per minute, or when local storage is full.
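One way to picture the per-sample variant is an "opening window" reduction, sketched below. This is a swapped-in technique for illustration, not the curve algorithm discussed in this thread: a sample is kept once the next sample shows that a straight line from the last kept point can no longer cover everything in between. The class `OnlineCurve` and the parameters `max_error` and `max_window` are hypothetical, and timestamps are assumed to be strictly increasing.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, value); timestamps strictly increasing


class OnlineCurve:
    """Decides about the previous sample each time a new sample arrives."""

    def __init__(self, max_error: float, max_window: int = 64) -> None:
        self.max_error = max_error
        self.max_window = max_window          # upper bound on buffered samples
        self.anchor: Optional[Sample] = None  # last sample that was kept
        self.pending: List[Sample] = []       # samples since the anchor

    def _fits(self, end: Sample) -> bool:
        """True if every pending sample lies close to the line anchor -> end."""
        assert self.anchor is not None
        t0, v0 = self.anchor
        t1, v1 = end
        for t, v in self.pending:
            expected = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
            if abs(v - expected) > self.max_error:
                return False
        return True

    def add(self, sample: Sample) -> List[Sample]:
        """Add one sample; return the samples that are now final and can be sent."""
        if self.anchor is None:
            self.anchor = sample
            return [sample]
        window_full = len(self.pending) >= self.max_window
        if self.pending and (window_full or not self._fits(sample)):
            kept = self.pending[-1]   # the previous sample becomes a kept point
            self.anchor = kept
            self.pending = [sample]   # older samples are dropped from the buffer
            return [kept]
        self.pending.append(sample)
        return []

    def flush(self) -> List[Sample]:
        """Emit the most recent sample, e.g. at the end of a trip."""
        if not self.pending:
            return []
        last = self.pending[-1]
        self.anchor, self.pending = last, []
        return [last]
```

The `max_window` bound plays the role of the sliding window: once that many samples are buffered, the newest buffered one is kept regardless of the error, so memory use stays fixed.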

tguild commented 5 months ago

Hej @eriksven, unless there is separate logic to perform an in-vehicle assessment, the curve-sampled data point is considered relevant and should be transmitted back to the cloud. Curve is a lossy compression algorithm that sends pertinent data based on the error thresholds per signal. The new data point may be actionable and should be sent to the cloud in the relatively near future, but it can be kept in a buffer and sent as part of a batch. Note that some of our thresholds could potentially be tuned better; we have some default thresholds for temperature and pressure that can be studied and improved.

@yuhanlin-geotab would know better and may want to expound further or correct impressions I have.

yuhanlin-geotab commented 5 months ago

Ted is correct. Points output by the curve algorithm should be sent to the cloud. You may also want to store these points onto some type of persistent storage on the client, so you can upload them later if the network is down.
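A minimal sketch of such a client-side store, assuming SQLite is available on the device. The class name `Outbox`, the table layout, and the `send` callback are hypothetical and not part of any Geotab or COVESA API; `send` is expected to return `True` only when the upload succeeded.

```python
import json
import sqlite3
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class Outbox:
    """Keeps curved points on disk until they have been uploaded."""

    def __init__(self, path: str = "outbox.db") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, signal: str, sample: Sample) -> None:
        """Persist one curved point for the given signal name."""
        payload = json.dumps({"signal": signal, "t": sample[0], "v": sample[1]})
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

    def upload_pending(
        self, send: Callable[[List[dict]], bool], batch_size: int = 100
    ) -> int:
        """Try to upload the oldest points; delete them only if send succeeds."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return 0
        if not send([json.loads(payload) for _, payload in rows]):
            return 0  # network down: keep the rows for the next attempt
        with self.db:
            self.db.execute("DELETE FROM outbox WHERE id <= ?", (rows[-1][0],))
        return len(rows)
```

Deleting only after a successful `send` means a point can be uploaded twice if the device restarts between upload and delete, so the backend would need to tolerate duplicates.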


eriksven commented 5 months ago

Thank you for the responses, and I understand that each data point that "survived" the curve sampling should end in the cloud. However, I am not sure what an actual implementation would look like.

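I am wondering which points go into the curve algorithm, and when. All measured data points get curved at some point in time. But as outlined in my previous comment, I see different approaches like:

  • slice the data and apply curve per slice
  • have a moving window/buffer of n previous data points (either all measured data points or just the points that were kept in previous runs)

Based on your answer, I assume you would go for a moving window of n previous data points where only previously curved points are kept.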

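The other question is how immediately one needs to send the curved data to the cloud. I see different options depending on what "relatively near future" means for us:

  • send each curved data point as soon as it is curved (still resulting in many messages but low end-to-end latency)
  • only send data points once n points have been collected
  • send curved and kept data points at regular intervals (e.g., every minute)

The second and third options would help to reduce the number of messages sent over the air. Based on your answer, I understand that you would go for the first option, since a low latency between the sampling of the data point and its arrival in the cloud is one of the requirements of the measurement campaign.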

yuhanlin-geotab commented 5 months ago

Regarding how the "raw" points get curved, the PositionCurve type serves as a fixed buffer for points that we want to curve. When the buffer becomes full, the curve algorithm is run on the buffer, the output points are collected, and the buffer is (mostly) emptied. We also have other conditions for running the curve algorithm early.
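The buffer behavior described above could look roughly like the sketch below. The internals of `PositionCurve` are not shown in this thread, so everything here is an assumption: `FixedCurveBuffer` is a hypothetical name, "(mostly) emptied" is read as keeping the last point as the start of the next run, and the unspecified early-run conditions are modeled as an explicit `run()` call. The `reduce` callback stands for the curve algorithm and is assumed to always keep the first and last point it is given.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)
Reducer = Callable[[List[Sample]], List[Sample]]


class FixedCurveBuffer:
    """Collects raw points and runs the reducer when the buffer is full."""

    def __init__(self, capacity: int, reduce: Reducer) -> None:
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        self.reduce = reduce
        self.points: List[Sample] = []
        self._carried = False  # True if points[0] was already output earlier

    def add(self, sample: Sample) -> List[Sample]:
        """Buffer one raw point; return curved points if this filled the buffer."""
        self.points.append(sample)
        if len(self.points) >= self.capacity:
            return self.run()
        return []

    def run(self) -> List[Sample]:
        """Run the reducer now (buffer full, or an early condition of the caller)."""
        if not self.points or (self._carried and len(self.points) < 2):
            return []  # nothing new since the last run
        output = self.reduce(self.points)
        if self._carried:
            output = output[1:]  # first point was output by the previous run
        self.points = [self.points[-1]]  # keep the last point as the next start
        self._carried = True
        return output
```

Carrying the last point over makes consecutive runs behave like overlapping slices, so no segment between two runs is left without a starting point.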

As for how often we send curved points to the cloud, it depends on your latency requirements, as you said. For our use case, we actually do option 3: we send all pending curved points to the cloud every 30 seconds.
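Interval-based sending of pending points can be sketched as below. Only the 30-second interval comes from the comment above; the class `IntervalSender`, its `send` callback, and the injectable `clock` are hypothetical and exist mainly to make the logic easy to test.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


class IntervalSender:
    """Collects curved points and sends them as one batch per interval."""

    def __init__(
        self,
        send: Callable[[List[Sample]], None],
        interval_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send = send
        self.interval_s = interval_s
        self.clock = clock
        self.pending: List[Sample] = []
        self.last_sent = clock()

    def add(self, point: Sample) -> None:
        """Queue a point that survived the curve algorithm."""
        self.pending.append(point)

    def tick(self) -> bool:
        """Call regularly; sends at most one batch per interval, never an empty one."""
        now = self.clock()
        if not self.pending or now - self.last_sent < self.interval_s:
            return False
        batch, self.pending = self.pending, []
        self.send(batch)
        self.last_sent = now
        return True
```

With this shape, the worst-case delay between a point being curved and being sent is roughly the interval plus the time between two `tick()` calls.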

