eclipse / kuksa.val


MED1: Timestamp check in databroker #732

Closed lukasmittag closed 3 months ago

lukasmittag commented 5 months ago

This adds a check in validate of Entry to validate timestamps. This means that if a provided timestamp is smaller than the saved one, an UpdateError is thrown. If no timestamp is provided, the databroker uses SystemTime::now() for the timestamp. Because of this behavior we check whether the saved timestamp is greater than the provided timestamp. Equal timestamp values are allowed.
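
In Python-flavoured pseudocode, the proposed rule could look roughly like this (resolve_timestamp and UpdateError here are illustrative stand-ins, not the actual Rust databroker code):

from datetime import datetime, timezone

class UpdateError(Exception):
    """Illustrative stand-in for the databroker's update error."""

def resolve_timestamp(provided, stored):
    """Proposed rule: reject timestamps older than the stored one, default to 'now' when none is given."""
    now = datetime.now(timezone.utc)
    if provided is None:
        return now                      # no timestamp given: the databroker stamps the value itself
    if stored is not None and stored > provided:
        raise UpdateError("saved timestamp is newer than the provided one")
    return provided                     # equal or newer timestamps are accepted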

erikbosch commented 5 months ago

Should we possibly document our time handling somewhere? I.e. something that describes the two "main" alternatives (timestamps set by the databroker and/or timestamps set by the provider). I can see some theoretical corner cases that can give problems, like if we have multiple sources/providers for a signal and they have slightly different times. But maybe the recommendation then is either to let the broker set the time, make sure that both sources/providers have synced time, or just recommend not having multiple providers for the same signals.

lukasmittag commented 5 months ago

Yes, documenting this would be good. We could opt to allow some difference between timestamps. I think for the actuation provider it should be as is, because for target values the last one that comes in wins. For providing current values back from a sensor/actuator there should be only one instance/provider.

SebastianSchildt commented 5 months ago

This is maybe more of a general comment, but if we DO allow data providers to explicitly set a timestamp on set, what is then more likely?

We always update timestamp:

We do this check

So I am wondering what does more damage?

Oe radical "solution" would be never accepting exertnal timestamps, and as databroker is sort of a centralized system, always tag times ourselves. Does this have obvious disadvantages for us?

lukasmittag commented 5 months ago

Valid point @SebastianSchildt, but then the timestamp should rather be handled internally than exposed through the API.

SebastianSchildt commented 5 months ago

Yes, maybe, but as it is now, maybe just warn when an older timestamp comes in, but still take it?

A bit like some build systems do when they figure out that some "old" artefacts have been built in the future, like make:

make: Warning: 'xyz' has modification time in the future
make: Nothing to be done for 'all'
make: warning:  Clock skew detected.  Your build may be incomplete.

Maybe that would be the more "robust" approach? Although I do see that the report clearly "recommends" the check, I'm just not sure we would agree.

argerus commented 5 months ago

We do this check

  • a malicious attacker inserts data with timestamp 2099, basically rendering a datapoint useless until the car is restarted.

So I am wondering what does more damage?

That was my immediate thought as well.

It could be solved by adding a validation step that checks that a provided timestamp is not (too far) into the future, and if it is, just set it to now() instead and perhaps provide some feedback to the provider.

The question is if we even need providers to provide timestamps at all in that case?
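
A rough sketch of that clamping idea (Python pseudocode; MAX_FUTURE_SKEW is an illustrative value that would still need to be agreed on):

from datetime import datetime, timedelta, timezone

# Illustrative tolerance; the actual value (if any) would have to be decided.
MAX_FUTURE_SKEW = timedelta(seconds=5)

def clamp_timestamp(provided):
    """Accept a provided timestamp unless it lies too far in the future, in which case use 'now'."""
    now = datetime.now(timezone.utc)
    if provided is None or provided > now + MAX_FUTURE_SKEW:
        # Fall back to the databroker's own clock and (ideally) warn the provider.
        return now
    return provided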

lukasmittag commented 5 months ago

What John suggested can be implemented. I'm just wondering whether we are able to emit a warning. For a SetResponse we can only send back errors, so would we throw an error or just not inform the client?

SebastianSchildt commented 5 months ago

So I think we have two cases

  1. A provider not giving a timestamp. In that case we just stamp in the databroker (I think that is what happens with most of our current providers? Or are we stamping this client-side?). I think for most purposes this is the best pattern.

  2. A provider giving a timestamp, and we believe it, in which case we have two choices:

    1. Accepting the timestamp, no matter what. It might be in the future, it might be lower than the last stored value; we just take it.
    2. Accepting the new value only if the timestamp is newer than the previous one. If not, ignore the update and log a warning.

In no case is it a good idea to have some "not too far in the future/past" kind of check, because there is never a good rationale for why a specific value was chosen.

In terms of "attacking"/deliberately putting wrong timestamps, lets agree, that we have much larger problems in the system than can be fixed by "clever timestamp handling in databroker)

I would suggest doing 1. and 2.i

Why? Because as a networking guy by education I can say that "packets overtaking others, leading to reordering" is a bit of a phantom and doesn't happen nearly as often as it is used as an argument, especially in the local network that a car is, with a TCP-based protocol. On the other hand, the ECUs in a car are not necessarily very time-synchronized. You may have some AD subsystem synched with µs precision (but not necessarily to the "absolute" real time), whereas other systems don't care much. Such systems would therefore choose option 1 anyway, but if e.g. a connectivity unit happens to jump through time based on some NTP sync, well, we should not panic databroker-wise and should just accept the timestamps provided by it as correct from the point of view of the signal source.

argerus commented 5 months ago

I would suggest doing 1. and 2.i

Or in other words:

Keep databroker behaviour exactly as is, without any of the changes in this PR.

I generally agree with that, but given that some clients might rely on "sane" timestamps, I'm starting to think that we perhaps need to treat provider-provided timestamps differently.

Perhaps it would make sense to have the databroker always set the timestamp, but make any provider-provided timestamp available as a different field (for clients that would find that useful).

In addition, we could also update the provider API to offer a way to set an "offset from now" instead of an absolute timestamp, i.e. this happened "3 ms ago" or this happened "1 day and 23 seconds ago", which could help when the clocks differ between systems.
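
A minimal sketch of how such an offset could be resolved on the databroker side (Python pseudocode; timestamp_from_offset is an illustrative name, not an existing API):

from datetime import datetime, timedelta, timezone

def timestamp_from_offset(age: timedelta) -> datetime:
    """Turn a provider-supplied 'this happened <age> ago' into an absolute timestamp
    using the databroker's own clock, so differing provider clocks don't matter."""
    return datetime.now(timezone.utc) - age

# Example: the provider reports that the value was sampled 3 ms ago.
sample_time = timestamp_from_offset(timedelta(milliseconds=3))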

erikbosch commented 5 months ago

Concerning letting the databroker always set the timestamp - I think we need to discuss what use-cases we want to support in the future. When everything is inside the vehicle and reflects "now", it should not matter that much whether the client is allowed to set the time or not. But I could see some use-cases where it makes sense to let the client set the time:

SebastianSchildt commented 5 months ago

Or in other words:

Keep databroker behaviour exactly as is, without any of the changes in this PR.

Good summary 😃

maybe with the exception of logging timestamps that seem to go "backwards", to hint at a potential problem.

@erikbosch my understanding is that we currently do support such use cases (i.e. a client can set the timestamp if it wishes to), so that would also be safe if we do not touch anything.

erikbosch commented 5 months ago

So the conclusion is that we will not discard any signal value; at most we may give warnings. That is totally OK for me, but we should preferably document the behavior somewhere, possibly in one of:

Maybe as part of some "architecture page" somewhere

argerus commented 4 months ago

So the conclusion is that we will not discard any signal value

Sure, but I'm still not sure providers should be able to set the timestamp directly. It seems more robust to instead make that information (if provided) available in a separate field. Something like:

{
  path: "Vehicle.Speed",
  value: Float(23.2),
  timestamp: 2024-02-06T13:12:16.576661Z,
  source_timestamp: Some(2024-02-06T13:02:11.229533Z),
  ...
}

That means timestamp will always reflect the time at which the value arrived in the databroker, without any confusion caused by the provided timestamp.

And it would preserve the timestamp provided by the provider, so even if their clock is way off, comparisons between subsequent source_timestamps would still make sense. And if a provider decides to put some weird timestamp in there, it would not be confused with the actual timestamp.

We don't even have to change the "provider interface" for this; setting the timestamp would just mean that the databroker puts that in source_timestamp and always puts the actual time in timestamp. Both would then be available when getting or subscribing to that signal (if the client chooses to).
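
As a rough illustration of that split (Python pseudocode; StoredDatapoint and its fields are illustrative, not the databroker's actual types):

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class StoredDatapoint:
    """Illustrative stand-in for how the databroker could keep both timestamps."""
    value: float
    timestamp: datetime                   # always set by the databroker on arrival
    source_timestamp: Optional[datetime]  # whatever the provider sent, if anything

def store_update(value: float, provider_timestamp: Optional[datetime]) -> StoredDatapoint:
    # The provider-supplied timestamp is never used as the "real" timestamp;
    # it is only preserved for clients that want to compare source timestamps.
    return StoredDatapoint(
        value=value,
        timestamp=datetime.now(timezone.utc),
        source_timestamp=provider_timestamp,
    )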

lukasmittag commented 4 months ago

Added source timestamp in databroker code.

import random
import time
import datetime

from kuksa_client.grpc import Datapoint
from kuksa_client.grpc import DataEntry
from kuksa_client.grpc import DataType
from kuksa_client.grpc import EntryUpdate
from kuksa_client.grpc import Field
from kuksa_client.grpc import Metadata
from kuksa_client.grpc import VSSClient

def read_speed_from_can_bus():
    """Stub of a real function that would read vehicle speed from CAN bus."""
    time.sleep(0.1)
    return random.randrange(250)

with VSSClient('127.0.0.1', 55555) as client:
    while True:
        new_speed = read_speed_from_can_bus()
        updates = (EntryUpdate(DataEntry(
            'Vehicle.Speed',
            # Deliberately send an old, fixed timestamp to check how the databroker handles it.
            value=Datapoint(value=new_speed, timestamp=datetime.datetime(2020, 5, 17)),
            metadata=Metadata(data_type=DataType.FLOAT),
        ), (Field.VALUE,)),)
        client.set(updates=updates)

Tested with this that the timestamp is now SystemTime. Currently the client does not get back the source timestamp, since this would need an API change.

erikbosch commented 4 months ago

@lukasmittag - some build errors, can you take a look at them?

lukasmittag commented 4 months ago

will add a documentation file shortly