eclipse-kuksa / kuksa-databroker

A modern in-vehicle VSS (Vehicle Signal Specification) server written in RUST
https://eclipse-kuksa.github.io/kuksa-website/
Apache License 2.0
18 stars 17 forks source link

Databroker performance improvements in Get service call #26

Closed rafaeling closed 3 months ago

rafaeling commented 5 months ago

This change means a 8 times faster improvement for the Get method, which drops to 7.8 ms for 5000 requests compared to the main branch implementation, which is 60 ms.

See the wildcard exceptions that have been made obsolete by this change: https://github.com/eclipse-kuksa/kuksa-databroker/blob/9ec27697af5d7594245e5c7cd5ebc2eb687abb80/doc/wildcard_matching.md

codecov-commenter commented 5 months ago

Codecov Report

Attention: Patch coverage is 75.19026% with 163 lines in your changes missing coverage. Please review.

Project coverage is 50.92%. Comparing base (963a297) to head (29a968b).

Files Patch % Lines
databroker/src/grpc/kuksa_val_v1/val.rs 71.42% 82 Missing :warning:
databroker/src/glob.rs 78.04% 81 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #26 +/- ## ========================================== + Coverage 49.40% 50.92% +1.52% ========================================== Files 31 31 Lines 11285 11878 +593 ========================================== + Hits 5575 6049 +474 - Misses 5710 5829 +119 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

SebastianSchildt commented 5 months ago

This may be faster, but it breaks a lot in the existing API as well.

before


Test Client> getValue Vehicle.CurrentLocation 
[
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.X"
    },
    {
        "path": "Vehicle.CurrentLocation.Latitude"
    },
    {
        "path": "Vehicle.CurrentLocation.VerticalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.HorizontalAccuracy"
    },
    {
        "path": "Vehicle.CurrentLocation.Heading"
    },
    {
        "path": "Vehicle.CurrentLocation.Timestamp"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Z"
    },
    {
        "path": "Vehicle.CurrentLocation.Longitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.FixType"
    },
    {
        "path": "Vehicle.CurrentLocation.Altitude"
    },
    {
        "path": "Vehicle.CurrentLocation.GNSSReceiver.MountingPosition.Y"
    }
]

Test Client>       

After

Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehi
EXCEPTION of type 'Exception' occurred with message: 'Wrong databroker version, please use a newer version'
To enable full traceback, run the following command: 'set debug true'
Test Client> getValue Vehicle.CurrentLocation
{
    "error": {
        "code": 404,
        "reason": "not_found",
        "message": "Path not found"
    },
    "errors": [
        {
            "path": "Vehicle.CurrentLocation",
            "error": {
                "code": 404,
                "reason": "not_found",
                "message": "Path not found"
            }
        }
    ]
}

Test Client> 

(I guess breaking AutoComplete is another side effect of this)

So that would mean a major version bump and losing features

If RegExp (I guess the extending/matching with stars) is the performance bottle neck here, would it not be better, to only use RegExp if needed?

As in

if "*" in path do regexp path
else 
    doSameAsHere() and
          if leaf: return stuff
          if branch "regexp it"
         else "404"

That would just cost a bit of ifs, and I guess the heavy part of the regexps is, that you can't precompile (becasue they come form client). So implementing like above: A client that needs performance ONLY subscribes to Lead nodes and is fine,. Everybody else can knick themselves out

rafaeling commented 5 months ago

This was a hot fix to see if we could get some performance improvements. We are still considering doing more research to keep the wildcard combined with good performance without these PR changes. If the research results are not so promising, there is an additional idea of a new algorithm to partially support the wildcard only for the .* and .** cases while keeping this PR O(1) algorithm. I will change it to Draft as it should not be merged yet.

One more thing regarding auto-completion, yes, it is good to have it and it has been considered to implement an extra service method for Metadata that will support wildcards, which means that the user would not need wildcard support for Get as all signals could be stored locally by the client once in advance.

SebastianSchildt commented 5 months ago

Yes, in terms of an updated/new API I am sure there may be better ways to split/do this

However, for the current API, we should not merge something into main, that changes behavior significantly (I mean this doesn"t reinterpret some unclarities, it would actually remove features contradict what is documented)

"Engineering" wise,I am not sure, but have you looked into whether "globbing" libraries are faster? Like i.e. https://github.com/devongovett/glob-match , I however lack the background to know if this is "fundamentally" faster then the regexp construction of state amchie and machting. And it also wil be a bandaid, because non-wildcard hashmap lookup shall be O(1) (with respect of number of signals, the hashing obviously would be O(l) with l being length of the path), and I doubt the most clever glob matching when backed by a "normal" map/list can ever be faster than O(n)..... But surprise me, if you can deliver O(1) we will probably take it :D

so tldr: I still think now and also in a future APIs ome kind of matching useful, but should be implemented in way that it is only "used when needed"

Off topic here, but since you mentioned it: I guess even when "discoverability" is in a metadata API a user might still want to subscribe to "a branch" (which can be seen as a "wildcard" sort of), otherwise something like the recorder from https://github.com/eclipse-kuksa/kuksa-csv-provider gets really hard, consider you want to generate a trace of everzthing running throug databroker (or a subbranch) but are not able to say "subscribe vehicle", but instead need to list/do 1000 individual paths

rafaeling commented 4 months ago

Trying out the library you posted, will test it and check the performance results.

rafaeling commented 4 months ago

Wildcard now uses glob-matching library https://github.com/devongovett/glob-match Can be reviewed

SebastianSchildt commented 4 months ago

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

rafaeling commented 4 months ago

I think, that sutff like

**.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this

Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere?

Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

SebastianSchildt commented 4 months ago

I think, that sutff like **.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere? Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like

inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")

It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

rafaeling commented 4 months ago

I think, that sutff like **.*.*.*.Position is not supported anymore is fine, does not seem very real world, however is something preventing us to keep this Vehicle.Cabin.Sunroof working as before? I have not checked the implementation, but might something like "if isBranch ad .**" make sense somewhere? Rationale is: the first case I have never used, and find it hard to believe anyone did, the second one I definitely use often at least via CLI

Vehicle.Cabin.Sunroof is not supported anymore, instead Vehicle.Cabin.Sunroof.** or Vehicle.Cabin.Sunroof.* should be use. As far as I understand, there is no way in Databroker to determine if a path is a branch or not; Databroker can only internally query whether a path is contained in the database or not. Previously, it was supported because Regex does not have as many limitations as the glob matching library approach.

Ah, I remember, because "branches" are not really a thing in our internal data structure. In terms of keeping API compatibility, would something be feasible like

inputstr=`Vehicle.Cabin.Sunroof`

res = glob_the_shit_out_of_this(inputStr)

if res=no match
   res = glob_the_shit_out_of_this(inputStr+".**")

It should not really cost anything , when querying direct leaves.

I understand, for any kind of new API we would always use the "best" approch, but this is slightly modifying existing API behavior, so I would prefer to keep behavior changes to a minimum

Implemented your suggestions. Now it is also compatible for branches use cases like Vehicle.Cabin.Sunroof Please test it, meanwhile I will try fix the format errors as soon as possible.

SebastianSchildt commented 4 months ago

@rafaeling sounds good

It would be good, if the conflicts are resolved, because that seems to prevent CI runs, so currently can not lazily test by just pulling the docker image :D

SebastianSchildt commented 4 months ago

Built manually and checked behavior. Looking good to me!