FraunhoferIOSB / FROST-Server

A Complete Server implementation of the OGC SensorThings API
https://fraunhoferiosb.github.io/FROST-Server/
GNU Lesser General Public License v3.0
191 stars 67 forks source link

Query performance issues on Observations with plugins_coreModel_idType LONG #1923

Open DanielBertocci opened 2 months ago

DanielBertocci commented 2 months ago

Issue Description

I use plugins_coreModel_idType set to LONG for the system and FROST correctly creates tables with IDs bigint. I recently faced a performance issue on query observations, the system turned to be really slow, it could be one of the new updates of FROST but I am not sure.

What causes the issue

I have enabled query logging and I identified the query causing the issue. When observations are queried with the query Observations?$filter=Datastream/@iot.id eq '1' FROST generates the following query:

select "e0"."RESULT_BOOLEAN", "e0"."RESULT_QUALITY", "e0"."PHENOMENON_TIME_START", "e0"."PARAMETERS", "e0"."DATASTREAM_ID", "e0"."RESULT_STRING", "e0"."RESULT_TYPE", "e0"."VALID_TIME_END", "e0"."PHENOMENON_TIME_END", "e0"."FEATURE_ID", "e0"."ID", "e0"."RESULT_JSON", "e0"."RESULT_TIME", "e0"."RESULT_NUMBER", "e0"."VALID_TIME_START"
from "OBSERVATIONS" as "e0"
left outer join "DATASTREAMS" as "e1" on "e1"."ID" = "e0"."DATASTREAM_ID" where cast("e0"."DATASTREAM_ID" as varchar) = $1
order by "e0"."PHENOMENON_TIME_START" asc, "e0"."PHENOMENON_TIME_END" asc, "e0"."ID" asc offset $2 rows fetch next $3 rows only

The key of the issue is

cast("e0"."DATASTREAM_ID" as varchar) = $1

Using plugins_coreModel_idType set to LONG, DATASTREAM_ID is a bigint and this cast messes up with the indexing and partitioning in the table, I imagine it cannot perform a proper comparison because of the different type. If I remove that cast, the query becomes more than 20 times faster (4 seconds -> 0.2 seconds)

Expected behavior:

What I would except is that this cast is dependent on the setting of idType, and in my specific case the cast to bigint.

How to reproduce

I have used the docker compose on FROST scripts with plugins_coreModel_idType set to LONG.

docker-compose.yaml
  version: "3"
  services:
    web:
      image: fraunhoferiosb/frost-server:latest
      environment:
        - serviceRootUrl=http://localhost:8080/FROST-Server
        - plugins_multiDatastream.enable=false
        - http_cors_enable=true
        - http_cors_allowed_origins=*
        - persistence_db_driver=org.postgresql.Driver
        - persistence_db_url=jdbc:postgresql://database:5432/sensorthings
        - persistence_db_username=sensorthings
        - persistence_db_password=ChangeMe
        - persistence_autoUpdateDatabase=true
        - plugins_coreModel_idType=LONG
      ports:
        - 8080:8080
        - 1883:1883
      depends_on:
        database:
          condition: service_healthy
    database:
      image: postgis/postgis:14-3.2-alpine
      environment:
        - POSTGRES_DB=sensorthings
        - POSTGRES_USER=sensorthings
        - POSTGRES_PASSWORD=ChangeMe
      command: ["postgres", "-c", "log_statement=all"]
      volumes:
        - postgis_volume:/var/lib/postgresql/data
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -d sensorthings -U sensorthings "]
        interval: 10s
        timeout: 5s
        retries: 5
  volumes:
    postgis_volume:

Additional notes

I see also a JOIN that is not needed, only with $expand and accessing nested fields but it is not the case for the proposed query, I wonder if there is a way to optimize that too.

hylkevds commented 2 months ago

Yes, if you do @iot.id eq '1' you explicitly tell FROST to compare the id with a string, so it has to cast the id to a string. You should use @iot.id eq 1

Or, even simpler, just GET v1.1/Datastreams(1)/Observations

I have done some experiments with that join, and it makes no significant difference in recent versions of PostgreSQL. Though recent versions of FROST will try to optimise it out.

DanielBertocci commented 2 months ago

First of all, thank you because you gave me an immediate way to solve the issue we had.

This issue was really hard to find out, it needed some steps to realize that with that query, we didn't take advantage of the indexing.

I wonder if FROST could take special care about IDs, especially with plugins_coreModel_idType set: it knows what is the expected type for the IDs and I would expect the cast of the query parameter to the idType more than the current behavior. A more explicit behavior could be to throw an error, claiming that it is not possible to use a string value when the id is bigint.

I see a risk that, for a matter of quotes in OGC STA query, FROST doesn't take of the advantage of the DB optimizations.

What do you think?

hylkevds commented 2 months ago

This problem is not limited to ID fields, it can happen any time a comparison is made between different types. One of them will have to be cast to the other, and since the only cast that is guaranteed to succeed is the cast to string, that's what FROST does. The same thing happens when filtering on one of the times, and quoting the time-constant in the URL.

The fact that in this specific case that means an index can't be used is something FROST doesn't know, since FROST doesn't know which fields have indices.

In theory, since the "other" field is a constant, FROST could try to see if that string constant could be converted to the non-string type, before taking the route that is guaranteed to work. That may be worth looking into... But this will lead to confusing behaviour when comparing result values to a quoted number, since result has a string-representation, so no casting is needed. Thus comparing IDs to a falsely quoted number (using gt or lt) would result in the behaviour that the user expects, but comparing results to a falsely quoted number would not... For IDs '2' would be smaller than 11, but for results '2' would be larger than 11...