SAP / olingo-jpa-processor-v4

The JPA Processor fills the gap between Olingo V4 and the database, by providing a mapping between JPA metadata and OData metadata, generating queries and supporting the entity manipulations.
Apache License 2.0
122 stars 76 forks source link

$filter counting navigation property is slow #239

Closed wog48 closed 7 months ago

wog48 commented 8 months ago

Requests like GET .../AdministrativeDivisions?$top=1&$filter=Children/$count ge 1 can become slow depending on the amount of data and the choose database.

Such a request would generate a sql query like this:

select distinct t0."ParentCodeID" as a1,
    t0."ParentDivisionCode" as a2,
    t0."AlternativeCode" as a3,
    t0."CountryISOCode" as a4,
    t0."Population" as a5,
    t0."CodeID" as a6,
    t0."Area" as a7,
    t0."DivisionCode" as a8,
    t0."CodePublisher" as a9
from "AdministrativeDivision" t0
where exists ( select t1."CodePublisher"
    from "AdministrativeDivision" t1
    where (((t1."CodePublisher" = t0."CodePublisher")
            and (t1."ParentCodeID" = t0."CodeID"))
            and (t1."ParentDivisionCode" = t0."DivisionCode"))
    group by
        t1."CodePublisher",
        t1."ParentCodeID",
        t1."ParentDivisionCode"
    having (COUNT(t1."DivisionCode") >= 1))
order by
    t0."CodePublisher" asc,
    t0."CodeID" asc,
    t0."DivisionCode" asc
limit 1 offset 0

On some databases this is slow. As an alternative an IN clause instead of the EXISTS could be used:

select distinct t0."ParentCodeID" as a1,
    t0."ParentDivisionCode" as a2,
    t0."AlternativeCode" as a3,
    t0."CountryISOCode" as a4,
    t0."Population" as a5,
    t0."CodeID" as a6,
    t0."Area" as a7,
    t0."DivisionCode" as a8,
    t0."CodePublisher" as a9
from "OLINGO"."AdministrativeDivision" t0
where (t0."CodePublisher", t0."CodeID", t0."DivisionCode") in (
    select t1."CodePublisher",t1."ParentCodeID", t1."ParentDivisionCode"
    from "OLINGO"."AdministrativeDivision" t1
    group by t1."CodePublisher", 
t1."ParentCodeID",
        t1."ParentDivisionCode"
    having (COUNT(t1."DivisionCode") >= 1) )
order by
    t0."CodePublisher" asc,
    t0."CodeID" asc,
    t0."DivisionCode" asc
limit 1 offset 0

This is possible as EclipseLink is now supporting such IN clauses, which were not correctly generated at least till version 2.7.9. The following table shall give an impression of the difference:

Database Number of Rows EXISTS IN
MariaDB ~22000 104 sec 0.031 sec
PostgreSQL ~22000 51.5 sec 0.012 sec
SAP HANA ~71000 0.023 sec 0.027 sec

(All the database run in parallel on my local machine)

Additional remark: Surprisingly AdministrativeDivisions?$filter=Parent/Children/$count eq 2 takes onyl 1.5 sec on PostrgeSQL with EXISTS and takes 0,025 sec with IN

wog48 commented 7 months ago

Solved with 2.0.2. See also https://github.com/SAP/olingo-jpa-processor-v4/blob/main/jpa-tutorial/Questions/WhatIsTheProblemWithInAndExist.adoc