SAP / olingo-jpa-processor-v4

The JPA Processor fills the gap between Olingo V4 and the database, by providing a mapping between JPA metadata and OData metadata, generating queries and supporting the entity manipulations.
Apache License 2.0
122 stars 78 forks source link

Missing 'distinct' clause in subquery #276

Closed nyordanoff closed 6 months ago

nyordanoff commented 7 months ago

Hello,

I noticed that distinct clause is missing in a subquery which results in many returned duplicate records. The setup is as follows:

I have two entities - SystemInstance:

@Entity(name = "systemInstance")
@Table(name = "tenants_apps")
public class SystemInstanceEntity {
    ...
    @OneToMany(mappedBy = "systemInstance", fetch = FetchType.LAZY)
    private Set<BundleEntity> consumptionBundles;
    ...
}

and Bundle:

@Entity(name = "consumptionBundle")
@Table(name = "tenants_bundles")
public class BundleEntity {
    ...
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "app_id", insertable = false, updatable = false)
    private SystemInstanceEntity systemInstance;
    ...
}

For testing purposes, let's say we have 1 SystemInstance which has 2 Bundles.

When I execute: /systemInstances?$expand=consumptionBundles, these are some of the generated queries:

[EL Fine]: sql: 2024-02-26 14:42:53.903--ServerSession(1700334050)--Connection(79428551)--Thread(Thread[http-nio-8080-exec-3,5,main])--SELECT DISTINCT local_tenant_id, product_type, application_namespace, fa_formation_id, formation_type_id, description, assignment_id, id, system_number, base_url, name FROM tenants_apps WHERE (((target_id = ?) AND ((formation_id = ?) OR (formation_id = ?))) AND (tenant_id = ?))
    bind => [0ac08747-bd76-4697-8a2d-4a8c851c052a, 53d874d3-f80c-4572-9a0d-209d51f5e46f, acb93a87-5b10-4b2d-9c79-ad53454acba1, bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb]
[EL Fine]: sql: 2024-02-26 14:42:53.908--ServerSession(1700334050)--Connection(1141354648)--Thread(Thread[http-nio-8080-exec-3,5,main])--SELECT t0.name, t0.ord_id, t0.description, t0.local_tenant_id, t0.app_id, t0.id, t0.short_description, t0.version, t0.last_update FROM tenants_bundles t0, tenants_apps t1 WHERE (((((t1.target_id = ?) AND ((t1.formation_id = ?) OR (t1.formation_id = ?))) AND (t1.tenant_id = ?)) AND (((t0.formation_id = ?) OR (t0.formation_id = ?)) AND (t0.tenant_id = ?))) AND (t0.app_id = t1.id)) ORDER BY t0.app_id ASC
    bind => [0ac08747-bd76-4697-8a2d-4a8c851c052a, 53d874d3-f80c-4572-9a0d-209d51f5e46f, acb93a87-5b10-4b2d-9c79-ad53454acba1, bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb, 53d874d3-f80c-4572-9a0d-209d51f5e46f, acb93a87-5b10-4b2d-9c79-ad53454acba1, bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb]

We can see that the second query is missing distinct clause. How is this a problem?

Well, in our testing example, tenants_apps have 2 matching records and tenants_bundles have 4 matching records (due to the various where clauses generated from @EdmProtectedBy). Then full combinations will be made which will result in 8 records. That's why in the response from the ODATA query we have 8 consumptionBundles instead of 2. Most of them are duplicates:

{
    "@odata.context": "$metadata#systemInstances(consumptionBundles())",
    "value": [
        {
            "description": null,
            "title": "system-instance-name",
            "id": "0ac08747-bd76-4697-8a2d-4a8c851c052a",
            "consumptionBundles": [
                {
                    "title": "bar",
                    "id": "454b2c49-ee20-4f2c-a1f2-b60c454e6e08"
                },
                {
                    "title": "bar",
                    "id": "454b2c49-ee20-4f2c-a1f2-b60c454e6e08"
                },
                {
                    "title": "foo",
                    "id": "814b2e02-abfa-428f-82b6-1484bb2e4125"
                },
                {
                    "title": "foo",
                    "id": "814b2e02-abfa-428f-82b6-1484bb2e4125"
                },
                {
                    "title": "bar",
                    "id": "454b2c49-ee20-4f2c-a1f2-b60c454e6e08"
                },
                {
                    "title": "bar",
                    "id": "454b2c49-ee20-4f2c-a1f2-b60c454e6e08"
                },
                {
                    "title": "foo",
                    "id": "814b2e02-abfa-428f-82b6-1484bb2e4125"
                },
                {
                    "title": "foo",
                    "id": "814b2e02-abfa-428f-82b6-1484bb2e4125"
                }
            ]
        }
    ]
}

Another interesting observation from the generated queries is that distinct clause is always present if the relation between the entities is modelled with:

    @ElementCollection
    @CollectionTable(name = "correlation_ids_applications", joinColumns = @JoinColumn(name = "application_id"))
    private List<ArrayElement> correlationIds;
[EL Fine]: sql: 2024-02-26 14:33:48.011--ServerSession(1843609566)--Connection(1086777970)--Thread(Thread[http-nio-8080-exec-3,5,main])--SELECT DISTINCT t0.id, t1.value FROM tenants_apps t0, correlation_ids_applications t1 WHERE ((((t0.target_id = ?) AND (t0.formation_id = ?)) AND (t0.tenant_id = ?)) AND (t1.application_id = t0.id)) ORDER BY t0.id ASC
        bind => [eeeeeeee-eeee-eeee-eeee-eeeeeeeeeeee, aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa, 3e64ebae-38b5-46a0-b1ed-9ccee153a0ae]

which made me wondering if the distinct clause is missing in the subqueries only if the relation is modelled with @OneToMany/@ManyToOne/@ManyToMany relations?

As a conclusion, the point of this issue is to figure out why distinct clause is missing from generated subqueries which results in duplicate records in the ODATA response.