Blazebit / blaze-persistence

Rich Criteria API for JPA providers
https://persistence.blazebit.com
Apache License 2.0
741 stars 90 forks source link

Pagination in queries that join on an @OneToMany or @ManyToMany attribute return duplicate results #1770

Open junrrein opened 1 year ago

junrrein commented 1 year ago

Description

Sample code is available at https://github.com/junrrein/blaze-paging-test.
The project structure was generated using the Quickstart instructions, thanks a lot for that!

The Entity structure in the sample is the following: imagen

An Organization has an @OneToMany association to PaymentMethod.
The same bug occurs if this association is @ManyToMany instead.

The data inserted for the test is:

I'm trying to use Blaze to replicate the following JPA query:

SELECT org FROM Organization org
JOIN org.paymentMethods pm
WHERE pm.name = 'Debit card'
ORDER BY org.id

As you can see, I'm just trying to retrieve all Organizations that have an association with the "Debit card" PaymentMethod.

The linked repo has a single test that replicates the same query in 4 ways:

  1. Using plain JPA and TypedQuery.
  2. Same as the previous one, adding pagination using the setFirstResult and setMaxResults methods of TypedQuery.
  3. Using Blaze and CriteriaBuilder.
  4. Reusing the previous one, adding pagination using the page method of CriteriaBuilder.

Expected behavior

All 4 queries should return 1 and only 1 Organization (since no more exist).

Actual behavior

For the paged query when using Blaze, the result list has 2 items, duplicating the same Organization.

Test output (including SQL queries)

https://gist.github.com/junrrein/946e87b69010cde60b10308d9fca92a5

Environment

Version: 1.6.9
JPA-Provider: Hibernate 5.6.15.Final
DBMS: H2 2.2.220
Application Server: Java SE 17 (OpenJDK distributed by Ubuntu)

The environment where I originally detected the issue used the following: Version: 1.6.9 (same one)
JPA-Provider: Hibernate 5.3.7.Final DBMS: PostgreSQL 9.6.24 Application Server: Wildfly 15, running on top of OpenJDK 8

beikov commented 1 year ago

Note that this query doesn't make a lot of sense which is why I'll assign a low priority, but I agree that it should ideally be fixed to skip duplicates i.e. apply grouping or distinct.

The more sensible query would be:

SELECT org 
FROM Organization org
WHERE EXISTS (
    SELECT 1
    FROM org.paymentMethods pm
    WHERE pm.name = 'Debit card'
)
ORDER BY org.id
junrrein commented 1 year ago

Thank you for taking this up and for providing a workaround.

Just wanted to add that the same thing happens if the @OneToMany relationship is @ManyToMany instead.
I'm updating the issue title and description accordingly.

rybandrei2014 commented 5 months ago

@junrrein @beikov Hey I am currently struggling with the same issue. I need to retrieve paginated result with fetch and I am also getting duplicates in result. It happens when I am getting entities that have more than one attribute bound by @ManyToOne relation. Here is the pseudocode:

public class Entity {
    @Id
    private Long id;

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "entity")
    private Set<EntityAttribute> attributes;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(nullable = false, insertable = false, updatable = false)
    private EntityType entityType;
}

public class EntityAttribute {
    @Id
    private Long id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(nullable = false, insertable = false, updatable = false)
    private Entity entity;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(nullable = false, insertable = false, updatable = false)
    private Attribute attribute;
}

@Override
public PagedList<Entity> search(SearchByCodeRequest request) {
        var date = Optional.ofNullable(request.getDate()).orElse(new Date());
        var pageSize = request.getPageSize();
        var firstResult = request.getPage() * pageSize;

        var cb = criteriaBuilderFactory.create(entityManager, Entity.class)
                .fetch("entityType", "attributes", "attributes.attribute")
                .orderByAsc("id")
                .page(firstResult, pageSize)
                .withCountQuery(true)
                .where("validFrom").le(date)
                .where("validTo").ge(date);

        var code = request.getCode();

        if (code != null) {
            cb.where("entityType.code").eq(code);
        }

        var r = cb.getResultList();

        return r;
}

Here is pseudo result I am getting:

{
    "data": [
        {
            "id": 3,
            "entityType": {
                "code": "REGISTER"
            },
            "attributes": [
                {
                    "id": 4,
                    "attribute": {
                        "id": 4
                    }
                },
                {
                    "id": 7,
                    "attribute": {
                        "id": 8
                    }
                },
                {
                    "id": 5,
                    "attribute": {
                        "id": 6
                    }
                },
                {
                    "id": 6,
                    "attribute": {
                        "id": 7
                    }
                }
            ]
        },
        {
            "id": 3,
            "entityType": {
                "code": "REGISTER"
            },
            "attributes": [
                {
                    "id": 4,
                    "attribute": {
                        "id": 4
                    }
                },
                {
                    "id": 7,
                    "attribute": {
                        "id": 8
                    }
                },
                {
                    "id": 5,
                    "attribute": {
                        "id": 6
                    }
                },
                {
                    "id": 6,
                    "attribute": {
                        "id": 7
                    }
                }
            ]
        },
        {
            "id": 3,
            "entityType": {
                "code": "REGISTER"
            },
            "attributes": [
                {
                    "id": 4,
                    "attribute": {
                        "id": 4
                    }
                },
                {
                    "id": 7,
                    "attribute": {
                        "id": 8
                    }
                },
                {
                    "id": 5,
                    "attribute": {
                        "id": 6
                    }
                },
                {
                    "id": 6,
                    "attribute": {
                        "id": 7
                    }
                }
            ]
        },
        {
            "id": 3,
            "entityType": {
                "code": "REGISTER"
            },
            "attributes": [
                {
                    "id": 4,
                    "attribute": {
                        "id": 4
                    }
                },
                {
                    "id": 7,
                    "attribute": {
                        "id": 8
                    }
                },
                {
                    "id": 5,
                    "attribute": {
                        "id": 6
                    }
                },
                {
                    "id": 6,
                    "attribute": {
                        "id": 7
                    }
                }
            ]
        }
    ],
    "page": 1,
    "pageSize": 4,
    "total": 4
}

As you can see I get the same entity with Id = 3 4 times in the result set, due to 4 attributes bound to that entity. How can I remove such duplicates in result set? Thank you in advance

beikov commented 4 months ago

@rybandrei2014 let's continue with your question on #1914