Blazebit / blaze-persistence

Rich Criteria API for JPA providers
https://persistence.blazebit.com
Apache License 2.0
727 stars 85 forks source link

Cannot create own CustomCollectionPersister #1837

Open roma2341 opened 11 months ago

roma2341 commented 11 months ago

Blaze Persistence strictly prohibits using custom persisters like @Persister(mycustompersiter...).

Description

@ElementCollection ignores updatable=false flag. The only way to prevent Hibernate from saving elementCollection into the database is to use a custom persister. Still, Blaze uses some default persisters that block custom @Persister saying that they don't implement CustomCollectionPersister (but my code doesn't have access to this class so I can't implement it).

Expected behavior

We need the ability to control the persistence of collections even if we use blaze persistence.

Actual behavior

It breaks the ability to set custom Hibernate persisters

Environment

Blaze persistence 1.6.10 Spring boot 2.7.17

roma2341 commented 11 months ago

As i see your are using CustomPersisterClassResolver, that's why I can't just pass custom class to @Persister .. hm I also created a ticket about ElementCollection ignores updatable flag, maybe they will fix this issue and I won't need to rewrite CustomPersisterClassResolver

beikov commented 11 months ago

Unfortunately, it's not really viable to allow user defined persisters since the custom persister provided by Blaze-Persistence is necessary to handle some special scenarios involving CTEs. Can you share the Hibernate Jira issue that you created? Also, please share your use case for updatable = false that involves collections. Hibernate ORM has a hard time understanding if an entry in a collection was really "updated" or "replaced", which is why we are thinking about forcing users to have matching insertable and updatable configurations for element collections. I would like to understand your use case a bit better to maybe improve the situation.

roma2341 commented 11 months ago

Ticket: https://hibernate.atlassian.net/jira/software/c/projects/HHH/boards/31?selectedIssue=HHH-17446

We have a ContactList entity and I wanted to retrieve it with the IDs of users on the contact list. But Hibernate writes very slowly to CollectionTable because it makes separate inserts instead of batching. So to have better control of how Hibernate persists user's IDs I wrote separate logic using jdbcTemplate.batchUpdate. However, I noticed that updatable/insertable options don't work for CollectionTable (it always persists changes ) so I had to make fields transient and fetch users' IDs manually to be able to return contact lists with contacts. I know that I can use DTO, but it demands extra effort. It's strange that there is not an adequate way in Hibernate to Map a read-only Collection of ids to entity. I also tried to make JSON columns, but Mysql doesn't index integer collections with more than ~1500 elements.

In any case, it was a bad idea to use CollectionTable, because as I understand it exhausted my ram, because if I have e.g two collection tables in one entity they will generate tons of joins, I could potentially fix this using Blaze views (or broken subselect strategy in Hibernate that doesn't work with entity graphs making it useless), but I already met many bugs implementing my task so don't want to fall deeper into this pit.

roma2341 commented 11 months ago

There is also an interesting article on how someone solved my problem (make collection-table field not persistable) using custom persister, but It will not work with blaze persistence because of "fixed" persister https://blog.termian.dev/posts/jpa-readonly-jointable/

beikov commented 11 months ago

But Hibernate writes very slowly to CollectionTable because it makes separate inserts instead of batching.

I'm not sure about that. Maybe you need to configure insert reordering to make this more efficient through hibernate.order_inserts

However, I noticed that updatable/insertable options don't work for CollectionTable (it always persists changes ) so I had to make fields transient and fetch users' IDs manually to be able to return contact lists with contacts. I know that I can use DTO, but it demands extra effort. It's strange that there is not an adequate way in Hibernate to Map a read-only Collection of ids to entity. I also tried to make JSON columns, but Mysql doesn't index integer collections with more than ~1500 elements.

It's debatable if it makes sense to map join columns for an element collection insertable = false/updatable = false. I understand the desire to map something as read only, but that is something Hibernate ORM has not put a lot of thought into yet. So using a DTO mapping with e.g. Blaze-Persistence Entity-Views is preferrable for read-only.

In any case, it was a bad idea to use CollectionTable, because as I understand it exhausted my ram, because if I have e.g two collection tables in one entity they will generate tons of joins, I could potentially fix this using Blaze views (or broken subselect strategy in Hibernate that doesn't work with entity graphs making it useless), but I already met many bugs implementing my task so don't want to fall deeper into this pit.

Since you seem to be referring to actual entities, I would suggest you to map this as many-to-many mapping or if you're concerned about fetching too much data and don't want to use Entity-Views, you can introduce an entity for the join table and model this as one-to-many mapping e.g.

    @NotAudited
    @OneToMany(mappedBy = "trigger")
    protected Set<TriggerSubscribedUser> subscribedUsers = new HashSet<>();
@Entity
@Table(name = "drip_trigger_subscribers")
public class TriggerSubscribedUser {
    @Id
    @ManyToOne(fetch = LAZY)
    @JoinColumn(name="trigger_id")
    Trigger trigger;
    @Id
    @ManyToOne(fetch = LAZY)
    @JoinColumn(name="subscriber_user_id")
    User user;
}

This way, you will be able to fully benefit from lazy loading and only load ids as long as you don't initialize proxies.

roma2341 commented 11 months ago

I am extremely dissatisfied with the default behavior(speed) of SQL in Spring. This is utterly unacceptable.

Interestingly, I experimented with jdbcTemplate.batchUpdate by enabling rewriteBatchedStatements=true. Surprisingly, this configuration allowed me to insert 16,000 records in just 8 seconds (even with a batch size of 16,000, it still took approximately 8 seconds). Initially, I believed this was the optimal performance, as it aligns with recommendations found online. However, when I manually crafted a multi-value insert query, the execution time drastically improved to just 0.5 seconds.

I was seeking an alternative for JSON columns since they provide a fast way to store data, particularly when working with Hibernate. However, MySQL lacks a GIN index, preventing me from efficiently retrieving all contact lists containing a specific user. Consequently, I had to resort to using CollectionTable, only to find that it performed worse than JSON. It seems that Hibernate does not offer an effective way to map collections, and even Blaze Persistence doesn't alleviate this issue since it relies on Hibernate mappings. Hence, the most efficient approach for me was to designate the "contacts ids" field as transient and manually initialize it when I need to return the data to the front-end. While I understand the importance of using DTOs, it doesn't play a significant role in this particular scenario.

beikov commented 11 months ago

However, when I manually crafted a multi-value insert query, the execution time drastically improved to just 0.5 seconds.

What database is this? I would say that the JDBC driver implementation has bad heuristics as it should do that automatically for batch statements if that really makes such a difference.

I was seeking an alternative for JSON columns since they provide a fast way to store data, particularly when working with Hibernate. However, MySQL lacks a GIN index, preventing me from efficiently retrieving all contact lists containing a specific user. Consequently, I had to resort to using CollectionTable, only to find that it performed worse than JSON. It seems that Hibernate does not offer an effective way to map collections, and even Blaze Persistence doesn't alleviate this issue since it relies on Hibernate mappings

I guess if you want to be able to search rows based on collection contents, it would be best if you use a full text index on MySQL. The use of a collection table is probably the only reasonable alternative that allows somewhat acceptable search performance.

To improve insert performance you should enable JDBC batching either way and if you're using collection table, you also have to enable insert ordering in Hibernate.