eclipse-ee4j / eclipselink

Eclipselink project
https://eclipse.dev/eclipselink/
Other
199 stars 169 forks source link

EclipseLink is executing updating queries whenever an entity with a Set property is read #1217

Open FabriSr opened 3 years ago

FabriSr commented 3 years ago

When running with the latest EclipseLink 2.7, EclipseLink is failing to track changes when an entity has an embedded collection property mapped using Set's: whenever an entity instance is read from the entity manager update queries are executed against the database despite no entity data has been changed.

For example, an entity:

@Entity
@Table(name = "employee")
@NamedQuery(name = "Employee.findAll", query = "SELECT e FROM Employee e ORDER BY e.id")
public class Employee implements Serializable
{
    .....

    @ElementCollection(fetch = FetchType.EAGER)
    @CollectionTable(
            name = "employee_phone_number",            
            joinColumns = @JoinColumn(name = "employeeId")
    )
    @AttributeOverride(name = "number", column = @Column(name = "number"))
    private Set<PhoneNumber> phoneNumbers;
}

Whenever the EmployeeDAO.readAll method is executed, update queries are executed against database:

@Transactional
@ApplicationScoped
public class EmployeeDAO
{
    @PersistenceContext(name = "jpa-unit")
    private EntityManager entityManager;

    public List<Employee> readAll()
    {        
        return this.entityManager.createNamedQuery("Employee.findAll", Employee.class).getResultList();
    }        
}

Steps to Reproduce I have attached a small project to demonstrate the error (using the latest OpenLiberty release). The project has two mapped entities: Employee and Person. Employee is mapped using Set collection type. Person is mapped using List. Only Employee instances are updated. EmployeeDAO does not change the state of the objects in the readAll method. The "populate" method creates objects only when the table is empty.

1 - Download and unzip the attached demo project error_demo.zip 2 - Navigate into the project root folder and run: $ mvn liberty:dev 3 - Open your brownser: http://localhost:9080/ 4 - Open the jpa log file (./target/liberty/wlp/usr/servers/defaultServer/logs/jpa.log) and search for UpdateObjectQuery 5 - Refresh the page (whenever the page is loaded the readAll operation is executed). New UpdateObjectQuery entries are added to log file (indicating the update operation).

dazey3 commented 3 years ago

@FabriSr

Hello! Thank you for bringing this issue to our attention!

Have you noticed this issue recently in a previously functioning application (regression) or are you investigating a new application currently in development?

My first suggestion (purely for investigation) would be to remove the @AttributeOverride from the Employee @ElementCollection mapping and see if the issue still exists. I suggest this because this example seems very similar to an issue I fixed last year with EclipseLink + (@AttributeOverride on @ElementCollection): https://bugs.eclipse.org/bugs/show_bug.cgi?id=564260

I will download and look at the provided example so that I can attempt to confirm the issue as well. Thanks!

FabriSr commented 3 years ago

@dazey3,

Thanks for your response.

I noticed this error in one of my applications. Once a week, a method was executed to query some entities and insert a new element in one of the collection type properties. Generally, 180 parent instances were involved. The method stopped running due to a memory consumption overflow. I noticed that with each iteration of the loop, all 180 entities along with the collections were updated in the database. The log showed more than 30,000 database update queries were executed.

The error persists on the most recent version of the Open Liberty server (which uses Eclipse Link 2.7.9).

I have removed the @AttributeOverride from the Employee mapping and ran the error demo again. The error persists.

dazey3 commented 3 years ago

@FabriSr

EclipseLink is failing to track changes when an entity has an embedded collection property mapped using Set's: whenever an entity instance is read from the entity manager update queries are executed against the database despite no entity data has been changed.

I'm trying to understand specifically what the expected behavior is and what is instead happening. Is the issue that EclipseLink is executing unnecessary UPDATE queries on Employee despite no entity data having changed?

Running the sample code provided, I see the following actions from EclipseLink. Note that I am running against Derby and I assume this is not a database specific issue. What database are you running against and I will use that as well.

Here we see the DDL generation to create the tables

    CREATE TABLE employee (id INTEGER NOT NULL, name VARCHAR(255), PRIMARY KEY (id))
    CREATE TABLE person (id INTEGER NOT NULL, name VARCHAR(255), PRIMARY KEY (id))
    CREATE TABLE employee_phone_number (number VARCHAR(255), employeeId INTEGER)
    CREATE TABLE person_phone_number (number VARCHAR(255), personId INTEGER)
    ALTER TABLE employee_phone_number ADD CONSTRAINT mplyphnnumbermplyd FOREIGN KEY (employeeId) REFERENCES employee (id)
    ALTER TABLE person_phone_number ADD CONSTRAINT prsnphnnumberprsnd FOREIGN KEY (personId) REFERENCES person (id)

    DELETE FROM SEQUENCE WHERE SEQ_NAME = 'SEQ_GEN'
    SELECT * FROM SEQUENCE WHERE SEQ_NAME = 'SEQ_GEN'
    INSERT INTO SEQUENCE(SEQ_NAME, SEQ_COUNT) values ('SEQ_GEN', 0)

This should be the execution of EmployeeDAO.readAll()

    Execute query ReadAllQuery(name="Employee.findAll" referenceClass=Employee sql="SELECT id, name FROM employee ORDER BY id")
    SELECT id, name FROM employee ORDER BY id

The result set was empty, so this should be the execution of EmployeeDAO.populate()

    persist() operation called on: Employee{id=0, name=Joshua}.
    Execute query DataModifyQuery(name="SEQUENCE" sql="UPDATE SEQUENCE SET SEQ_COUNT = SEQ_COUNT + #PREALLOC_SIZE WHERE SEQ_NAME = #SEQ_NAME")
    UPDATE SEQUENCE SET SEQ_COUNT = SEQ_COUNT + 50 WHERE SEQ_NAME = 'SEQ_GEN'

    Execute query ValueReadQuery(name="SEQUENCE" sql="SELECT SEQ_COUNT FROM SEQUENCE WHERE SEQ_NAME = #SEQ_NAME")
    SELECT SEQ_COUNT FROM SEQUENCE WHERE SEQ_NAME = 'SEQ_GEN'
    assign sequence to the object (1 -> Employee{id=0, name=Joshua})

    persist() operation called on: Employee{id=0, name=Ashley}.
    assign sequence to the object (2 -> Employee{id=0, name=Ashley})

    INSERT INTO employee (id, name) VALUES (1, 'Joshua')
    INSERT INTO employee_phone_number (employeeId, number) VALUES (1, '555-3333')
    INSERT INTO employee_phone_number (employeeId, number) VALUES (1, '555-5241')
    INSERT INTO employee (id, name) VALUES (2, 'Ashley')
    INSERT INTO employee_phone_number (employeeId, number) VALUES (2, '555-09273')
    INSERT INTO employee_phone_number (employeeId, number) VALUES (2, '555-8375')

Running against Person, I see the same set of operations. What is different from my operations compared with what you are seeing?

FabriSr commented 3 years ago

I'm trying to understand specifically what the expected behavior is and what is instead happening. Is the issue that EclipseLink is executing unnecessary UPDATE queries on Employee despite no entity data having changed?

Yes, this is what is happening. Please note that, changing the mapping from Set to List stops the unnecessary updates.

When you refresh the web page of the sample, populate method is not executed (no entity is inserted). Just the "readAll" method is executed. The Employee entity is updated. The Person entity is not updated. The diference between theses entities is that one is mapped using Set and the other is mapped using List.

After you refresh the page of the demo (localhost:9080) please reopen the JPA log file and search for UpdateObjectQuery (this is the log entry generated by Open Liberty).

dazey3 commented 3 years ago

I can't seem to reproduce the problem in a JSE test. With both Employee and Person, I am seeing no UPDATEs being issued (UpdateObjectQuery).

However, running your sample, I am seeing the behavior being reported.

PersonDAO run:

    [eclipselink.query] Execute query ReadAllQuery(name="Person.findAll" referenceClass=Person sql="SELECT id, name FROM person ORDER BY id")
    [eclipselink.connection] Connection acquired from connection pool [read].
    [eclipselink.connection] reconnecting to external connection pool
    [eclipselink.sql] SELECT id, name FROM person ORDER BY id
    [eclipselink.connection] Connection released to connection pool [read].
    [eclipselink.transaction] TX beforeCompletion callback, status=STATUS_ACTIVE
    [eclipselink.transaction] begin unit of work commit
    [eclipselink.transaction] TX afterCompletion callback, status=COMMITTED
    [eclipselink.transaction] end unit of work commit
    ...
    closeTxEntityManager is closing JTA em: org.eclipse.persistence.internal.jpa.EntityManagerImpl@c3b70173

EmployeeDAO run:

    [eclipselink.query] Execute query ReadAllQuery(name="Employee.findAll" referenceClass=Employee sql="SELECT id, name FROM employee ORDER BY id")
    [eclipselink.connection] Connection acquired from connection pool [read].
    [eclipselink.connection] reconnecting to external connection pool
    [eclipselink.sql] SELECT id, name FROM employee ORDER BY id
    [eclipselink.connection] Connection released to connection pool [read].
    [eclipselink.transaction] TX beforeCompletion callback, status=STATUS_ACTIVE
    [eclipselink.transaction] begin unit of work commit
    [eclipselink.transaction] TX beginTransaction, status=STATUS_ACTIVE
    [eclipselink.query] Execute query UpdateObjectQuery(Employee{id=2, name=Ashley})
FabriSr commented 3 years ago

So, the problem resides in Open Liberty?

dazey3 commented 3 years ago

However, I am also noticing that if I rerun the test, sometimes the UpdateObjectQuery objects never get traced at all! It's very odd, I can't quite determine where the problem resides at this time. I have run the test you provided, against OpenLiberty, and I can see that sometimes an UpdateObjectQuery is logged and sometimes there is no such trace. But sometimes there are 3 or 6 UpdateObjectQueries being created...

3 [eclipselink.sql] SELECT id, name FROM employee ORDER BY id
3 [eclipselink.connection] Connection released to connection pool [read].
3 [eclipselink.transaction] TX beforeCompletion callback, status=STATUS_ACTIVE
3 [eclipselink.transaction] begin unit of work commit
3 [eclipselink.transaction] TX beginTransaction, status=STATUS_ACTIVE
3 [eclipselink.query] Execute query UpdateObjectQuery(Employee{id=1, name=Joshua})
> getConstraintsForClass Entry  
                                                                                                               class eclipse_link.error_demo.models.Employee
< getConstraintsForClass Exit  
                                                                                                               BeanDescriptorImpl{class='Employee'}
3 [eclipselink.query] Execute query UpdateObjectQuery(eclipse_link.error_demo.models.PhoneNumber@a8adabf7)
3 [eclipselink.query] Execute query UpdateObjectQuery(eclipse_link.error_demo.models.PhoneNumber@3bcde946)
3 [eclipselink.query] Execute query UpdateObjectQuery(Employee{id=2, name=Ashley})
> getConstraintsForClass Entry  
                                                                                                               class eclipse_link.error_demo.models.Employee
< getConstraintsForClass Exit  
                                                                                                               BeanDescriptorImpl{class='Employee'}
3 [eclipselink.query] Execute query UpdateObjectQuery(eclipse_link.error_demo.models.PhoneNumber@127d0818)
3 [eclipselink.query] Execute query UpdateObjectQuery(eclipse_link.error_demo.models.PhoneNumber@af771975)
3 [eclipselink.transaction] TX afterCompletion callback, status=COMMITTED
3 [eclipselink.transaction] end unit of work commit

What I would say is that I do not see any SQL being executed. Are you seeing UPDATE queries against the database?

dazey3 commented 3 years ago

Looking into EclipseLink existing bugs, perhaps the issue related to this bug reported long ago and never fixed: https://bugs.eclipse.org/bugs/show_bug.cgi?id=335277

FabriSr commented 3 years ago

Looking into EclipseLink existing bugs, perhaps the issue related to this bug reported long ago and never fixed: https://bugs.eclipse.org/bugs/show_bug.cgi?id=335277

It's probably this same bug.

What I would say is that I do not see any SQL being executed. Are you seeing UPDATE queries against the database?

I didn't get to confirm that. I can say that at the time I was working on my application, there was a considerable increase in database activity (PostgreSQL). But I only noticed this failure when my application stopped running reporting memory overflow.

dazey3 commented 3 years ago

Ok, I was able to get a JSE test to fail. My issue was that I was not enclosing the NamedQuery execution within a local transaction (doh). Even though it isn't needed for executing read queries, the application test is none-the-less executing within one:

public List<Employee> employeeFindAll() {
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();
            List<Employee> employees = em.createNamedQuery("Employee.findAll", Employee.class).getResultList();
            em.getTransaction().commit();
            return employees;
        } finally {
            if(em.getTransaction().isActive()) {
                em.getTransaction().rollback();
            }
            if (em.isOpen()) {
                em.close();
            }
        }
    }

Trace:

[EL Finest]: query: 2021-07-19 18:23:01.305--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--Execute query ReadAllQuery(name="Employee.findAll" referenceClass=Employee sql="SELECT id, name FROM employee ORDER BY id")
[EL Finest]: connection: 2021-07-19 18:23:01.307--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection acquired from connection pool [default].
[EL Fine]: sql: 2021-07-19 18:23:01.307--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--SELECT id, name FROM employee ORDER BY id
[EL Finest]: connection: 2021-07-19 18:23:01.309--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection released to connection pool [default].
[EL Finest]: query: 2021-07-19 18:23:01.313--ServerSession(-1794389458)--Thread(Thread[main,5,main])--Execute query ReadAllQuery(name="phoneNumbers" referenceClass=PhoneNumber sql="SELECT number, employeeId FROM employee_phone_number WHERE (employeeId = ?)")
[EL Finest]: connection: 2021-07-19 18:23:01.313--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection acquired from connection pool [default].
[EL Fine]: sql: 2021-07-19 18:23:01.314--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--SELECT number, employeeId FROM employee_phone_number WHERE (employeeId = 1)
[EL Finest]: connection: 2021-07-19 18:23:01.319--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection released to connection pool [default].
[EL Finest]: query: 2021-07-19 18:23:01.323--ServerSession(-1794389458)--Thread(Thread[main,5,main])--Execute query ReadAllQuery(name="phoneNumbers" referenceClass=PhoneNumber sql="SELECT number, employeeId FROM employee_phone_number WHERE (employeeId = ?)")
[EL Finest]: connection: 2021-07-19 18:23:01.323--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection acquired from connection pool [default].
[EL Fine]: sql: 2021-07-19 18:23:01.324--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--SELECT number, employeeId FROM employee_phone_number WHERE (employeeId = 2)
[EL Finest]: connection: 2021-07-19 18:23:01.328--ServerSession(-1794389458)--Connection(1148470225)--Thread(Thread[main,5,main])--Connection released to connection pool [default].
[EL Finer]: transaction: 2021-07-19 18:23:01.329--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--begin unit of work commit
[EL Finest]: query: 2021-07-19 18:23:01.333--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--Execute query UpdateObjectQuery(Employee{id=2, name=Ashley})
[EL Finest]: query: 2021-07-19 18:23:01.334--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--Execute query UpdateObjectQuery(model.PhoneNumber@58702a7d)
[EL Finest]: query: 2021-07-19 18:23:01.334--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--Execute query UpdateObjectQuery(model.PhoneNumber@8be2b040)
[EL Finer]: transaction: 2021-07-19 18:23:01.336--UnitOfWork(-1892959964)--Thread(Thread[main,5,main])--end unit of work commit

Also, now that I can run it simply with a JSE test, two things are evident:

  1. This mean the issue is most certainly not OpenLiberty and this is a bug in EclipseLink (most likely https://bugs.eclipse.org/bugs/show_bug.cgi?id=335277)
  2. I can now see that the behavior is non-deterministic. Sometimes there are no UpdateObjectQuery objects traced, and sometimes there are 3-6!

However, I do not see any network activity or SQL actually being run. At this time is this issue simply a report of weird trace being seen? Or perhaps a performance issue? Or are you reporting actual incorrect behavior in your application?

FabriSr commented 3 years ago

However, I do not see any network activity or SQL actually being run. At this time is this issue simply a report of weird trace being seen? Or perhaps a performance issue? Or are you reporting actual incorrect behavior in your application?

Depending on the number of instances consulted within a method, an out of memory error may occur. That's what happened in my application. In my case, a method fetched approximately 180 instances, looped it and added an element to the collection of each instance. Each instance had approximately 20 elements. At each iteration of the loop, ~3780 UpdateObjectQuery was generated. So my application crashes out of memory error. I had to change the entity mapping in order to fix it.

dazey3 commented 3 years ago

Looking into a fix, I started where https://bugs.eclipse.org/bugs/show_bug.cgi?id=335277 pointed; in org.eclipse.persistence.mappings.AggregateCollectionMapping.compareForChange():

    Object cloneIterator = cp.iteratorFor(cloneCollection);
    Object backUpIterator = cp.iteratorFor(backupCollection);
...
    while (cp.hasNext(cloneIterator)) {
    Object cloneObject = cp.next(cloneIterator, session);

    // For CR#2285 assume that if null is added the collection has changed.
    if (cloneObject == null) {
        change = true;
        break;
    }
    Object backUpObject = null;
    if (cp.hasNext(backUpIterator)) {
        backUpObject = cp.next(backUpIterator, session);
    }

The problem is that cloneObject and backUpObject are sometimes different values. They should be different Object references, but being different values is causing the issue.

For instance, in the test provided and stepping through the code, I am seeing that sometimes the values are:

cloneObject PhoneNumber  (id=137)   
    number  "555-8375" (id=142) 
backUpObject    PhoneNumber  (id=140)   
    number  "555-8375" (id=142) 

but other times, they are:

cloneObject PhoneNumber  (id=137)   
    number  "555-8375" (id=144) 
backUpObject    PhoneNumber  (id=140)   
    number  "555-09273" (id=141)    

This second example is where the problem occurs! When they mismatch like this, it causes EclipseLink to think a change was made, generate an UpdateObjectQuery, then realize the mistake later and short circuit the issue. I can see this being a much bigger problem with very large Sets.

The reason this is occurring is because cloneCollection and backupCollection are both java.lang.Set objects. The code assumes that since they are clones, that iterating over both will access the elements in the same order. Except Iterators for Sets have no guaranteed order...

FabriSr commented 3 years ago

This second example is where the problem occurs! When they mismatch like this, it causes EclipseLink to think a change was made, generate an UpdateObjectQuery, then realize the mistake later and short circuit the issue. I can see this being a much bigger problem with very large Sets.

That's why i ran into memory issues.

The reason this is occurring is because cloneCollection and backupCollection are both java.lang.Set objects. The code assumes that since they are clones, that iterating over both will access the elements in the same order. Except Iterators for Sets have no guaranteed order...

This may explain the non-deterministic behavior of the problem.