Currently, executing a process that consists of N nodes triggers O(N*N) updates in Postgres, each of the following form:
```sql
update
    nodes
set
    definition_id=?,
    enter=?,
    exit=?,
    name=?,
    node_id=?,
    process_instance_id=?,
    type=?
where
    id=?
```
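The quadratic count can be illustrated with a toy model. It assumes, which the text above does not state explicitly, that the instance is persisted once per executed node and that each persist rewrites every node row stored so far. Under that assumption the total is 1 + 2 + ... + N = N(N+1)/2 row writes, i.e. O(N*N). The class and method names are hypothetical.

```java
public class UpdateCount {
    // Assumption: the process instance is persisted once per executed node,
    // and every persist rewrites all node rows collected so far.
    static long fullRewriteWrites(int n) {
        long total = 0;
        for (int executed = 1; executed <= n; executed++) {
            total += executed; // every node executed so far is written again
        }
        return total; // equals n * (n + 1) / 2
    }

    // With change tracking, each persist writes only the node that changed.
    static long trackedWrites(int n) {
        return n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("N=%d full=%d tracked=%d%n",
                    n, fullRewriteWrites(n), trackedWrites(n));
        }
    }
}
```

For N = 10, 100 and 1000 this prints 55, 5050 and 500500 full-rewrite writes against 10, 100 and 1000 tracked ones.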
The root cause is inefficient handling of the underlying JPA object. The storage layer is expected to consume a model object (for example, ProcessInstance).
To do so, it maps the model to an entity in this line.
This creates a new JPA instance that does not track which model properties have been modified, so the whole object has to be written again.
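The difference between a blindly mapped entity and one whose changes are tracked can be shown with a plain-Java toy. It does not use JPA: `NodeEntity`, `fromModel` and `dirtyColumns` are hypothetical stand-ins, with setters recording modified columns to loosely mimic bytecode-enhanced dirty tracking.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DirtyTrackingDemo {
    // Toy stand-in for an enhanced JPA entity: setters record which columns changed.
    static class NodeEntity {
        private final Set<String> dirty = new LinkedHashSet<>();
        private String name;
        private Long exit;

        void setName(String name) { this.name = name; dirty.add("name"); }
        void setExit(Long exit) { this.exit = exit; dirty.add("exit"); }
        Set<String> dirtyColumns() { return dirty; }
        void clearDirty() { dirty.clear(); }
    }

    // Blind mapping: every model property is copied onto a brand-new entity,
    // so every column looks modified and must be written.
    static NodeEntity fromModel(String name, Long exit) {
        NodeEntity e = new NodeEntity();
        e.setName(name);
        e.setExit(exit);
        return e;
    }

    public static void main(String[] args) {
        NodeEntity mapped = fromModel("Task A", 42L);
        System.out.println("blind mapping writes: " + mapped.dirtyColumns());

        NodeEntity managed = fromModel("Task A", null);
        managed.clearDirty();   // pretend it was loaded and is now managed
        managed.setExit(42L);   // only the property that actually changed is touched
        System.out.println("tracked entity writes: " + managed.dirtyColumns());
    }
}
```

This prints `[name, exit]` for the blind mapping and `[exit]` for the tracked entity.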
Implementation ideas
- Do not use the ProcessInstance model class in IndexingService; use instead a DTO interface that every persistence layer implements. This allows per-backend performance optimizations and avoids a blind mapping that interferes with the JPA bytecode enhancement.
- Change the existing Storage interface to make it more flexible.
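The DTO idea can be sketched as follows, under the assumption that each persistence layer implements the interface by wrapping its own entity. `ProcessInstanceDto`, `JpaProcessInstanceDto`, `ProcessInstanceEntity` and `onStateChanged` are hypothetical names, and the entity is a plain-Java toy rather than a real JPA class.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class DtoSketch {
    // Hypothetical DTO interface that IndexingService would depend on
    // instead of the ProcessInstance model class.
    interface ProcessInstanceDto {
        String getId();
        void setState(int state);
        void setEnd(long end);
    }

    // Toy stand-in for the JPA entity; it records which columns were modified.
    static class ProcessInstanceEntity {
        final String id;
        int state;
        Long end;
        final Set<String> dirty = new LinkedHashSet<>();

        ProcessInstanceEntity(String id) { this.id = id; }
    }

    // The persistence layer implements the DTO by wrapping its own entity,
    // so each setter touches exactly one column and nothing is copied blindly.
    static class JpaProcessInstanceDto implements ProcessInstanceDto {
        private final ProcessInstanceEntity entity;

        JpaProcessInstanceDto(ProcessInstanceEntity entity) { this.entity = entity; }

        @Override public String getId() { return entity.id; }

        @Override public void setState(int state) {
            entity.state = state;
            entity.dirty.add("state");
        }

        @Override public void setEnd(long end) {
            entity.end = end;
            entity.dirty.add("end");
        }
    }

    // IndexingService-style code only sees the interface.
    static void onStateChanged(ProcessInstanceDto dto, int newState) {
        dto.setState(newState);
    }

    public static void main(String[] args) {
        ProcessInstanceEntity entity = new ProcessInstanceEntity("pi-1");
        onStateChanged(new JpaProcessInstanceDto(entity), 2);
        System.out.println(entity.dirty); // [state]
    }
}
```

Because the service never builds an entity itself, a backend with different storage (for example, a document store) could implement the same interface in whatever way is cheapest for it.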
Another possibility, as proposed by @tiagodolphine, is to leave all existing interfaces unchanged and instead change the Postgres DB schema so that everything is written into a jsonb column. Since this completely changes the DB schema, it will make existing data unavailable.
In my opinion, the current storage interface seriously limits the implementation choices. It is basically a cache, so it strongly favours key-value storage options (especially in terms of performance). My recommendation is therefore to modify the current storage interface, replacing its map semantics with business semantics (addMilestone, addNode, ...), to give future implementers of other persistence plugins (for example, relational DBs with poor JSON support) enough freedom.
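The contrast between map semantics and business semantics can be sketched as below. `MapStyleStorage` is a simplified stand-in, not the project's actual Storage interface; `ProcessInstanceStorage`, `RecordingRelationalStorage` and the `milestones` table are hypothetical. Only the method names `addNode` and `addMilestone` come from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageSketch {
    // Simplified map-style contract: the caller always hands over the whole
    // value, so the backend cannot tell what actually changed.
    interface MapStyleStorage<K, V> {
        V get(K key);
        V put(K key, V value);
    }

    // Hypothetical business-style contract: the caller says what happened,
    // so each backend can choose the cheapest write for it.
    interface ProcessInstanceStorage {
        void addNode(String processInstanceId, String nodeId, String name);
        void addMilestone(String processInstanceId, String milestoneId, String name);
    }

    // A relational backend could map each call to a single INSERT. Here the
    // statements are only recorded, not executed.
    static class RecordingRelationalStorage implements ProcessInstanceStorage {
        final List<String> statements = new ArrayList<>();

        @Override
        public void addNode(String processInstanceId, String nodeId, String name) {
            statements.add("insert into nodes (process_instance_id, node_id, name) values (?, ?, ?)");
        }

        @Override
        public void addMilestone(String processInstanceId, String milestoneId, String name) {
            // Hypothetical table; the real schema may differ.
            statements.add("insert into milestones (process_instance_id, id, name) values (?, ?, ?)");
        }
    }

    public static void main(String[] args) {
        RecordingRelationalStorage storage = new RecordingRelationalStorage();
        for (int i = 1; i <= 3; i++) {
            storage.addNode("pi-1", "node-" + i, "Task " + i);
        }
        storage.addMilestone("pi-1", "m-1", "Approved");
        // One statement per event (3 nodes + 1 milestone), none of them
        // rewriting rows that were stored earlier.
        System.out.println(storage.statements.size()); // 4
    }
}
```

With this kind of contract the number of writes grows linearly with the number of events, and a key-value backend remains free to implement the same calls by updating a single document.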