Netflix / conductor-community

Apache License 2.0

Dynamic fork join with subworkflow does not work properly in v3.13.8 with postgres indexing #282

Open dcore94 opened 11 months ago

dcore94 commented 11 months ago

Describe the bug

Running v3.13.8 with Postgres indexing, as shown in the attached properties, I start getting a task duplication error, as shown in the screenshot. The workflow terminates in error, but in a completely unpredictable way: a few tasks succeed, a few terminate with strange error conditions, and others are canceled.

Details

Conductor version: 3.13.8 (community edition)
Persistence implementation: Postgres
Queue implementation: Postgres
Workflow definition: As attached in workflows.zip
Task definition:
Event handler definition:

To Reproduce

Steps to reproduce the behavior:

  1. Run the root workflow in Workbench and set the input to 30

Expected behavior

The subworkflow should be called 30 times and the workflow should terminate correctly.
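For readers without the attachment, here is a rough sketch of the kind of input a dynamic fork normally receives when it fans out to N subworkflows. It is not the definition from workflows.zip: the names `build_dynamic_fork_input`, `child_workflow`, `sub_workflow_branch` and the `index` input key are hypothetical, and the task fields follow my understanding of Conductor's dynamic fork conventions (a list of task objects plus a map of inputs keyed by task reference name).

```python
import json


def build_dynamic_fork_input(count, subworkflow_name="child_workflow", version=1):
    """Build the two structures a dynamic fork consumes for `count` branches.

    Hypothetical helper: the subworkflow name, task name and input key are
    placeholders, not values taken from the attached workflows.zip.
    """
    dynamic_tasks = []
    dynamic_tasks_input = {}
    for i in range(count):
        # Each branch gets its own reference name so branches cannot collide.
        ref = f"sub_{i}"
        dynamic_tasks.append(
            {
                "name": "sub_workflow_branch",
                "taskReferenceName": ref,
                "type": "SUB_WORKFLOW",
                "subWorkflowParam": {"name": subworkflow_name, "version": version},
            }
        )
        # Inputs are keyed by the same reference name as the branch.
        dynamic_tasks_input[ref] = {"index": i}
    return dynamic_tasks, dynamic_tasks_input


if __name__ == "__main__":
    tasks, inputs = build_dynamic_fork_input(30)
    print(len(tasks), "branches")
    print(json.dumps(tasks[0], indent=2))
```

With an input of 30, this yields 30 branches with distinct reference names, which is the fan-out the expected behavior describes.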

Screenshots

Screenshot from 2023-10-18 17-55-18

workflows.zip

Additional context

The property file

# Database persistence type.
conductor.db.type=postgres

spring.datasource.url=jdbc:postgresql://postgres:5432/conductor
spring.datasource.username=conductor
spring.datasource.password=conductor

# Hikari pool sizes are -1 by default and prevent startup
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=2

# Use Postgres for indexing
conductor.indexing.enabled=true
conductor.indexing.type=postgres
conductor.elasticsearch.version=postgres

#Enable Prometheus
conductor.metrics-prometheus.enabled=true
management.endpoints.web.exposure.include=prometheus,health,info,metrics

# GRPC disabled
conductor.grpc-server.enabled=false

# Load sample kitchen sink disabled
loadSample=false

dcore94 commented 10 months ago

This issue is causing me real trouble. Could anyone please give some feedback? Could anyone at least tell me whether the relation to https://github.com/Netflix/conductor/pull/3836 is correct? Sorry for pressing, but I really cannot understand how a project like Conductor can leave such a dangerous issue without any feedback for more than a month :-(