apache / hop

Hop Orchestration Platform
https://hop.apache.org/
Apache License 2.0

[Bug]: Text file output - concurrent writes issue #4558

Open dave-csc opened 3 days ago

dave-csc commented 3 days ago

Apache Hop version?

2.10.0

Java version?

17.0.2

Operating system

Linux

What happened?

This might be a very specific scenario, but I'll file it as a bug anyway.

I set up a Text file output transform inside a "mappable" transform to create a structured log file of what happens in the "mapping" transform. In short, the mapping transform generates the data needed for the log and then passes it to the sub-pipeline, which writes it to the log file. Hence, the Text file output is set to always write to the same file in "append" mode.

The parent transform calls the "logger" in multiple places, and sometimes the writes from those calls are mixed up in the resulting file, for example:

1596213;2023;30833201;E;DOE;JOHN;0;1596238;2023;30835173;S;MOE;JANE;1;Data correctly sent - HTTP status: 200 - Server response: true
+++ Invalid data provided

whereas the expected output should be:

1596213;2023;30833201;E;DOE;JOHN;0;+++ Invalid data provided
1596238;2023;30835173;S;MOE;JANE;1;Data correctly sent - HTTP status: 200 - Server response: true

I could probably mitigate this with some Blocking transforms here and there, but a better option might be to check whether the file is already in use before writing, and then wait for its release before actually writing.
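To make the "wait for its release before writing" idea concrete, here is a minimal Java sketch of one way writers inside a single JVM could serialize appends to the same file. It is not Hop code: `SerializedAppender` and `appendRow` are hypothetical names, and it assumes each log row is handed over as one complete string rather than field by field.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper (not part of Apache Hop): serializes appends per target file within one JVM. */
final class SerializedAppender {
  // One lock per normalized absolute path, shared by every writer in this JVM.
  private static final ConcurrentHashMap<Path, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

  private SerializedAppender() {}

  /** Appends one complete row (all fields plus a line separator) while holding the file's lock. */
  static void appendRow(Path file, String row) {
    Path key = file.toAbsolutePath().normalize();
    ReentrantLock lock = LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
    lock.lock(); // block until any other writer has finished its row
    try {
      byte[] bytes = (row + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
      // Assumption: the whole row is written in a single call, never as separate fields.
      Files.write(key, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      lock.unlock();
    }
  }
}
```

Each caller would use something like `SerializedAppender.appendRow(Path.of("structured.log"), "1596213;2023;...;+++ Invalid data provided")`, where the file name is only an example. The lock is in-process only. A `java.nio.channels.FileLock` would be the usual tool across processes, but it does not help between threads of one JVM, where an overlapping lock request throws `OverlappingFileLockException`. Whether something like this fits the Text file output transform's own buffering has not been checked here.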

Issue Priority

Priority: 3

Issue Component

Component: Pipelines, Component: Transforms

hansva commented 3 days ago

If they are being called from multiple places, then these are distinct instances that have no knowledge of the other instances running at the same time. The only solution is to call the mapper only once in each pipeline. The same issue can happen when running multiple copies at the same time.