eclipse-ee4j / yasson

Eclipse Yasson project
https://projects.eclipse.org/projects/ee4j.yasson
Other
198 stars 94 forks source link

Multithreading leads to file corruption #637

Open bmarwell opened 5 months ago

bmarwell commented 5 months ago

Describe the bug

When writing JSON files in parallel with multiple threads (>>100), JSON files are very likely (>80%) to get corrupted like so:

{ 
  "results": [
    { "key": "val]}
}

Sometimes even data from other files is included randomly, again invalidating the JSON file. This does not happen with FasterXML Jackson nor with Apache Johnzon (the latter also implementing Jakarta JSON-B).

To Reproduce

Here's working code with Jackson:

public class PatchJsonWriter implements AutoCloseable {

    private final Path outputDir;
    private final ExecutorService executorService;

    private final ObjectMapper om;

    public PatchJsonWriter(Path outputDir, ExecutorService executorService) {
        this.outputDir = outputDir;
        this.executorService = executorService;

        this.om = new ObjectMapper().findAndRegisterModules().setDefaultPrettyPrinter(new MinimalPrettyPrinter());
    }

    public void writeParallel(Collection<PatchParameterInfo> parameterInfos) {
        List<CompletableFuture<Void>> futures =
                parameterInfos.stream().map(this::writeAsync).toList();

        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
    }

    private CompletableFuture<Void> writeAsync(PatchParameterInfo patchParameterInfo) {
        return CompletableFuture.runAsync(() -> doWritePatchParameterInfo(patchParameterInfo), this.executorService);
    }

    private void doWritePatchParameterInfo(PatchParameterInfo patchParameterInfo) {
        Path path = this.outputDir
                .resolve(patchParameterInfo.groupName())
                .resolve(patchParameterInfo.appNameName())
                .resolve(patchParameterInfo.version());
        Path file = path.resolve("patch.json");

        try {
            Files.createDirectories(path);
        } catch (IOException ioException) {
            throw new UncheckedIOException(ioException);
        }

        Logger logger = Logger.getLogger(PatchJsonWriter.class.getName());
        PatchParameterWrapper parameterWrapper = PatchParameterWrapper.from(patchParameterInfo.patchParameter());
        logger.info(() -> String.format(Locale.ROOT, "Writing file [%s] with content [%s]", file, parameterWrapper));

        try (OutputStream fos = Files.newOutputStream(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            this.om.writeValue(fos, parameterWrapper);
        } catch (IOException ioException) {
            throw new UncheckedIOException(ioException);
        } catch (Exception ex) {
            throw new RuntimeException("Problem creating Jsonb instance", ex);
        }
    }

    @Override
    public void close() {}
}

This is the version I created with the latest Yasson version (3.0.4 -- where are the release notes btw?):

public class PatchJsonWriter implements AutoCloseable {

    private final Path outputDir;
    private final ExecutorService executorService;
    private final Jsonb jsonb;

    public PatchJsonWriter(Path outputDir, ExecutorService executorService) {
        this.outputDir = outputDir;
        this.executorService = executorService;

        this.jsonb = JsonbFactory.create();
    }

    // omit sne methods

    private void doWritePatchParameterInfo(PatchParameterInfo patchParameterInfo) {
        // ... same

        try (OutputStream fos = Files.newOutputStream(
                file, StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            this.jsonb.toJson(PatchParameterWrapper.from(patchParameterInfo.patchParameter()), fos);
        } catch (IOException ioException) {
            throw new UncheckedIOException(ioException);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            this.jsonb.close();
        } catch (IOException ioException) {
            throw ioException;
        } catch (Exception other) {
            throw new UnsupportedOperationException(other);
        }
    }
}

Using this code, some files are corrupted like shown above:

{
  "result": [
    {
      "name": "app1.service.url"
    },
    {
      "name": "app1.service.ttl"
    },
    {
      "nawr": "app1.jwt.expiration"
    },
    {
      "namwrc."app1.ldap.searchbase"
        },
            "name"ot"app1.ldap.selfservice_searchba,
      n"  "erap.alias"
        },
        {
            "name": "otherappport"
        }
    ]
} "app1.jwt.encryptionKeySource"
        },
        }
    ]
}

The Jackson variant does not suffer from this behaviour. Yasson is mixing in contents from another file into this example file

Expected behaviour

System information:

Additional context

rdehuyss commented 5 months ago

We from JobRunr have the same issue. We are using Yasson as an option to generate JSON from some objects and sometimes it generates invalid JSON - this is with really limited parallelism (about 2 simultaneous requests).

The invalid JSON:

[{"deleteSucceededJobsAfter":129600.000000000,"firstHeartbeat":"2024-01-16T08:38:09.971713Z","id":"6e836e6f-3bea-4dc8-9d90-804ce2d3f76c","lastHeartbeat":"2024-01-16T08:38:39.994807Z","name":"Ismailas-MacBook-Pro.local","permanent,{"type":"severe-jobrunr-exception"0000000,"pollIntervalInSeconds":15,"processAllocatedMemory":18048136,"processCpuLoad":0.0020519469633203924,"processFreeMemory":17161821048,"processMaxMemory":17179869184,"running":true,"systemCpuLoad":0.2227047146401985,"systemFreeMemory":20553973760,"systemTotalMemory":68719476736,"workerPoolSize":96}]

Notice the permanent where invalid json is generated. Below is the valid json:

[{"deleteSucceededJobsAfter":129600.000000000,"firstHeartbeat":"2024-01-16T08:38:09.971713Z","id":"6e836e6f-3bea-4dc8-9d90-804ce2d3f76c","lastHeartbeat":"2024-01-16T08:41:40.032196Z","name":"Ismailas-MacBook-Pro.local","permanentlyDeleteDeletedJobsAfter":259200.000000000,"pollIntervalInSeconds":15,"processAllocatedMemory":48091728,"processCpuLoad":4.15962064712972E-4,"processFreeMemory":17131777456,"processMaxMemory":17179869184,"running":true,"systemCpuLoad":0.17733182589033353,"systemFreeMemory":20528300032,"systemTotalMemory":68719476736,"workerPoolSize":96}]
bmarwell commented 1 month ago

@rdehuyss for me I found that I was actually writing to the same file twice. Switching to Apache Johnzon only fixed it because it was way faster. For me, this is closed. But I can leave it open for you.

rdehuyss commented 1 month ago

@bmarwell : these were two different Strings (so not even files).