asciidoctor / asciidoctorj

:coffee: Java bindings for Asciidoctor. Asciidoctor on the JVM!
http://asciidoctor.org
Apache License 2.0
624 stars 173 forks source link

Fix 1158 concurrent conversion to stream #1172

Closed robertpanzer closed 1 year ago

robertpanzer commented 1 year ago

Thank you for opening a pull request and contributing to AsciidoctorJ!

Please take a bit of time giving some details about your pull request:

Kind of change

Description

What is the goal of this pull request?

Fix the issue reported in #1158. When converting to a stream a new RubyClass is created lazily. This step is not protected with a mutex, so that JRuby can fail. This PR wants to fix this so that this exception does not occur when converting documents in parallel with one Asciidoctor instance.

How does it achieve that?

Eagerly create the class when initializing the Asciidoctor instance.

Are there any alternative ways to implement this?

Yes, the class could be created lazily and a mutex could guarantee mutual exclusion so that the class is created only once.

Are there any implications of this pull request? Anything a user must know?

Issue

If this PR fixes an open issue, please add a line of the form:

Fixes #1158 on the main branch

Release notes

Please add a corresponding entry to the file CHANGELOG.adoc

abelsromero commented 1 year ago

Yes, the class could be created lazily and a mutex could guarantee mutual exclusion so that the class is created only once.

I like the idea of pre-emptive initialization, even when not needed it makes things a lot easier and won't impact conversion performance. But I have a question, ¿are we 100% sure it's not possible to fail during init? I understand not but I am not 100% familiar with all the code. It seems to me that even if someone calls JRubyAsciidoctor.create methods directly, each one ends up calling JRubyAsciidoctor and each one creates and independent jRuby Runtime.

PS: haven't been able to reproduce the original issue (200 threads), I cannot test the fix, but makes sense based on the original issue logs :disappointed:

robertpanzer commented 1 year ago

I could reproduce it locally with 50 threads without this PR on my 10 core Mac.

abelsromero commented 1 year ago

I could reproduce it locally with 50 threads without this PR on my 10 core Mac.

Could this be added as a test to avoid any regression?

Mine doesn't seem to work on Linux, Java17, Ryzen 7 3700X 8-Core(16 threads) :shrug: : Nor converting html or PDF.

package org.asciidoctor.demos;

import org.asciidoctor.Asciidoctor;
import org.asciidoctor.Options;
import org.asciidoctor.SafeMode;
import org.asciidoctor.demos.utils.FileUtils;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ConcurrentRunner {

    private static final int N_THREADS = 150;
    private static final int TIMEOUT_SECONDS = 120;

    static final ExecutorService executor = Executors.newFixedThreadPool(N_THREADS);
    static final Asciidoctor asciidoctor = Asciidoctor.Factory.create();

    public static void convert() throws IOException {
        var outputStream = new ByteArrayOutputStream();
        String adoc = Files.readString(FileUtils.file("sample-big.adoc").toPath());

        asciidoctor.convert(
            adoc,
            Options.builder()
                .backend("html5")
                .baseDir(FileUtils.file("asciidoc"))
                .headerFooter(true)
                .safe(SafeMode.UNSAFE)
                .toStream(outputStream)
                .build());

        assert outputStream.size() != 0;
    }

    static class ConvertCallable implements Callable<Boolean> {

        @Override
        public Boolean call() {
            try {
                ConcurrentRunner.convert();
            } catch (Exception e) {
                return Boolean.FALSE;
            }
            return Boolean.TRUE;
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        var runner = new ConcurrentRunner();
        runner.convert();

        final List<Callable<Boolean>> callables = new ArrayList<>();
        for (int i = 0; i < N_THREADS * 5; i++) {
            callables.add(new ConvertCallable());
        }

        List<Future<Boolean>> futures = executor.invokeAll(callables, TIMEOUT_SECONDS, TimeUnit.SECONDS);
        executor.shutdown();
        executor.awaitTermination(TIMEOUT_SECONDS, TimeUnit.SECONDS);

        long count = futures.stream()
            .map(future -> {
                try {
                    return future.get(TIMEOUT_SECONDS, TimeUnit.SECONDS);
                } catch (InterruptedException | ExecutionException | TimeoutException e) {
                    throw new RuntimeException(e);
                }
            })
            .filter(result -> result == Boolean.TRUE)
            .count();

        System.out.println("Successful results: " + count);
    }
}
robertpanzer commented 1 year ago

It seems to me that even if someone calls JRubyAsciidoctor.create methods directly, each one ends up calling JRubyAsciidoctor and each one creates and independent jRuby Runtime.

Sorry, did not identify this question while reading initially. Yes, Every JRubyAsciidoctor instance uses its own Ruby instance, which should have its own classes. For example each Asciidoctor instance also has its own Symbols.

I reproduced it with this test:

public class TestParallelToStream {

    @Test
    public void test() throws InterruptedException {
        Asciidoctor asciidoctor = Asciidoctor.Factory.create();

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 50; i++) {

            threads.add(new Thread(() -> {
                String doc = "= Test\n" +
                        "\n" +
                        "== Test\n" +
                        "\n" +
                        "Test";
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                asciidoctor.convert(doc, Options.builder()
                        .backend("pdf")
                        .toStream(out)
                        .build());
                System.out.println("Converted to " + out.toByteArray().length + " bytes");

            }));
        }
        threads.forEach(Thread::start);
        for (Thread thread : threads) {
            thread.join();
        }
    }
}

Maybe the key is using the PDF backend? (For whatever reason.)

abelsromero commented 1 year ago

Maybe the key is using the PDF backend? (For whatever reason.)

Already tried that. But with your version, I reproduce immediately with 3 threads :open_mouth: It must be something about getting the right call the right time. Nevertheless, thanks a lot, let's merge!

mojavelinux commented 1 year ago

The PDF converter loads a lot of dependencies when used, so that makes sense.