ggrossetie / asciidoctor-web-pdf

Convert AsciiDoc documents to PDF using web technologies
https://asciidoctor.org
MIT License

Performance issue: asciidoctor-web-pdf is almost 3 times slower than asciidoctor-pdf #533

Closed: vishalkrsinha closed this issue 2 years ago

vishalkrsinha commented 3 years ago

Following up on #531: using the same steps, I see that asciidoctor-web-pdf is almost 3 times slower than asciidoctor-pdf. Is this expected in general, or do I need to do something to improve it? Kindly advise.

PS:

  1. I am running this on around 400 .adoc files to convert them to PDF. With asciidoctor-pdf it used to take 13 minutes, but asciidoctor-web-pdf takes 49 minutes (about 3.8 times as long).
  2. I am trying asciidoctor-web-pdf for its better support of LaTeX equations.
ggrossetie commented 3 years ago

asciidoctor-web-pdf is expected to be slower because it starts a headless browser, whereas asciidoctor-pdf uses a pure Ruby library to create the PDF.

A "batch mode" would probably improve the performance but nobody has been working on it: https://github.com/Mogztter/asciidoctor-web-pdf/issues/99

vishalkrsinha commented 3 years ago

I'm not sure what batch mode is or how to use it in my case. Here is the minimal script I use in my Azure pipeline. Can you suggest any way to apply batch mode to speed up the overall conversion process?

      - task: PowerShell@2
        displayName: Generating pdf files
        inputs:
          targetType: 'inline'
          script: |
            # Read the modules (source files and destination folder) from JSON.
            $adocFiles = (Get-Content -Path $(Build.SourcesDirectory)/ReleaseNotes/adocFiles.json -Raw) | ConvertFrom-Json

            foreach($mod in $adocFiles.modules)
            {
                foreach($adocFile in $mod.source)
                {
                    Write-Host 'module adocFile:' $adocFile

                    # Build the destination PDF path from the source file name.
                    $noext = $([System.IO.Path]::GetFileNameWithoutExtension($adocFile))
                    $dest = "$(targetDir)" + $mod.destination + "$noext" + ".pdf"
                    $src = "$(sourceDir)" + "$adocFile"

                    # One asciidoctor-web-pdf run (and one headless browser) per file.
                    asciidoctor-web-pdf $src -o $dest -a stem --trace

                    Write-Host 'Module pdf file:' $dest
                }
            }
ggrossetie commented 3 years ago

Batch mode (not implemented yet) would share a single headless Chrome instance: instead of launching a new instance per document, it would reuse the same one.

What you would need to do is pass a list of documents:

asciidoctor-web-pdf foo.adoc bar.adoc baz.adoc

But again, batch mode is not implemented. That said, you can already make this change: you will run fewer commands, although overall performance should stay about the same (or be slightly better).

vishalkrsinha commented 3 years ago

I don't think it's feasible to write out 400 file names with different paths as suggested. Is there another way around this?

vishalkrsinha commented 3 years ago

Any update please?

ggrossetie commented 3 years ago

Please don't do that. This is an open source project, and nobody is paid to answer within 48 hours. If nobody has posted an update, then there is no update.

vishalkrsinha commented 3 years ago

Yes, you are correct. I shouldn't have done that, and I'll be more careful next time. Apologies.

ggrossetie commented 3 years ago

No worries and thanks for acknowledging the problem.

> I don't think it's feasible to write out 400 file names with different paths as suggested. Is there another way around this?

Indeed, with this approach you will configure one output directory:

asciidoctor-web-pdf one.adoc two.adoc three.adoc --destination-dir=/path/to/destination
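Rather than typing 400 file names by hand, the argument list can be built from the file system. The following is a minimal sketch, assuming all of a module's .adoc files live under one folder; the function name Get-WebPdfArguments and the example paths are hypothetical, and only the --destination-dir and -a stem options come from this thread:

```powershell
# Hypothetical helper: builds the argument list for a single asciidoctor-web-pdf call.
function Get-WebPdfArguments {
    param(
        [Parameter(Mandatory)] [string] $SourceDir,
        [Parameter(Mandatory)] [string] $DestinationDir
    )
    # Full path of every .adoc file under $SourceDir, sorted for a stable order.
    # Wrapping the pipeline in @() guarantees an array, even for 0 or 1 files.
    $files = @(Get-ChildItem -Path $SourceDir -Filter '*.adoc' -Recurse -File |
        Sort-Object -Property FullName |
        ForEach-Object { $_.FullName })

    # One argument per file, followed by the options used in this thread.
    return $files + @("--destination-dir=$DestinationDir", '-a', 'stem')
}

# Usage (assumes asciidoctor-web-pdf is on the PATH; paths are placeholders):
# $cliArgs = Get-WebPdfArguments -SourceDir 'C:\docs\module-a' -DestinationDir 'C:\out\module-a'
# asciidoctor-web-pdf @cliArgs
```

Splatting the array with @cliArgs hands each path to the CLI as its own argument, so the folder can hold any number of documents without editing the command.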

I'm not a PowerShell expert, but it seems that you have one destination folder per module, right?

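foreach($mod in $adocFiles.modules)
{
  # One destination folder per module
  $dest = "$(targetDir)" + $mod.destination
  # Full path of each source file in this module
  $files = $mod.source | ForEach-Object { Join-Path "$(sourceDir)" $_ }

  # Pass the array itself: PowerShell turns each element into a separate argument,
  # whereas a single space-joined string would reach the CLI as one argument.
  asciidoctor-web-pdf $files "--destination-dir=$dest" -a stem --trace
}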

So instead of running asciidoctor-web-pdf 400 times, you will run it once per module. Please note that the Windows command prompt (cmd.exe) has a command-line limit of 8191 characters: https://docs.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/command-line-string-limitation

I'm not sure whether PowerShell has the same limitation, but you might need to find a workaround.
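One possible workaround is to split a module's file list into batches whose combined length stays under the limit, and to run asciidoctor-web-pdf once per batch. A minimal sketch follows; Split-IntoBatches is a hypothetical helper (not part of asciidoctor-web-pdf), and the 8000-character default budget is an assumption chosen to leave headroom below cmd.exe's 8191-character limit for the executable name and the other options:

```powershell
# Hypothetical helper: groups paths so that each batch's estimated command-line
# length stays under $MaxLength characters.
function Split-IntoBatches {
    param(
        [Parameter(Mandatory)] [string[]] $Paths,
        # Assumed budget, below cmd.exe's 8191-character limit to leave room
        # for the executable name and the remaining options.
        [int] $MaxLength = 8000
    )
    $batches = [System.Collections.Generic.List[object]]::new()
    $current = [System.Collections.Generic.List[string]]::new()
    $length = 0
    foreach ($path in $Paths) {
        # +3 covers the separating space and the quotes a path may need.
        $cost = $path.Length + 3
        # Start a new batch when this path would overflow the current one
        # (a single oversized path still gets a batch of its own).
        if ($current.Count -gt 0 -and $length + $cost -gt $MaxLength) {
            $batches.Add($current.ToArray())
            $current = [System.Collections.Generic.List[string]]::new()
            $length = 0
        }
        $current.Add($path)
        $length += $cost
    }
    if ($current.Count -gt 0) { $batches.Add($current.ToArray()) }
    # The unary comma stops PowerShell from unrolling the list of batches.
    return ,$batches.ToArray()
}

# Usage inside the per-module loop ($files and $dest as in the snippet above):
# foreach ($batch in (Split-IntoBatches -Paths $files)) {
#     asciidoctor-web-pdf $batch "--destination-dir=$dest" -a stem --trace
# }
```

The length estimate is deliberately conservative; it still launches one headless browser per batch, so it only reduces the number of runs rather than implementing the batch mode discussed in #99.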