dotnet / docfx

Static site generator for .NET API documentation.
https://dotnet.github.io/docfx/
MIT License

[Bug] PDF generation fails when TOC has many bookmarks #9926

Open iti-pawel-wach opened 5 months ago

iti-pawel-wach commented 5 months ago

Describe the bug Hi, I have a strange problem generating a PDF when TOC.md contains links with bookmarks. It works locally on my laptop, but when I do the same on Windows Server, PDF generation fails.

With PDF generation turned off there is no error, and the links in the generated TOC.html work. I tried TOC.yml instead of TOC.md and got the same error. Locally on Windows 11 it works, but on Windows Server it fails.

To Reproduce Steps to reproduce the behavior:

  1. Create a file "article1.md" with the following content:

    # header1
    ## header2
    ### header3-1
    ### header3-2
    ### header3-3
    ### header3-4
    ### header3-5
  2. Create "TOC.md" with the following content:

    # [header1](article1.md#header1)
    ## [header2](article1.md#header2)
    ### [header3-1](article1.md#header3-1)
    ### [header3-2](article1.md#header3-2)
    ### [header3-3](article1.md#header3-3)
    ### [header3-4](article1.md#header3-4)
    ### [header3-5](article1.md#header3-5)
  3. Run docfx with PDF generation enabled

docfx.json:

{
  "build": {
    "content": [
      {
        "files": ["**/*.{md,yml}"],
        "exclude": ["**.*.pdf"]
      }
    ],
    "resource": [
      {
        "files": ["**/media/**"],
        "exclude": ["**/obj/**", "**/includes/**"]
      }
    ],
    "overwrite": [
      {
        "exclude": ["obj/**", "_site/**"]
      }
    ],
    "dest": "_site",
    "globalMetadata": {
      "_appTitle": "Test",
      "_disableContribution": "true",
      "_enableSearch": true,
      "pdf": true,
      "pdfTocPage": true,
      "pdfFileName": "doc-pdf.pdf"
    },
    "globalMetadataFiles": [],
    "fileMetadataFiles": [],
    "template": [
      "default",
      "modern"
    ],
    "postProcessors": ["ExtractSearchIndex"],
    "markdownEngineName": "markdig",
    "noLangKeyword": false,
    "keepFileLink": false,
    "cleanupCacheHistory": false,
    "disableGitFeatures": false
  }
}
  4. Get error:
    (...)
    XRef map exported.
    Extracting index data from 4 html files
    Content\doc-pdf.pdf: 0%
    InvalidOperationException: Failed to build PDF page []: 
    http://127.0.0.1:58412/Content/article1.html#header3-5
    at void MoveNext() in PdfBuilder.cs:156                                       
    at void MoveNext() in PdfBuilder.cs:238                                       
    at void MoveNext()                                                            
    at async Task CreatePdf(Func<Outline, Uri, Task<byte[]>> printPdf,            
     Func<Outline, int, int, Task<byte[]>> printHeaderFooter, ProgressTask task,
     Uri outlineUrl, Outline outline, string outputPath,                        
     Action<Dictionary<Outline, int>> updatePageNumbers) in PdfBuilder.cs:235   
    at void MoveNext() in PdfBuilder.cs:105                                       
    at void MoveNext()                                                            
    at void MoveNext() in PdfBuilder.cs:98                                        
    at void MoveNext() in Progress.cs:103                                         
    at void MoveNext() in Progress.cs:138                                         
    at async Task<T> RunAsync<T>(Func<Task<T>> func) in DefaultExclusivityMode.cs:
     40                                                                         
    at async Task<T> StartAsync<T>(Func<ProgressContext, Task<T>> action) in      
     Progress.cs:121                                                            
    at async Task StartAsync(Func<ProgressContext, Task> action) in Progress.cs:  
     101                                                                        
    at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:96              
    at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:114             
    at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:114             
    at void <Execute>b__0() in DefaultCommand.cs:53                               
    at int Run(LogOptions options, Action run) in CommandHelper.cs:48             
    at int Execute(CommandContext context, Options options) in DefaultCommand.cs: 
     31                                                                         
    at Task<int> Execute(CommandContext context, CommandSettings settings) in     
     CommandOfT.cs:40                                                           
    at Task<int> Execute(CommandTree leaf, CommandTree tree, CommandContext       
     context, ITypeResolver resolver, IConfiguration configuration) in          
     CommandExecutor.cs:144                                                     
    at async Task<int> Execute(IConfiguration configuration, IEnumerable<string>  
     args) in CommandExecutor.cs:83                                             

    Full log: log.txt

Expected behavior PDF should be generated without any errors/warnings.

Additional context

I noticed that when I remove the last line in TOC.md, this one:

### [header3-5](article1.md#header3-5)

Everything works and there is no error:

XRef map exported.
Extracting index data from 4 html files
Content\doc-pdf.pdf: 0%
Content\doc-pdf.pdf: 53%
Content\doc-pdf.pdf: 99%

Build succeeded.

    0 warning(s)
    0 error(s)

Adding any further link with a bookmark causes the error, but adding a link without a bookmark works. So it looks like there is a problem once there are more than 6 links with bookmarks, or maybe there is a limitation I don't know about. Could someone check this issue or tell me how to resolve it or work around it?
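To see how many bookmark links a TOC contains per target file, a small diagnostic along these lines may help. It is only a sketch: the function name `bookmark_links_by_file` and the regular expression are illustrative assumptions, not part of docfx, and it handles only simple inline markdown links like the ones in the TOC.md above.

```python
import re
from collections import defaultdict

# Matches simple inline markdown links such as [header3-1](article1.md#header3-1).
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def bookmark_links_by_file(toc_text):
    """Group the bookmark (fragment) links in a TOC.md by the file they target.

    Hypothetical helper for diagnosis only; links without a '#fragment'
    are ignored.
    """
    groups = defaultdict(list)
    for _title, href in LINK_RE.findall(toc_text):
        path, sep, fragment = href.partition("#")
        if sep and fragment:
            groups[path].append(fragment)
    return dict(groups)


if __name__ == "__main__":
    toc = (
        "# [header1](article1.md#header1)\n"
        "## [header2](article1.md#header2)\n"
        "### [header3-1](article1.md#header3-1)\n"
    )
    for path, fragments in bookmark_links_by_file(toc).items():
        print(f"{path}: {len(fragments)} bookmark link(s): {fragments}")
```

Running it against the failing and the working TOC.md makes it easy to compare how many bookmark links point at the same article in each case.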

filzrev commented 5 months ago

I can also reproduce the problem in my environment.

I've checked the related source code, and it seems to occur when the following condition is met.

https://github.com/dotnet/docfx/blob/6e116aaa8942efc2fbcbdf856a023fc04aa485e6/src/Docfx.App/PdfBuilder.cs#L144-L156

The Playwright documentation describes when the response is null: https://playwright.dev/docs/api/class-page#page-goto

The method either throws an error or returns a main resource response. The only exceptions are navigation to about:blank or navigation to the same URL with a different hash, which would succeed and return null.
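The case described in the quote, navigating to the same URL with a different hash, can be detected from the two URLs alone. The following is a minimal sketch of that check using only the Python standard library; `is_hash_only_navigation` is a hypothetical name, and this is not how PdfBuilder.cs is implemented.

```python
from urllib.parse import urldefrag


def is_hash_only_navigation(current_url, target_url):
    """Return True when target_url differs from current_url only in its fragment.

    Per the Playwright documentation quoted above, this is the kind of
    navigation for which page.goto succeeds but returns null.
    """
    current_base, current_fragment = urldefrag(current_url)
    target_base, target_fragment = urldefrag(target_url)
    return current_base == target_base and current_fragment != target_fragment


if __name__ == "__main__":
    # URLs modelled on the one in the error log; the first one is assumed.
    previous = "http://127.0.0.1:58412/Content/article1.html#header3-4"
    failing = "http://127.0.0.1:58412/Content/article1.html#header3-5"
    print(is_hash_only_navigation(previous, failing))
```

A caller that treats a null response as a failure would need to special-case navigations for which this check is true.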