dotnet / docfx

Static site generator for .NET API documentation.
https://dotnet.github.io/docfx/
MIT License

docfx pdf does not generate api docs #8148

Closed groogiam closed 1 year ago

groogiam commented 2 years ago

Operating System: Windows 10

DocFX Version Used: 2.59.3.0

Template used: Default

Steps to Reproduce:

  1. Set up API docs metadata as defined in https://dotnet.github.io/docfx/tutorial/walkthrough/walkthrough_create_a_docfx_project_2.html
  2. Run docfx pdf docfx.json

Expected Behavior:

Pdf should generate without error.

Actual Behavior:

I get an error saying api\toc.yml does not exist and the pdf contains no api documentation.

Running docfx build docfx.json generates the api docs.
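
For anyone reproducing this, one quick check is whether the metadata step actually wrote the API table of contents before the PDF step runs. The following is a minimal Python sketch and not part of docfx; the helper name `missing_toc_files` is hypothetical, and the default path `api/toc.yml` is simply taken from the error message above.

```python
from pathlib import Path


def missing_toc_files(project_dir, toc_paths=("api/toc.yml",)):
    """Return the expected TOC files that do not exist under project_dir.

    toc_paths is an assumption: adjust it to whatever TOC files your
    own docfx.json expects the metadata step to produce.
    """
    root = Path(project_dir)
    return [p for p in toc_paths if not (root / p).is_file()]
```

Calling `missing_toc_files(".")` from the folder that contains docfx.json returns an empty list when the file is present. A non-empty list suggests the metadata step failed or wrote its output somewhere else.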

paulushub commented 2 years ago

Setup api docs metadata as defined in...

You are referencing Walkthrough Part II: Adding API Documentation to the Website; the PDF tutorial is in Walkthrough Part III: Generate PDF Documentation.

If you still have any problem, post your docfx.json file.

groogiam commented 2 years ago

My docfx.json file is below.

This looks like an environmental issue where the metadata target cannot read the project file on my local machine. I get a bunch of warnings just like this.

with message: Method not found: 'System.ReadOnlySpan`1<Char> Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'.

Is there a dependency I am missing on my local machine for extracting metadata? The machine has the .NET 6 SDK and Visual Studio 2022.

On my build server the metadata is generated, but the PDF generation fails with:

[22-08-27 08:35:07.095]Error:[PdfCommand.PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetPartialPdfModels(IList`1 htmlFilePaths)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.ConvertOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
---> (Inner Exception #0) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()

{
  "metadata": [
    {
      "src": [
        {
          "src": "../src",
          "files": [
            "MyProject.Api/**.csproj",
            "MyProject.Web.Ui/**.csproj",
            "MyProject.Server/**.csproj"
          ]
        }
      ],
      "dest": "api",
      "disableGitFeatures": false,
      "filter": "apiFilterConfig.yml"
    }
  ],
  "build": {
    "content": [
      {
        "files": [
          "api/**.yml",
          "api/index.md"
        ]
      },
      {
        "files": [
          "user_guide/**",
          "user_guide/installation/**",
          "toc.yml",
          "*.md",
          "rest_api/**"
        ]
      }
    ],
    "resource": [
      {
        "files": [
          "images/**",
          "**/media/**"
        ]
      }
    ],
    "overwrite": [
      {
        "files": [
          "apidoc/**.md"
        ],
        "exclude": [
          "obj/**",
          "_site/**"
        ]
      }
    ],
    "globalMetadata": {
      "_appLogoPath": "images/logo.png",
      "_appFaviconPath": "images/favicon.ico",
      "_enableSearch": true,
      "_enableNewTab": true,
      "_disableContribution": true
    },
    "dest": "_site",
    "globalMetadataFiles": [],
    "fileMetadataFiles": [],
    "template": [
      "default",
      "template"
    ],
    "postProcessors": [],
    "markdownEngineName": "markdig",
    "noLangKeyword": false,
    "keepFileLink": false,
    "cleanupCacheHistory": false,
    "disableGitFeatures": false
  },
  "pdf": {
    "content": [
      {
        "files": [
          "api/**.yml",
          "api/index.md"
        ],
        "exclude": [
          "**/toc.yml",
          "**/toc.md"
        ]
      },
      {
        "files": [
          "user_guide/**",
          "user_guide/installation/**",
          "toc.yml",
          "*.md",
          "rest_api/**",
          "pdf/*"
        ],
        "exclude": [
          "**/bin/**",
          "**/obj/**",
          "_site_pdf/**",
          "**/toc.yml",
          "**/toc.md"
        ]
      },
      {
        "files": "pdf/toc.yml"
      }
    ],
    "resource": [
      {
        "files": [
          "images/**",
          "**/media/**"
        ],
        "exclude": [
          "**/bin/**",
          "**/obj/**",
          "_site_pdf/**"
        ]
      }
    ],
    "overwrite": [
      {
        "files": [
          "apidoc/**.md"
        ],
        "exclude": [
          "**/bin/**",
          "**/obj/**",
          "_site_pdf/**"
        ]
      }
    ],
    "wkhtmltopdf": {
      "additionalArguments": "--enable-local-file-access"
    },
    "dest": "_site_pdf",
    "template": [
      "pdf.default",
      "template"
    ]
  }
}

paulushub commented 2 years ago

with message: Method not found: 'System.ReadOnlySpan`1<Char> Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'.

This is a reported issue and is not yet resolved. There are, however, some workarounds; see if any of them help with your setup.

groogiam commented 2 years ago

@paulushub Thanks. I had some time to do more research, and these seem to be the relevant issues.

Error on build server:
PDF header signature not found (Error happen when conversion toc.json to Pdf) · Issue #4999 · dotnet/docfx
PDF Build fails in Azure DevOps · Issue #4488 · dotnet/docfx

Error locally:
https://github.com/dotnet/docfx/issues/8143
https://github.com/dotnet/docfx/issues/8102

Workaround locally:
https://github.com/dotnet/docfx/issues/8136#issuecomment-1219512721

paulushub commented 2 years ago

@groogiam With the issue fixed, how about the PDF output?

groogiam commented 2 years ago

@paulushub I can generate the PDF output from the command line on my local machine, but it still seems to fail on an Azure DevOps agent, even with "noStdin": true.

The metadata generation looks like it is working again though and generating pdf output.

paulushub commented 2 years ago

but it still seems to fail on an Azure DevOps agent, even with "noStdin": true

Any error messages?

groogiam commented 2 years ago

@paulushub

This error happens both locally and in Azure DevOps when running with "noStdin": true:

workflow.html" - has exception, the details: The filename or extension is 
too long
[22-09-15 09:40:22.575]Error:[PDF]Error happen when converting pdf/toc.json to Pdf. Details: iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)

If I run without "noStdin": true, it works locally, but I get the following error in my Azure DevOps pipeline:

[22-09-15 09:52:38.607]Error:[PDF]Error happen when converting pdf/toc.json to Pdf. Details: System.AggregateException: One or more errors occurred. ---> iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetPartialPdfModels(IList`1 htmlFilePaths)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.ConvertOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.GetOutlines()
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)
---> (Inner Exception #0) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #1) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #2) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
---> (Inner Exception #3) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #4) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #5) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #6) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>)<---

---> (Inner Exception #7) iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Convert[T](String arguments, Func`2 readerFunc)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.<>c__DisplayClass7_0.<GetPartialPdfModels>b__1(String htmlFilePath)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object <p0>

This error happens fairly early in processing, whereas the error with "noStdin": true happens very late.

Thanks for your help.
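
For readers debugging the same exception: a PDF file is expected to begin with the signature `%PDF-`, so "PDF header signature not found" usually means the reader was handed something that is not a PDF, such as empty output or an error message. Below is a minimal Python sketch for checking a file you captured yourself, for example by running the HTML-to-PDF converter by hand on one page and saving the result. The helper name `has_pdf_header` and the 1024-byte probe window are assumptions for illustration, not docfx or iTextSharp behavior confirmed in this thread.

```python
PDF_MAGIC = b"%PDF-"


def has_pdf_header(path, probe_bytes=1024):
    """Return True if the PDF signature appears near the start of the file.

    An empty file, an HTML error page, or plain console output all
    return False, which is the situation the exception describes.
    """
    with open(path, "rb") as fh:
        return PDF_MAGIC in fh.read(probe_bytes)
```

If `has_pdf_header("page.pdf")` returns False for a single converted page (`page.pdf` is a placeholder name), the problem lies in the conversion step for that page rather than in the later merge.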

paulushub commented 2 years ago

@groogiam Thanks for the updates.

iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.

A search indicates two sources of the error:

nprorekhin commented 1 year ago

I had a similar issue with docfx 2.59.4.0 when running docfx from GitHub Actions.

I've added the following flags to docfx.json and the problem is gone:

groogiam commented 1 year ago

@nprorekhin Thanks for the additional information. I'm in the process of testing on Azure DevOps, but this configuration does not work when building on my local machine. It results in the "The filename or extension is too long" error noted previously.
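
A note for readers: "The filename or extension is too long" is the text of a Windows error that also appears when a process is started with an over-long command line, and Windows documents a limit of 32,767 characters for the command line passed to CreateProcess. If, with "noStdin": true, the HTML file paths are passed as command-line arguments (an assumption here, not something confirmed in this thread), a large site could exceed that limit. The Python sketch below estimates the length; the function names and the quoting scheme are hypothetical.

```python
WINDOWS_CMDLINE_LIMIT = 32767  # documented CreateProcess limit, in characters


def estimated_command_length(executable, fixed_args, html_files):
    """Estimate the length of a command line that lists every HTML file.

    Each file path is wrapped in double quotes and all parts are joined
    by single spaces. This only approximates real quoting rules.
    """
    parts = [executable, *fixed_args, *(f'"{p}"' for p in html_files)]
    return sum(len(p) for p in parts) + len(parts) - 1


def exceeds_windows_limit(executable, fixed_args, html_files):
    """Return True if the estimated command line would not fit the limit."""
    length = estimated_command_length(executable, fixed_args, html_files)
    return length >= WINDOWS_CMDLINE_LIMIT
```

Feeding it the list of generated HTML paths gives a rough idea of whether the argument list alone could explain the failure on a given site.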

groogiam commented 1 year ago

I just finished testing on my Azure DevOps Windows agent. Adding the configuration

"wkhtmltopdf": { "additionalArguments": "-q --enable-local-file-access" }, "noStdin": true

results in this error:

[22-12-16 01:12:36.781]Error:[PDF]Error happen when converting pdf/toc.json to Pdf. Details: iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found.
   at iTextSharp.text.pdf.PdfReader..ctor(ReaderProperties properties, IRandomAccessSource byteSource)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.SaveCore(Stream stream)
   at Microsoft.DocAsCode.HtmlToPdf.HtmlToPdfConverter.Save(String outputFileName)
   at Microsoft.DocAsCode.HtmlToPdf.ConvertWrapper.<>c__DisplayClass7_0.<ConvertCore>b__1(ManifestItem tocFile)

yufeih commented 1 year ago

Addressed in v2.73.0 with a new PDF engine.

groogiam commented 1 year ago

@yufeih There still seems to be an issue with the new engine when running on CI. See below. Thanks.

api\toc.pdf: 98%
TimeoutException: Timeout 30000ms exceeded.
=========================== logs ===========================
navigating to "http://127.0.0.1:55059/api/IconNames.html", waiting until "domcontentloaded"
============================================================
  at async Task<T> InnerSendMessageToServerAsync<T>(string guid, string method, 
     Dictionary<string, object> dictionary, bool keepNulls) in Connection.cs:214
  at async Task<T> WrapApiCallAsync<T>(Func<Task<T>> action, bool isInternal) in
     Connection.cs:521                                                          
  at async Task<IResponse> GotoAsync(string url, FrameGotoOptions options) in   
     Frame.cs:617                                                               
  at void MoveNext() in PdfBuilder.cs:150                                       
  at void MoveNext() in PdfBuilder.cs:178                                       
  at void MoveNext()                                                            
  at async Task CreatePdf(Func<Uri, Task<byte[]>> printPdf, ProgressTask task,  
     Uri outlineUrl, Outline outline, string outputPath,                        
     Action<Dictionary<Outline, int>> updatePageNumbers) in PdfBuilder.cs:169   
  at void MoveNext() in PdfBuilder.cs:89                                        
  at void MoveNext()                                                            
  at void MoveNext() in PdfBuilder.cs:82                                        
  at void MoveNext() in Progress.cs:98                                          
  at void MoveNext() in Progress.cs:133                                         
  at async Task<T> RunAsync<T>(Func<Task<T>> func) in DefaultExclusivityMode.cs:
     40                                                                         
  at async Task<T> StartAsync<T>(Func<ProgressContext, Task<T>> action) in      
     Progress.cs:116                                                            
  at async Task StartAsync(Func<ProgressContext, Task> action) in Progress.cs:96
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:80              
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:80              
  at async Task CreatePdf(string outputFolder) in PdfBuilder.cs:80              
  at void <Execute>b__0() in PdfCommand.cs:19                                   
  at int Run(LogOptions options, Action run) in CommandHelper.cs:43             
  at int Execute(CommandContext context, PdfCommandOptions options) in          
     PdfCommand.cs:14                                                           
  at Task<int> Execute(CommandContext context, CommandSettings settings) in     
     CommandOfT.cs:40                                                           
  at Task<int> Execute(CommandTree leaf, CommandTree tree, CommandContext       
     context, ITypeResolver resolver, IConfiguration configuration) in          
     CommandExecutor.cs:144                                                     
  at async Task<int> Execute(IConfiguration configuration, IEnumerable<string>  
     args) in CommandExecutor.cs:85                                             
  at async Task<int> RunAsync(IEnumerable<string> args) in CommandApp.cs:84

groogiam commented 1 year ago

@yufeih

The failing file is a generated API page with a very large number of members; its generated HTML is 32k lines. I can reproduce the failure both in the Azure pipeline and by running docfx manually on my CI server. It appears that the default timeout is not long enough to handle large files on older hardware. Is there a way to change this timeout? If not, there should be, or at least the default should be increased to support older hardware.
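
To find which generated pages are most likely to hit the 30000ms navigation timeout shown in the log above, it can help to rank the HTML output by size. This is a minimal Python sketch under stated assumptions: the helper name `largest_html_pages` is hypothetical, and line count is used only as a rough proxy for how long a page takes to load.

```python
from pathlib import Path


def largest_html_pages(site_dir, top=5):
    """Return (line_count, path) pairs for the biggest HTML files, largest first."""
    pages = []
    for path in Path(site_dir).rglob("*.html"):
        with open(path, "rb") as fh:
            # Count lines without decoding, so odd encodings cannot fail.
            pages.append((sum(1 for _ in fh), path.as_posix()))
    return sorted(pages, reverse=True)[:top]
```

Running `largest_html_pages("_site")` against the build output folder from the docfx.json above lists the candidates to test first on slower hardware.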

yufeih commented 1 year ago

A timeout sounds reasonable.

groogiam commented 1 year ago

@yufeih Thanks for the quick turnaround. Any idea when the .NET tool with this change will be released? Thanks.