JuliaDocs / Documenter.jl

A documentation generator for Julia.
https://documenter.juliadocs.org
MIT License

ERROR: LoadError: PCRE compilation error: regular expression is too large #2489

Open AdamWysokinski opened 5 months ago

AdamWysokinski commented 5 months ago

Hi, I keep getting the following error:

[ Info: ExpandTemplates: expanding markdown templates.
ERROR: LoadError: PCRE compilation error: regular expression is too large at offset 35288

I was able to trace the line causing the issue:

NeuroAnalyzer.xcov(obj1::NeuroAnalyzer.NEURO, obj2::NeuroAnalyzer.NEURO; ch1::Union{Int64, Vector{Int64}, AbstractRange}=signal_channels(obj1), ch2::Union{Int64, Vector{Int64}, AbstractRange}=signal_channels(obj2), ep1::Union{Int64, Vector{Int64}, AbstractRange}=_c(nepochs(obj1)), ep2::Union{Int64, Vector{Int64}, AbstractRange}=_c(nepochs(obj2)), l::Real=1, demean::Bool=true, biased::Bool=true, method::Symbol=:sum)

I see nothing wrong with it. When I remove any two of the function arguments, it works fine and completes with no error.

Julia 1.10.2

mortenpi commented 5 months ago

Is there a stacktrace or an MWE you could put together? Not really sure which regex is blowing up, though minimally it looks like we should add some error handling somewhere.

goerz commented 5 months ago

Just for extra context: that string is inside a @docs block at https://github.com/JuliaHealth/NeuroAnalyzer.jl/blob/main/docs/src/index.md

AdamWysokinski commented 5 months ago

That's correct. It worked in the past; unfortunately, I cannot trace when it broke.

And here's the stacktrace:

[ Info: SetupBuildDirectory: setting up build directory.
[ Info: Doctest: running doctests.
[ Info: ExpandTemplates: expanding markdown templates.
ERROR: LoadError: PCRE compilation error: regular expression is too large at offset 35309
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compile(pattern::String, options::UInt32)
    @ Base.PCRE ./pcre.jl:165
  [3] compile(regex::Regex)
    @ Base ./regex.jl:80
  [4] Regex(pattern::String, compile_options::UInt32, match_options::UInt32)
    @ Base ./regex.jl:40
  [5] Regex
    @ ./regex.jl:68 [inlined]
  [6] find_block_in_file(code::String, file::String)
    @ Documenter.Utilities ~/.julia/packages/Documenter/bFHi4/src/Utilities/Utilities.jl:24
  [7] runner(::Type{Documenter.Expanders.DocsBlocks}, x::Markdown.Code, page::Documenter.Documents.Page, doc::Documenter.Documents.Document)
    @ Documenter.Expanders ~/.julia/packages/Documenter/bFHi4/src/Expanders.jl:277
  [8] dispatch(::Type{Documenter.Expanders.ExpanderPipeline}, ::Markdown.Code, ::Vararg{Any})
    @ Documenter.Utilities.Selectors ~/.julia/packages/Documenter/bFHi4/src/Utilities/Selectors.jl:170
  [9] expand(doc::Documenter.Documents.Document)
    @ Documenter.Expanders ~/.julia/packages/Documenter/bFHi4/src/Expanders.jl:42
 [10] runner(::Type{Documenter.Builder.ExpandTemplates}, doc::Documenter.Documents.Document)
    @ Documenter.Builder ~/.julia/packages/Documenter/bFHi4/src/Builder.jl:227
 [11] dispatch(::Type{Documenter.Builder.DocumentPipeline}, x::Documenter.Documents.Document)
    @ Documenter.Utilities.Selectors ~/.julia/packages/Documenter/bFHi4/src/Utilities/Selectors.jl:170
 [12] #2
    @ ~/.julia/packages/Documenter/bFHi4/src/Documenter.jl:249 [inlined]
 [13] cd(f::Documenter.var"#2#3"{Documenter.Documents.Document}, dir::String)
    @ Base.Filesystem ./file.jl:112
 [14] #makedocs#1
    @ ~/.julia/packages/Documenter/bFHi4/src/Documenter.jl:248 [inlined]
 [15] top-level scope
    @ ~/Documents/Code/NeuroAnalyzer.jl/docs/make_md.jl:30
in expression starting at /home/eb/Documents/Code/NeuroAnalyzer.jl/docs/make_md.jl:30

mortenpi commented 5 months ago

It looks like you have a huge ~doctest~ @docs block somewhere, which breaks the logic we use to find its line numbers:

https://github.com/JuliaDocs/Documenter.jl/blob/4fe9cf1237293c530633bcf0ef183b48417f23d7/src/utilities/utilities.jl#L63-L64

We probably should switch away from using a regex for this.

Side note: it also looks like you're using an old Documenter version (0.27 branch I suspect).

mortenpi commented 5 months ago

Oh, yea, the at-docs blocks in https://github.com/JuliaHealth/NeuroAnalyzer.jl/blob/f2bba13cf8c41f76452c3fa0c5727f7eb1fe5191/docs/src/index.md?plain=1#L681 are really big. At least one of them is apparently more than 35KiB.

As a workaround, I think if you just split the biggest ones into multiple smaller ones, it will fix the issue.
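For a generated list, the splitting could be done in the generator itself. This is only a rough sketch: `split_docs_blocks` is a made-up helper name, the entries are placeholders, and the chunk size of 2 is arbitrary. The limit that actually matters is the size of the block's text, so in practice the entry count would need to be chosen small enough that each block stays well below the roughly 35 KiB where the regex failed.

```shell
# Sketch: split a list of @docs entries (one per line on stdin) into
# several smaller blocks of at most $1 entries each.
# split_docs_blocks is a hypothetical helper, not part of Documenter.
split_docs_blocks() {
    awk -v n="$1" -v fence='```' '
        (NR - 1) % n == 0 {
            if (NR > 1) print fence "\n"   # close the previous block, then a blank line
            print fence "@docs"            # open a new block
        }
        { print }                          # the entry itself
        END { if (NR > 0) print fence }    # close the final block
    '
}

# Placeholder entries; in practice these would come from the generator script.
blocks=$(printf 'NeuroAnalyzer.a\nNeuroAnalyzer.b\nNeuroAnalyzer.c\n' | split_docs_blocks 2)
printf '%s\n' "$blocks"
```

With three entries and a chunk size of 2, this emits two separate at-docs blocks, the first with two entries and the second with one.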

But also, just as a suggestion, you may want to consider using at-autodocs here, with a custom filter -- I suspect maintaining those lists by hand is not pleasant.

AdamWysokinski commented 5 months ago

I generate it automatically via bash script, e.g.

echo "\`\`\`@docs"
cat ../src/recorder/*.jl | grep ^function | sed s/"function "/"NeuroAnalyzer."/g
echo "\`\`\`"

I've tried using at-autodocs, but cannot set up Pages properly. How can I set it to point to all .jl files in the src/recorder folder (like in the example above)?

AdamWysokinski commented 5 months ago

The workaround you suggested helped, thanks!

goerz commented 5 months ago

Like I was hinting at on Slack: Maybe change that bash script to

echo "\`\`\`@docs"
cat ../src/recorder/*.jl | grep ^function | sed s/"function "/"NeuroAnalyzer."/g | sed s/"(.*)"//g | sort -u
echo "\`\`\`"

which strips the (extremely long) argument lists. That way, you get a docstring per function, not per method. Function docstrings automatically concatenate all method docstrings, with the drawback that you can't link to a specific method docstring anymore. I usually prefer the function docstrings over individual method docstrings, but your mileage may vary. It would definitely cut down the size of your @docs block dramatically.
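To illustrate what the extra `sed` and `sort -u` stages do, here is the same pipeline run on a few made-up lines instead of the real source files. The sample signatures and the `strip_signatures` wrapper are hypothetical, and the quoting is written in the more conventional single-quoted form:

```shell
# Same stages as the pipeline above, wrapped in a function so it can
# read sample input from stdin instead of ../src/recorder/*.jl.
strip_signatures() {
    grep '^function' |                        # keep only function definition lines
        sed 's/function /NeuroAnalyzer./g' |  # prefix with the module name
        sed 's/(.*)//g' |                     # drop everything from the first "(" to the last ")"
        sort -u                               # one entry per function, not per method
}

# Made-up stand-in for the package sources: two methods of xcov, one of acov.
sample='function xcov(obj1::NEURO, obj2::NEURO; l::Real=1)
function xcov(s1::AbstractVector, s2::AbstractVector; l::Real=1)
function acov(obj::NEURO; l::Real=1)
    return nothing
end'

result=$(printf '%s\n' "$sample" | strip_signatures)
printf '%s\n' "$result"
```

The two `xcov` methods collapse into a single `NeuroAnalyzer.xcov` entry, so the block holds one short line per function.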

Or, as suggested, use @autodocs, which keeps the individual method docstrings separate. For setting Pages correctly, it might help that the right-hand side can be arbitrary Julia code. So as long as you can express the list of .jl files you want to include in a one-liner, that should work. That feature of "arbitrary code" is mentioned in the manual for @index blocks, where it gives the example

```@index
Pages = map(file -> joinpath("man", file), readdir("man"))
```

That trick also applies to `@autodocs` and any similar Documenter-specific block.

AdamWysokinski commented 5 months ago

Thank you. The modified bash script works really well. I've tried @autodocs, but for some reason not all functions were rendered properly. I don't have time right now, but will investigate it later and submit an issue if necessary. Meanwhile, a docstring per function is a perfect solution for my needs. Thanks again!