Bioconductor / BiocStyle

Issues and pull requests for BiocStyle should go here.
12 stars 19 forks source link

C stack usage too close to the limit - vignette knitting #88

Closed federicomarini closed 2 years ago

federicomarini commented 3 years ago

Hi, I am having a couple of packages failing in the build - see e.g. https://master.bioconductor.org/checkResults/3.13/bioc-LATEST/flowcatchR/nebbiolo1-buildsrc.html The issue seems to happen in Linux and MacOS machines, and I have digged into it a little already with @hpages on the #bioc-builds slack channel.

It looks like it is a combination of pandoc and plotly (thanks Hervé!), and I could further see that there might be a co-participation of BiocStyle::html_document.

A minimal example that fails: the flowcatchR vignette on master - while knitting, not when run interactively -> https://github.com/federicomarini/flowcatchR/blob/master/vignettes/flowcatchr_vignette.Rmd A minimal example that does not fail: that same vignette, no changes in the code itself, but only selecting a different output format in the yaml header of the vignette -> https://github.com/federicomarini/flowcatchR/blob/1f3800d6b8f47c68e293b0cbe10fd5b6b0ec6f66/vignettes/flowcatchr_vignette.Rmd#L31

This also affects some ~20-ish packages, and might also be due to recent changes in plotly (4.9.4 since last week on CRAN).

This "workaround" by changing the format of the vignette is not so ideal, as I do also use some features of the figure captioning tweaks.

@hpages - shall we try to gather the attention also from the maintainers of the other packages that are failing?

I can also refer to this in an issue to open in plotly's GH repo.

Thanks for the assistance! Federico

grimbough commented 3 years ago

When did this start failing? Any idea if it happen on devel a week or so before release? I pushed a small change to BiocStyle recently, which went to devel first.

I'll see if I can recreate the issue locally, and then try reverting to the older BiocStyle to be sure.

federicomarini commented 3 years ago

The email from the BBS came on June 14, so maybe give or take some days before that?

grimbough commented 3 years ago

Investigating with @federicomarini, it looks like this was introduced in 42d9cd89d89baf43144b9677acee4ca00ec488aa

Hypothesis is that including plotly creates a file so large that that the regex pattern matching inserted in my change somehow falls over.

hpages commented 3 years ago

I just successfully ran R CMD build on pcaExplorer, flowcatchR, and GRmetrics with your fix @grimbough. Thanks!

grimbough commented 3 years ago

Great, thanks for letting me know. That's already in devel on git.bioconductor & I'll patch and push release when I'm back at a computer.

hpages commented 3 years ago

BTW fixed=TRUE is also much more efficient than fixed=FALSE when the splitting pattern is not a regex. It's not only faster:

> bigstring <- strrep(paste(readLines("BiocStyle/DESCRIPTION"), collapse="\n"), 1.5e6)
> system.time(strsplit(bigstring, split="\n", fixed=TRUE))
   user  system elapsed 
  9.088   0.071   9.163 
> system.time(strsplit(bigstring, split="\n"))
   user  system elapsed 
 70.196   7.624  83.241 

but top also reports that the former uses 1.8 Gb of RAM on my laptop vs 13 Gb for the latter.

I didn't know that the latter could also produce a "C stack usage too close to the limit" error which is one more reason to stay away from regexps when you can. This suggests that the current C implementation relies heavily on recursivity which is kind of surprising because doing so is generally not optimal nor does it tend to scale well.

federicomarini commented 3 years ago

Did some more reading & searching, TIL it is reaaaally easy to trigger that error

change_to_factor <- function(x){
  x <- change_to_character(x)
  as.factor(x)
} 

change_to_character <- function(x){
  x <- change_to_factor(x)
  as.character(x)
}

change_to_character("1")

Source: https://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit

Back on topic of the issue: Thanks a lot @grimbough for the quick fix!

hpages commented 3 years ago

Even simpler:

> foo <- function() foo()
> foo()
Error: C stack usage  7977060 is too close to the limit

Any situation of infinite recursion will trigger it. But sometimes, like in this issue, it happens when the recursion goes too deep, even though it's not a situation of infinite recursion.