broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

VariantRecalibrator R-script fails if `scales` v1.3.0 is installed #8664

Open MikkelSchubert opened 9 months ago

MikkelSchubert commented 9 months ago

Bug Report

Affected tool(s) or class(es)

VariantRecalibrator

Affected version(s)

Description

As of v1.3.0 the scales R package turns the use of deprecated values for the space parameter into a hard error, resulting in the VariantRecalibrator R-script terminating with the following message:

The space argument of pal_gradient_n() only supports be "Lab" as of scales 0.3.0.

This parameter is used repeatedly in the generated R-script via

scale_fill_gradient(high="green", low="red", space="rgb")
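Until the generated script is updated, one possible stopgap is to strip the deprecated argument from the R script that VariantRecalibrator writes and rerun the patched copy by hand. The sketch below is only an illustration, not an official fix: the helper name `strip_rgb_space` and the throwaway demo file are hypothetical. Dropping `space` makes ggplot2 fall back to its default "Lab" interpolation, so the colours may differ slightly from the original plots.

```shell
# Hypothetical helper: remove the deprecated space="rgb" argument from a
# generated plotting script and write the patched text to stdout.
strip_rgb_space() {
    sed -E 's/,[[:space:]]*space[[:space:]]*=[[:space:]]*"rgb"//g' "$1"
}

# Demo on a throwaway file that mimics the offending line.
demo=$(mktemp)
printf '%s\n' 'scale_fill_gradient(high="green", low="red", space="rgb")' > "$demo"
strip_rgb_space "$demo"   # prints: scale_fill_gradient(high="green", low="red")
rm -f "$demo"
```

On a real run the output would be redirected to a new file, which can then be passed to `Rscript`.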

Steps to reproduce

$ R --version
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
$ rm -rf ~/R
$ R
> install.packages("ggplot2", repos="https://cloud.r-project.org/")
> packageVersion("scales")
[1] ‘1.3.0’
> quit()
$ gatk --version
The Genome Analysis Toolkit (GATK) v4.5.0.0
HTSJDK Version: 4.1.0
Picard Version: 3.1.1
$ gatk VariantRecalibrator  [arguments omitted for brevity]
org.broadinstitute.hellbender.utils.R.RScriptExecutorException: 
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.9339186078473502558';source('/path/to/rscript.r');
Stdout: 
Stderr: Error:
! The `space` argument of `pal_gradient_n()` only supports be "Lab" as
  of scales 0.3.0.
Backtrace:
     ▆
  1. ├─base::source("/path/to/rscript.r")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. └─ggplot2::scale_fill_gradient(high = "green", low = "red", space = "rgb")
  6.   ├─ggplot2::continuous_scale(...)
  7.   │ └─ggplot2::ggproto(...)
  8.   │   └─rlang::list2(...)
  9.   └─scales::seq_gradient_pal(low, high, space)
 10.     └─scales::pal_gradient_n(c(low, high), space = space)
 11.       └─lifecycle::deprecate_stop("0.3.0", "pal_gradient_n(space = 'only supports be \"Lab\"')")
 12.         └─lifecycle:::deprecate_stop0(msg)
 13.           └─rlang::cnd_signal(...)
Execution halted
$ R
> install.packages("remotes", repos="https://cloud.r-project.org/")
> library(remotes)
> install_version("scales", version="1.2.1", repos="https://cloud.r-project.org/")
> packageVersion("scales")
[1] ‘1.2.1’
> quit()
$ gatk VariantRecalibrator [arguments omitted for brevity]
$

Expected behavior

The output rscript file is used to generate a PDF.

Actual behavior

Generation of the PDF fails because a deprecation in the scales library has become a hard error, causing the Rscript command to abort.
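Since the failure depends only on the installed scales version, a pre-flight check can flag it before a long run. The sketch below assumes the version string is passed in as an argument; the helper names `version_ge` and `check_scales` are hypothetical, and the comparison relies on `sort -V`, which GNU coreutils provides but other `sort` implementations may lack.

```shell
# Hypothetical helper: succeed when version $1 is >= version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical pre-flight check: $1 is the installed scales version.
check_scales() {
    if version_ge "$1" "1.3.0"; then
        echo "scales $1: space=\"rgb\" is rejected, so the plotting script is expected to fail"
        return 1
    fi
    echo "scales $1: predates the hard error reported here"
}

check_scales "1.2.1"
check_scales "1.3.0" || true
```

The installed version could be obtained with something like `Rscript -e 'cat(as.character(packageVersion("scales")))'` and passed to `check_scales`.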

lbergelson commented 8 months ago

Looks like we need to update our Rscripts... thanks for the report!

wanqiangdehuoguo commented 6 months ago

Hi, I am hitting the same bug.

Could you tell me which version of ggplot2 can be used, or when this problem will be fixed?

Thanks!

gokalpcelik commented 6 months ago

R version 3.6 and a compatible ggplot2 are needed. Compatible versions are listed in gatkcondaenv.yml:

# core R dependencies; these should only be used for plotting and do not take precedence over core python dependencies!
- r-base=3.6.2
- r-data.table=1.12.8
- r-dplyr=0.8.5
- r-getopt=1.20.3
- r-ggplot2=3.3.0
- r-gplots=3.0.3
- r-gsalib=2.1
- r-optparse=1.6.4
- r-backports=1.1.10
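To compare these pins against a local R installation, the `r-*` entries can be pulled out of the environment file. This is a minimal sketch that assumes each pin sits on its own `- name=version` line as in the excerpt above; the helper name `list_r_pins` and the demo file are hypothetical.

```shell
# Hypothetical helper: print "package version" for every pinned r-* entry
# in a conda environment file laid out like the excerpt above.
list_r_pins() {
    sed -n -E 's/^[[:space:]]*-[[:space:]]*(r-[^=[:space:]]+)=([^[:space:]]+).*$/\1 \2/p' "$1"
}

# Demo on a throwaway file holding three of the listed pins.
demo=$(mktemp)
printf '%s\n' '# core R dependencies' '- r-base=3.6.2' '- r-ggplot2=3.3.0' '- r-gsalib=2.1' > "$demo"
list_r_pins "$demo"
rm -f "$demo"
```

Comment lines and entries that do not start with `r-` are skipped, so only the R pins are printed.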

Lotteaveline commented 6 months ago

Hi, I also face this problem:

Runtime.totalMemory()=8598323200
org.broadinstitute.hellbender.utils.R.RScriptExecutorException: 
Rscript exited with 1
Command Line: Rscript -e tempLibDir = '/tmp/Rlib.3561179774649616878';source('/mnt/filename.snps.plots.R');
Stdout: 
Stderr: Error:
! The `space` argument of `pal_gradient_n()` only supports be "Lab" as
  of scales 0.3.0.
Backtrace:
     ▆
  1. ├─base::source("/mnt/filename.snps.plots.R")
  2. │ ├─base::withVisible(eval(ei, envir))
  3. │ └─base::eval(ei, envir)
  4. │   └─base::eval(ei, envir)
  5. └─ggplot2::scale_fill_gradient(high = "green", low = "red", space = "rgb")
  6.   ├─ggplot2::continuous_scale(...)
  7.   │ └─ggplot2::ggproto(...)
  8.   │   └─rlang::list2(...)
  9.   └─scales::pal_seq_gradient(low, high, space)
 10.     └─scales::pal_gradient_n(c(low, high), space = space)
 11.       └─lifecycle::deprecate_stop("0.3.0", "pal_gradient_n(space = 'only supports be \"Lab\"')")
 12.         └─lifecycle:::deprecate_stop0(msg)
 13.           └─rlang::cnd_signal(...)
Execution halted

My versions are R 4.2.3 and ggplot2 3.5.0.

Did you already find a solution to this problem?

Thanks!

gokalpcelik commented 6 months ago

Hi @Lotteaveline, recent versions of R and its libraries are known to have issues, so our suggestion is to stick with the versions recommended in the list above.

Lotteaveline commented 6 months ago

Okay, thank you for the quick response!

jielab commented 3 months ago

Hi, there:

I am using R v4.3.4, scales v1.3.0, ggplot2 v3.4.4.

Could you please let me know how to resolve the issue mentioned above: The space argument of pal_gradient_n() only supports be "Lab" as of scales 0.3.0.

I hope that I don't need to downgrade my R version; that would break a lot of my other scripts.

Thanks! JH

gokalpcelik commented 3 months ago

You need to use the versions suggested above. If downgrading your R environment is not possible, you can use the Conda environment for GATK, which installs all the necessary components, or the Docker image we provide.
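As a rough illustration of the Docker route, the sketch below only assembles and prints a command line for review rather than running anything. The helper name `gatk_docker_cmd`, the `/data` mount point, and the `broadinstitute/gatk:4.5.0.0` image tag are assumptions chosen to match the version reported in this thread; adjust them to your setup.

```shell
# Hypothetical helper: print (not run) a docker command that mounts a data
# directory at /data and invokes gatk inside the container.
# The image name and tag are assumptions, not taken from this thread.
gatk_docker_cmd() {
    data_dir=$1
    shift
    echo "docker run --rm -v ${data_dir}:/data broadinstitute/gatk:4.5.0.0 gatk $*"
}

gatk_docker_cmd "$PWD" VariantRecalibrator --help
```

Paths passed to the tool would then need to refer to locations under `/data` inside the container.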

jielab commented 3 months ago

Thanks!

GATK has been around for more than a decade, I guess. I really hope it is easy to run by now.

Can you please let me know how to install through conda then?

BTW, the current version 4.5.0 does not require users to separate SNP from INDEL when calling variants, correct?

Best regards, Jie

gokalpcelik commented 3 months ago

Just follow the recommendations from our README file:


First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed (Miniconda is sufficient).

To "create" the conda environment:
If running from a zip or tar distribution, run the command `conda env create -f gatkcondaenv.yml` to create the `gatk` environment.

Execute the shell command `source activate gatk` to activate the `gatk` environment.
See the [Conda](https://conda.io/docs/user-guide/tasks/manage-environments.html) documentation for additional information about using and managing Conda environments.
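The two README steps above can be sketched as a small helper. It assumes that the unpacked distribution directory contains `gatkcondaenv.yml`, as the README wording implies, and it only prints the commands so they can be reviewed and run in your own shell; the helper name `gatk_conda_steps` is hypothetical.

```shell
# Hypothetical helper: print the README's two conda steps for a given
# distribution directory, refusing if gatkcondaenv.yml is not found there.
gatk_conda_steps() {
    env_file="$1/gatkcondaenv.yml"
    if [ ! -f "$env_file" ]; then
        echo "no gatkcondaenv.yml in $1" >&2
        return 1
    fi
    echo "conda env create -f $env_file"
    echo "source activate gatk"
}

# Demo against a throwaway directory standing in for an unpacked distribution.
demo=$(mktemp -d)
: > "$demo/gatkcondaenv.yml"
gatk_conda_steps "$demo"
rm -rf "$demo"
```

Activation has to happen in the shell that will run GATK, which is why the helper prints the commands instead of executing them.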

And yes, you don't have to call SNPs and INDELs separately.

jielab commented 3 months ago

Dear Gökalp:

Thank you very much!

You suggested running conda env create -f gatkcondaenv.yml. Where is the gatkcondaenv.yml file?

If I simply run git clone https://github.com/broadinstitute/gatk.git, the cloned repository has a gatk executable, and I found that I could run it directly.

If I instead go to the https://gatk.broadinstitute.org/hc/en-us homepage and download the latest release, https://github.com/broadinstitute/gatk/releases/download/4.6.0.0/gatk-4.6.0.0.zip, there is also a gatk executable after unzipping it, and I can run that directly as well (./gatk) from the shell.

So, now I am a bit puzzled: which is the recommended way to install and run GATK?

Finally, it seems that you guys now recommend WARP https://broadinstitute.github.io/warp/, which seems to be a completely new set of tools and pipeline scripts. Is WDL now the recommended approach to run GATK?

Thank you very much & best regards, Jie