HenrikBengtsson closed 2 years ago
I found the below. I guess you don't need all those GATK versions(?)
[henrik@cclc01 ~]$ ls -l /home/jocostello/shared/LG3_Pipeline_HIDE/tools
total 471876
drwxr-xr-x 3 jocostello songlab 4096 Feb 7 2012 bwa-0.5.10
drwxr-xr-x 8 jocostello costellolab 4096 Dec 16 11:07 FastQC.v0.11.9
drwxr-xr-x 4 jocostello costellolab 310 Dec 17 2018 gatk-4.0.12.0
drwxr-xr-x 4 jocostello costellolab 308 Jan 29 2019 gatk-4.1.0.0
drwxr-xr-x 5 jocostello costellolab 321 Apr 11 2019 gatk-4.1.1.0
drwxr-xr-x 4 jocostello costellolab 308 Apr 23 2019 gatk-4.1.2.0
drwxr-xr-x 4 jocostello costellolab 308 Nov 11 2019 gatk-4.1.4.0
drwxr-xr-x 4 jocostello costellolab 308 Nov 27 2019 gatk-4.1.4.1
drwxr-xr-x 4 jocostello costellolab 308 Mar 1 2020 gatk-4.1.5.0
drwxr-xr-x 4 jocostello costellolab 308 Mar 25 2020 gatk-4.1.6.0
drwxr-xr-x 4 jocostello costellolab 321 May 26 2020 gatk-4.1.7.0
drwxr-xr-x 4 jocostello costellolab 308 Jul 20 2020 gatk-4.1.8.1
drwxr-xr-x 4 jocostello costellolab 308 Nov 7 2020 gatk-4.1.9.0
-rw-r--r-- 1 jocostello costellolab 454612009 Oct 9 2020 gatk-4.1.9.0.zip
drwxr-xr-x 3 jocostello songlab 96 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6
drwxr-xr-x 3 jocostello songlab 96 May 16 2012 GenomeAnalysisTK-1.6-5-g557da77
drwxr-xr-x 3 jocostello songlab 33 Sep 22 2011 java
-rwxrwxr-x 1 jocostello costellolab 83 Nov 8 2012 LICENSE.TXT
-rw-r--r-- 1 jocostello costellolab 7833 May 26 2020 muTect-1.0.27783.help
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
-rw-r--r-- 1 jocostello costellolab 9684017 Feb 8 2013 muTect-1.1.4-bin.zip
-rw-rw-r-- 1 jocostello costellolab 10438338 Nov 8 2012 muTect-1.1.4.jar
drwxr-xr-x 12 jocostello costellolab 4096 May 21 18:54 picard
drwxr-xr-x 2 jocostello songlab 4096 Mar 12 2012 picard-tools-1.64
drwxr-xr-x 3 jocostello costellolab 234 Feb 12 17:37 pindel024t
drwxr-xr-x 6 jocostello costellolab 4096 May 29 2012 samtools-0.1.12a
drwxr-xr-x 6 jocostello songlab 4096 Mar 15 2012 samtools-0.1.18
-rwxr-x--- 1 jocostello costellolab 80522 Sep 28 2020 snp-pileup
drwxr-xr-x 4 henrik costellolab 158 Sep 17 2018 TrimGalore-0.4.4
drwxr-xr-x 4 jocostello costellolab 158 Sep 4 2020 TrimGalore-0.6.6
-rw-rw-r-- 1 jocostello costellolab 54 Nov 8 2012 version.txt
I deleted older versions of GATK4, thanks!
Got it.
My notes: TrimGalore requires Cutadapt, which apparently was installed centrally on TIPCC:
[henrik@cclc01 ~]$ which cutadapt
/opt/Python/Python-2.7.9/bin/cutadapt
[henrik@cclc01 ~]$ cutadapt --version
1.8.1
So, that's the version that needs to be installed on C4 for full backward compatibility. I've now added CBI module cutadapt/1.8.1 in addition to cutadapt/3.4 on C4.
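A minimal sketch of how the pinned version could be checked on each cluster after loading the module. The function name `check_pinned_version` is hypothetical, and the literal "1.8.1" stands in for the output of `cutadapt --version` so the snippet runs without Cutadapt installed.

```shell
#!/bin/sh
# Compare a tool's reported version against the pinned one (exact string match only).
check_pinned_version() {
  tool=$1
  expected=$2
  actual=$3
  if [ "$actual" = "$expected" ]; then
    echo "OK: $tool $actual"
  else
    echo "MISMATCH: $tool reports $actual, expected $expected" >&2
    return 1
  fi
}

# On a cluster node the third argument would be "$(cutadapt --version)".
check_pinned_version cutadapt 1.8.1 "1.8.1"
```

An exact string comparison is deliberate here: for backward compatibility, 1.8.1 and 3.4 are different tools, not "new enough" versions of the same one.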
All but the following software versions are now available as CBI modules on C4:
drwxr-xr-x 3 jocostello songlab 96 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6
drwxr-xr-x 3 jocostello songlab 96 May 16 2012 GenomeAnalysisTK-1.6-5-g557da77
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
-rw-rw-r-- 1 jocostello costellolab 10438338 Nov 8 2012 muTect-1.1.4.jar
drwxr-xr-x 3 jocostello costellolab 234 Feb 12 17:37 pindel024t
drwxr-xr-x 6 jocostello costellolab 4096 May 29 2012 samtools-0.1.12a
drwxr-xr-x 6 jocostello songlab 4096 Mar 15 2012 samtools-0.1.18
Managed to get the legacy versions of samtools installed on C4;
$ module avail samtools
----------------------------------- /software/c4/cbi/modulefiles ------------------------------------
samtools/0.1.12a (L) samtools/1.10 samtools/1.12
samtools/0.1.18 samtools/1.11 samtools/1.13 (D)
Remaining software is now:
drwxr-xr-x 3 jocostello songlab 96 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6
drwxr-xr-x 3 jocostello songlab 96 May 16 2012 GenomeAnalysisTK-1.6-5-g557da77
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
-rw-rw-r-- 1 jocostello costellolab 10438338 Nov 8 2012 muTect-1.1.4.jar
drwxr-xr-x 3 jocostello costellolab 234 Feb 12 17:37 pindel024t
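The "remaining software" bookkeeping above amounts to a set difference between what the pipeline needs and what is already installed as modules. A rough sketch, assuming both lists have been normalized to the same naming (the two lists below are an illustrative subset, not the full inventory):

```shell
#!/bin/sh
# Set difference "needed minus installed" using comm(1); both inputs must be sorted.
tmp=$(mktemp -d)

# Illustrative subset of tool names taken from the directory listing.
printf '%s\n' bwa-0.5.10 pindel024t samtools-0.1.12a samtools-0.1.18 \
  | LC_ALL=C sort > "$tmp/needed.txt"
printf '%s\n' bwa-0.5.10 samtools-0.1.12a samtools-0.1.18 \
  | LC_ALL=C sort > "$tmp/installed.txt"

# -2 drops lines unique to the second file, -3 drops lines common to both.
remaining=$(LC_ALL=C comm -23 "$tmp/needed.txt" "$tmp/installed.txt")
echo "$remaining"

rm -rf "$tmp"
```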
I've managed to install muTect 1.1.1 and 1.1.4, cf. module load CBI; module avail mutect. Still can't find an official source for 1.0.27783 though. Remaining software is now:
drwxr-xr-x 3 jocostello songlab 96 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6
drwxr-xr-x 3 jocostello songlab 96 May 16 2012 GenomeAnalysisTK-1.6-5-g557da77
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
drwxr-xr-x 3 jocostello costellolab 234 Feb 12 17:37 pindel024t
Woohoo, through some forensic internet searching using https://web.archive.org/, I managed to track down a Broad FTP server (ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK/) that hosts all legacy versions of GATK (1.0-2.3.9), including the above two versions:
$ curl ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK/ | grep -E "GenomeAnalysisTK-(1.5.12|1.6-5)"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14480 0 14480 0 0 10379 0 --:--:-- 0:00:01 --:--:-- 10372
-rw-r--r-- 1 gsa-engineering wga 18104172 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6.tar.bz2
-rw-r--r-- 1 gsa-engineering wga 18502494 May 3 2012 GenomeAnalysisTK-1.6-5-g557da77.tar.bz2
100 27864 0 27864 0 0 18011 0 --:--:-- 0:00:01 --:--:-- 18000
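Going by the file names in the listing above, the tarball URL for a given legacy version can be assembled as sketched below. The helper name `gatk_legacy_url` is hypothetical, and whether the FTP server still serves these files is an assumption; the snippet only builds the URLs and downloads nothing.

```shell
#!/bin/sh
# Base directory of the Broad FTP server mentioned above.
GATK_FTP_BASE='ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK'

# Build the tarball URL for a legacy GATK version string such as 1.6-5-g557da77.
gatk_legacy_url() {
  printf '%s/GenomeAnalysisTK-%s.tar.bz2\n' "$GATK_FTP_BASE" "$1"
}

for v in 1.5-12-gd0056d6 1.6-5-g557da77; do
  gatk_legacy_url "$v"
done

# To actually fetch one: curl -O "$(gatk_legacy_url 1.6-5-g557da77)"
```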
After hours and hours, I finally managed to install pindel 0.2.4t on both TIPCC and C4 under the CBI software stack, i.e. module load CBI pindel/0.2.4t.
This leaves us with:
drwxr-xr-x 3 jocostello songlab 96 Mar 19 2012 GenomeAnalysisTK-1.5-12-gd0056d6
drwxr-xr-x 3 jocostello songlab 96 May 16 2012 GenomeAnalysisTK-1.6-5-g557da77
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
And, I've managed to install GATK 1.6-5 as a module on both TIPCC and C4, i.e. module load CBI gatk/1.6-5-g557da77.
Turns out we're not using GATK 1.5-12-gd0056d6 anywhere, so that leaves us with only:
-rwxr-xr-x 1 jocostello songlab 8322312 May 17 2012 muTect-1.0.27783.jar
We need ANNOVAR too and it's complicated. It requires online registration to access/download, and you only get the latest version. Argh. So much for reproducible science.
I think the version we are using is the one referred to as AnnoVar 2011-10-02;
]$ ${LG3_HOME}/tools/AnnoVar/annotate_variation.pl --help | grep Version
Version: $LastChangedDate: 2011-10-02 22:13:18 -0700 (Sun, 02 Oct 2011) $
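Since ANNOVAR has no version number of its own here, the date in that `Version:` line is the only identifier. A small sketch of pulling it out, assuming the line format shown above (the function name `annovar_release_date` is hypothetical, and the line is hard-coded so the snippet runs without ANNOVAR):

```shell
#!/bin/sh
# Extract the YYYY-MM-DD date from an ANNOVAR "Version: $LastChangedDate: ..." line on stdin.
annovar_release_date() {
  sed -n 's/^[[:space:]]*Version: \$LastChangedDate: \([0-9][0-9-]*\) .*/\1/p' | head -n 1
}

# Stand-in for: annotate_variation.pl --help | grep Version
version_line='Version: $LastChangedDate: 2011-10-02 22:13:18 -0700 (Sun, 02 Oct 2011) $'

release=$(printf '%s\n' "$version_line" | annovar_release_date)
echo "annovar/$release"
```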
Since muTect is plain Java and ANNOVAR is plain Perl, we might be able to just copy them over from TIPCC to C4 as-is; not a pretty solution but that might be the only solution.
Posted "Download muTect-1.0.27783.jar?" to the GATK forum.
Ah, so it turns out that our existing muTect-1.0.27783.jar on TIPCC presents itself as GATK 1.1-37-g5cedb2d;
[henrik@cclc01 ~/repositories/UCSF-CostelloLab/test-next-release]$ /home/jocostello/shared/LG3_Pipeline_HIDE/tools/muTect-1.0.27783.jar --help | head -6
---------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v1.1-37-g5cedb2d, Compiled 2011/09/14 10:01:32
Copyright (c) 2010 The Broad Institute
Please view our documentation at http://www.broadinstitute.org/gsa/wiki
For support, please view our support site at http://getsatisfaction.com/gsa
---------------------------------------------------------------------------------
That's interesting. So, I went to install GATK 1.1-37 from ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK;
[henrik@cclc01 ~]$ module load gatk/1.1-37-ge63d9d8
[henrik@cclc01 ~]$ java -jar ${GATK_HOME}/GenomeAnalysisTK.jar --help | head -6
---------------------------------------------------------------------------------
The Genome Analysis Toolkit (GATK) v1.1-37-ge63d9d8, Compiled 2011/09/13 01:15:42
Copyright (c) 2010 The Broad Institute
Please view our documentation at http://www.broadinstitute.org/gsa/wiki
For support, please view our support site at http://getsatisfaction.com/gsa
---------------------------------------------------------------------------------
It turns out to have a different compile date (2011-09-13 rather than 2011-09-14) and a different hash code (ge63d9d8 rather than g5cedb2d), so certainly not identical, but hopefully good enough for our migration needs.
I've installed this on both TIPCC and C4.
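The banner comparison above can be done mechanically. A sketch, assuming the GATK 1.x banner format shown in the two outputs (the function name `gatk_banner_id` is hypothetical; the banner lines are copied from above so the snippet runs without Java or GATK):

```shell
#!/bin/sh
# Extract "<version> <compile date>" from a GATK 1.x help banner read on stdin.
gatk_banner_id() {
  sed -n 's/.*(GATK) v\([^,]*\), Compiled \([0-9/]*\).*/\1 \2/p' | head -n 1
}

banner_a='The Genome Analysis Toolkit (GATK) v1.1-37-g5cedb2d, Compiled 2011/09/14 10:01:32'
banner_b='The Genome Analysis Toolkit (GATK) v1.1-37-ge63d9d8, Compiled 2011/09/13 01:15:42'

id_a=$(printf '%s\n' "$banner_a" | gatk_banner_id)
id_b=$(printf '%s\n' "$banner_b" | gatk_banner_id)

if [ "$id_a" = "$id_b" ]; then
  echo "same build: $id_a"
else
  echo "different builds: $id_a vs $id_b"
fi
```

Matching banners would only show that two jars report the same build; it would not prove the jars themselves are identical.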
I've created annovar-2011-10-02.tar.gz from TIPCC:/home/jocostello/shared/LG3_Pipeline_HIDE/AnnoVar/ and installed it as modules on TIPCC and C4. I've also installed /home/jocostello/shared/LG3_Pipeline_HIDE/Annovar_2015Jun17/annovar.latest.tar.gz, and the latest official version (which is the only one you can download after registration yadayadayada). So, now we have:
$ module avail annovar
------------------------------------------------- /home/shared/cbc/apps/modulefiles/CBC --------------------------------------------------
annovar/2011-10-02 annovar/2015-06-17 annovar/2020-06-07 (L,D)
This was a hack, but I think that completes our needs for software tools needed by the pipeline.
I'll next try to run through the pipeline using the software tools available from the CBI module stack. If all works well, we should be able to scratch most of ${LG3_HOME}/tools/. Closing this issue.
Argh... I might have been too quick about muTect-1.0.27783.jar (https://github.com/UCSF-Costello-Lab/LG3_Pipeline/issues/146#issuecomment-938183908). Although it presents itself as GATK, it's not GATK :(
Copied muTect-1.0.27783.jar from TIPCC:/home/jocostello/shared/LG3_Pipeline_HIDE/tools and installed as module load mutect/1.0.27783 on TIPCC and C4. Good enough for now; hopefully the Broad/GATK folks will tell us from where we can get the official version.
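When a jar is copied between clusters as-is, a checksum comparison is a cheap way to confirm both sites run the same bytes. A sketch, assuming GNU coreutils `sha256sum` is available; the function name `same_sha256` and the placeholder files are hypothetical stand-ins for the real jar on each cluster:

```shell
#!/bin/sh
# Succeed only if both files exist and have the same SHA-256 digest.
same_sha256() {
  sum_a=$(sha256sum "$1" | cut -d' ' -f1)
  sum_b=$(sha256sum "$2" | cut -d' ' -f1)
  [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}

# Placeholder files standing in for the jar on the source and target cluster.
workdir=$(mktemp -d)
printf 'placeholder bytes standing in for the jar\n' > "$workdir/source.jar"
cp "$workdir/source.jar" "$workdir/copied.jar"

if same_sha256 "$workdir/source.jar" "$workdir/copied.jar"; then
  verdict="identical"
else
  verdict="differs"
fi
echo "copy is $verdict"

rm -rf "$workdir"
```

Recording the digest alongside the module would also make it possible to compare against an official download later, should one turn up.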
@ivan108, regarding migrating this pipeline to C4, could you list here the software tools and the versions you're using right now? Then I'll start installing them as CBI software modules.