bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Mutect VS FreeBayes #585

Closed ghost closed 10 years ago

ghost commented 10 years ago

I am trying to use bcbio for somatic variant calling with either MuTect or FreeBayes, but I am not sure how the two compare in performance. For example, how long would MuTect and FreeBayes each take to run one pair of samples (200 GB in total) on a machine with 32 cores and 60 GB RAM? Which one is more accurate? Can anybody give me an estimate of the cost? Thanks!

chapmanb commented 10 years ago

Michael; this comment has plots of accuracy for MuTect, FreeBayes, VarScan and VarDict on DREAM challenge data:

https://github.com/chapmanb/bcbio-nextgen/issues/428#issuecomment-54076553

We still have work to do to improve filtering for all of the callers, but that's the current status of accuracy.

We don't have detailed numbers on timing, although right now MuTect is slower due to indel calling. Since MuTect only calls SNPs, we paired it with Scalpel for indel calling, and Scalpel can be quite slow in high depth regions. We're actively working on addressing this and on enabling swapping in other indel callers like Pindel, or excluding indel calling entirely.

Hope this helps.

ghost commented 10 years ago

Hi Brad,

Thank you so much for your help!

I am running bcbio with MuTect to call somatic variants. However, I hit a MuTect version issue, because MuTect 1.1.4 and lower is known to have incompatibilities with Java < 7. So I have to download MuTect 1.1.5 and rerun the pipeline, but I don't want to redo the alignment, duplicate marking, recalibration and realignment from the FASTQ files. I have two questions:

1) Which BAM files should be used as the input files for MuTect? I assume 1_2014-09-08_mutectdata-sort.bam and 2_2014-09-08_mutectdata-sort.bam are the sorted BAM files to use as input, and that I should delete all tmp folders and rerun. [screenshot not preserved]

2) Do I just need to change the YAML file to turn off alignment, duplicate marking, recalibration and realignment? [screenshot not preserved]

For example: [screenshot not preserved]

...

Please correct me if I am wrong. Thanks!

chapmanb commented 10 years ago

Michael; bcbio comes with a 1.1.5 version of MuTect that is compatible with Java 7. Did you manually install a different version? I'm trying to understand if there is anything we should change about our default install or docs to make this clearer.
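The version constraint discussed here (MuTect 1.1.4 and lower is rejected, 1.1.5 is accepted) amounts to a simple version comparison. The following is a minimal sketch of such a check, not bcbio's own code; the function names `parse_version` and `mutect_is_supported` are made up for illustration.

```python
# Minimal sketch of a "1.1.5 or newer" check. The helper names are
# hypothetical and this is not how bcbio itself is known to do it.
MIN_MUTECT = (1, 1, 5)


def parse_version(text):
    """Turn a dotted version string such as "1.1.4" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def mutect_is_supported(version_text):
    """Return True when the given MuTect version is 1.1.5 or newer."""
    return parse_version(version_text) >= MIN_MUTECT
```

Comparing integer tuples rather than raw strings avoids ordering "1.1.10" before "1.1.5".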

Regarding re-running: if you want to re-run from the same directory you previously ran in, you can remove the old mutect directory and re-run directly.
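As a rough sketch of that first option, the old caller output can be removed with a few lines of Python before restarting the same bcbio command from the work directory. The directory name `mutect` comes from the comment above; the helper name `remove_caller_output` is hypothetical, and the layout of your own work directory may differ.

```python
import shutil
from pathlib import Path


def remove_caller_output(work_dir, caller_dir="mutect"):
    """Delete the caller's output directory inside a bcbio work directory.

    Returns True if a directory was removed, False if there was nothing
    to remove. Other directories (for example align) are left untouched.
    """
    target = Path(work_dir) / caller_dir
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Calling `remove_caller_output("/path/to/work")` (a placeholder path) leaves the rest of the work directory in place.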

If you want to use a BAM from another run as input, your new YAML file looks perfect. However, you want to use the final BAM file from the bamprep/samplename directory as input. The one in the align directory is aligned and de-duplicated, but the bamprep one has the realignment done.
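The YAML file from the screenshot is not preserved in this thread, so the following is only a guess at its general shape: a sketch that lists the BAM files under `bamprep/<samplename>` and builds sample entries with the preparation steps turned off. The helper names, paths, batch name and descriptions are all hypothetical; the `algorithm` keys follow bcbio's sample configuration format as I understand it and should be checked against the documentation for your version.

```python
from pathlib import Path


def list_bamprep_bams(work_dir, sample):
    """List BAM files under <work_dir>/bamprep/<sample>, sorted by name."""
    return sorted(Path(work_dir, "bamprep", sample).glob("*.bam"))


def build_sample_entry(bam_path, description, batch, phenotype):
    """Build one entry for the `details` list of a bcbio sample config."""
    return {
        "files": [str(bam_path)],
        "description": description,
        "metadata": {"batch": batch, "phenotype": phenotype},
        "analysis": "variant2",
        "genome_build": "GRCh37",
        "algorithm": {
            # Preparation steps are switched off because the input BAM
            # has already been through them in the earlier run.
            "aligner": False,
            "mark_duplicates": False,
            "recalibrate": False,
            "realign": False,
            "variantcaller": "mutect",
        },
    }


# Hypothetical paths and names, for illustration only.
config = {
    "details": [
        build_sample_entry("/path/to/normal-prep.bam", "normal", "pair1", "normal"),
        build_sample_entry("/path/to/tumor-prep.bam", "tumor", "pair1", "tumor"),
    ]
}
```

The resulting dictionary can be written out as YAML, for example with PyYAML's `yaml.safe_dump(config)`, if that package is installed.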

Hope this helps.

ghost commented 10 years ago

Hi Brad,

Thanks a lot for your help!

Yes, I thought bcbio did not include MuTect, so I manually installed a "new" MuTect. The official MuTect website actually only provides version 1.1.4, and I don't know how to upgrade it to 1.1.5. Do you have any suggestions for removing 1.1.4 and upgrading to 1.1.5?

I already deleted the "bamprep/samplename" directory and only kept the "align" directory, so I need to run realignment again (realign: gatk) from the BAM files in the "align" directory, right?

Again, I really appreciate your help!

ghost commented 10 years ago

Hi Brad,

I am reinstalling bcbio so that it automatically upgrades to MuTect 1.1.5. However, I keep getting this error message:

==> Downloading https://github.com/GregoryFaust/samblaster/releases/download/v.0.1.16/samblaster-v.0.1.16.tar.gz
Already downloaded: /root/.cache/Homebrew/samblaster-0.1.16.tar.gz
==> make
/bcbio/Cellar/samblaster/0.1.16: 4 files, 68K, built in 2 seconds
[localhost] local: /bcbio/bin/brew link --overwrite samblaster
Warning: Already linked: /bcbio/Cellar/samblaster/0.1.16
To relink: brew unlink samblaster && brew link samblaster
[localhost] local: /bcbio/bin/brew install --HEAD seqtk
==> Cloning https://github.com/lh3/seqtk.git
Updating /root/.cache/Homebrew/seqtk--git
==> make
Error: No such file or directory - tabtk

Fatal error: local() encountered an error (return code 1) while executing '/bcbio/bin/brew install --HEAD seqtk'

Aborting.
Traceback (most recent call last):
  File "bcbio_nextgen_install.py", line 255, in <module>
    main(parser.parse_args(), sys.argv[1:])
  File "bcbio_nextgen_install.py", line 41, in main
    subprocess.check_call([bcbio["bcbio_nextgen.py"], "upgrade"] + _clean_args(sys_argv, args, bcbio))
  File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/bcbio/anaconda/bin/bcbio_nextgen.py', 'upgrade', '--tooldir=/bcbio', '--isolate', '--genomes', 'GRCh37', '--aligners', 'bwa', '--aligners', 'bowtie2', '--data']' returned non-zero exit status 1

I am not sure whether something is missing from my installation. Could you please take a look at what is going on?

Thanks, Michael
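A note on reading the traceback above: its last frame is `subprocess.check_call`, which raises `CalledProcessError` whenever the child command exits with a non-zero status. The Python traceback therefore only reports that the upgrade command failed; the actual cause is the earlier line `Error: No such file or directory - tabtk` from `brew install --HEAD seqtk`. The sketch below shows that behaviour with a stand-in command (a child Python process that exits with status 1), not the real bcbio upgrade; the helper name `run_step` is hypothetical.

```python
import subprocess
import sys


def run_step(cmd):
    """Run cmd; return 0 on success or the failing exit status."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # check_call only knows the exit status; the real error message
        # is whatever the child process printed before exiting.
        print("command failed with exit status %d: %s" % (err.returncode, err.cmd))
        return err.returncode
    return 0


# Stand-in for the failing upgrade command: a child process exiting with 1.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
status = run_step(failing)
```

When a wrapper like this fails, the useful information is usually in the output printed just before the traceback.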

chapmanb commented 10 years ago

Michael -- closing this since we got the tabtk problem sorted out in #592

ghost commented 10 years ago

Hi Brad,

It has been fixed and thanks a lot!

Michael