PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Regarding installation and running GRIDSS #91

Closed amrita1983 closed 7 years ago

amrita1983 commented 7 years ago

Hi,

I am trying to do some SV detection and breakpoint mapping and have not been able to get it done for a long time, so I came across your tool. Can you please tell me how I can install it on Linux, and what are the input files? Is it an already generated BWA-MEM BAM file? A BED file is also mentioned, but it is not clear which BED file I can use.

Please note that I have Nanopore (1D) data to do the analysis. Please reply.

lachlansimpson commented 7 years ago

You don't need to install it. You can just download gridss-1.4.3-jar-with-dependencies.jar from this page and run it with the Java that is already installed. I think it needs to be Java 1.8 at this point, but that's the default in most Linux distributions.

java -jar gridss-1.4.3-jar-with-dependencies.jar --other-options

See this shell script for an example: https://github.com/PapenfussLab/gridss/blob/master/example/gridss.sh

amrita1983 commented 7 years ago

Thank you for the information. But what about the BED file you have mentioned? I do not have any targeted BED file for my dataset, as I want to identify the SVs and breakpoints from it. Do I have to run

INPUT=chr12.1527326.DEL1024.bam
BLACKLIST=wgEncodeDacMapabilityConsensusExcludable.bed
REFERENCE=~/reference_genomes/human/hg19.fa
OUTPUT=${INPUT/.bam/.sv.vcf}
ASSEMBLY=${OUTPUT/.sv.vcf/.gridss.assembly.bam}
GRIDSS_JAR=~/bin/gridss-1.4.1-jar-with-dependencies.jar

this section only using a .sh file?

d-cameron commented 7 years ago

Please note that I have Nanopore (1D) data to do the analysis.

GRIDSS is designed for sequencing data in which the predominant mode of sequencing error is base substitution (such as Illumina short read sequencing data). The positional de Bruijn graph assembly performed by GRIDSS does not perform well with sequencing data in which the majority of errors are indel errors (e.g. Nanopore, PacBio). Whilst I have plans to write a new OLC-based assembler so GRIDSS works well with long read sequencing data, this work has not yet been done.

I expect GRIDSS will outperform other SV callers designed for short reads (due to the compound split read remapping performed by GRIDSS), but an SV caller designed for nanopore data is likely to outperform GRIDSS on your data set.

Can please tell me how can I install it in Linux

GRIDSS requires bwa, Java 1.8, and R. This documentation can be found at: https://github.com/PapenfussLab/gridss#pre-requisities

GRIDSS itself can be downloaded by following the link in the documentation at https://github.com/PapenfussLab/gridss#running or by clicking the "Releases" tab on this GitHub repository.

what are the input files, is it already a BWA MEM generated bam file?

GRIDSS requires the aligned BAM file and the reference genome used in the alignment. A minimal GRIDSS command-line invocation would look like:

java -Xmx31g -cp gridss-1.4.3-jar-with-dependencies.jar gridss.CallVariants REFERENCE_SEQUENCE=mygenome.fa INPUT=mydata.bam ASSEMBLY=assembly.bam OUTPUT=gridss.vcf
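The variable assignments in the example script and the minimal command above can be combined into a small wrapper. The following is only a sketch under stated assumptions: `gridss_cmd` is a hypothetical helper name, the file names are placeholders, and the function prints the command instead of running it so that it can be checked first.

```shell
#!/bin/sh
# Sketch only: gridss_cmd is a hypothetical helper, not part of GRIDSS.
# It prints the minimal CallVariants command line for one aligned BAM file.
gridss_cmd() {
  input="$1"       # aligned BAM file (placeholder name)
  reference="$2"   # reference FASTA that the BAM was aligned against
  jar="${3:-gridss-1.4.3-jar-with-dependencies.jar}"
  # Name the outputs after the input, mirroring the example script:
  # strip the .bam suffix, then append the new suffix.
  output="${input%.bam}.sv.vcf"
  assembly="${input%.bam}.gridss.assembly.bam"
  echo "java -Xmx31g -cp $jar gridss.CallVariants" \
       "REFERENCE_SEQUENCE=$reference INPUT=$input" \
       "ASSEMBLY=$assembly OUTPUT=$output"
}

# Dry run: print the command for inspection.
gridss_cmd mydata.bam mygenome.fa
```

With these placeholder names the outputs become mydata.sv.vcf and mydata.gridss.assembly.bam. Once the printed command looks right, it can be pasted into a terminal or run with `eval "$(gridss_cmd mydata.bam mygenome.fa)"`.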

For nanopore data, I recommend changing the bwa mem split read realignment command line invoked by GRIDSS to use the long read alignment parameter of bwa mem.

bed file is mentioned what bed file I can use not clear on that.

The bed BLACKLIST file is entirely optional and not required to run GRIDSS.
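Because the blacklist is optional, a wrapper script can add it only when a BED file has actually been supplied. The sketch below assumes, as the linked example script appears to do, that the file is passed to GRIDSS as a `BLACKLIST=` argument; `blacklist_arg` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch only: blacklist_arg is a hypothetical helper, not part of GRIDSS.
# Assumption: GRIDSS accepts the BED file as a BLACKLIST=<file> argument.
blacklist_arg() {
  # Print the argument only when a non-empty file name was given.
  if [ -n "$1" ]; then
    printf 'BLACKLIST=%s\n' "$1"
  fi
}

blacklist_arg ""    # no BED file: prints nothing
blacklist_arg wgEncodeDacMapabilityConsensusExcludable.bed
```

The result can be spliced into the command line as `$(blacklist_arg "$BLACKLIST")`, which expands to nothing when `BLACKLIST` is unset or empty.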