adamewing / GRIPper

Find non-reference processed pseudogene insertions from discordant read pair mappings
MIT License

bwa index #2

Closed by proukakis 6 years ago

proukakis commented 7 years ago

Hello, I guess something is different in my BWA version, so I get:

bwa index -a stdsw /Volumes/Toshiba4Tb/GenomeReferences/hg19.fasta
[bwa_index] unknown algorithm: 'stdsw'.

testing shows

Usage: bwa index [options] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw, is or rb2 [auto]
         -p STR    prefix of the index [same as fasta name]
         -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
         -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
         `-a div' do not work not for long genomes.

Do you think bwtsw will work? Sorry if it's a stupid question, but this is a bit above my level!

Thanks, Christos

adamewing commented 7 years ago

Hi Christos,

bwa index builds the alignment index that bwa itself needs (BAM files are indexed with samtools index). To create a .fai index for a .fasta file, use samtools faidx:

samtools faidx /Volumes/Toshiba4Tb/GenomeReferences/hg19.fasta

Hope that helps.
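
For reference, here is a minimal sketch of how the two indexing steps might be scripted, and of how the -a value could be picked for a large genome such as hg19. The function names are hypothetical, and the 50,000,000-base cutoff is an assumption about what bwa's [auto] setting does, not something stated in this thread. The commands are printed, not run.

```shell
#!/bin/sh
# Hypothetical helper: choose a value for `bwa index -a` from the genome size in bases.
# ASSUMPTION: the 50,000,000-base cutoff mirrors bwa's [auto] behaviour.
choose_bwa_algo() {
    bases=$1
    if [ "$bases" -gt 50000000 ]; then
        echo bwtsw   # long genomes (for example, a human reference)
    else
        echo is      # short genomes
    fi
}

# Count sequence bases in a FASTA file, ignoring header lines and line breaks.
count_bases() {
    grep -v '^>' "$1" | tr -d '\n\r' | wc -c | tr -d ' '
}

# Print (do not execute) the indexing commands for a given reference file.
print_index_commands() {
    ref=$1
    algo=$(choose_bwa_algo "$(count_bases "$ref")")
    echo "samtools faidx $ref"
    echo "bwa index -a $algo $ref"
}
```

Calling print_index_commands on a human-sized reference would be expected to suggest -a bwtsw, which agrees with the warning in the usage text above that `is' does not suit long genomes.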

PS, I currently recommend TEBreak for finding retrogene insertions: https://github.com/adamewing/tebreak

--Adam

proukakis commented 7 years ago

omg of course! thanks...

Yes, I will look at TEBreak. I have some mate-pair sequence from two brain regions, although the "physical" coverage is only 2x while the bridged coverage is ~50x.

proukakis commented 7 years ago

BTW, any idea if TEBreak will work on a Mac (El Capitan)?

adamewing commented 7 years ago

You might struggle with structural variants at that depth - unfortunately physical coverage matters more than the coverage spanned by paired ends because the mapping information used to localise breakpoints is in the sequenced bits.
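
To make the two coverage figures concrete, here is a rough sketch of the arithmetic. In this thread the 2x figure counts bases that were actually sequenced, and the ~50x figure counts bases bridged between the two mates of a pair. The function names and all input numbers (pair count, read length, insert size, genome size) are hypothetical, chosen only so that the results match the 2x and ~50x mentioned above.

```shell
#!/bin/sh
# sequence_coverage PAIRS READ_LEN GENOME_SIZE
# Coverage from sequenced bases only: two reads of READ_LEN per pair.
sequence_coverage() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.1f\n", (2 * n * l) / g }'
}

# span_coverage PAIRS INSERT_SIZE GENOME_SIZE
# Coverage from the whole fragment bridged by each pair, sequenced or not.
span_coverage() {
    awk -v n="$1" -v i="$2" -v g="$3" 'BEGIN { printf "%.1f\n", (n * i) / g }'
}

# Hypothetical library: 31 million pairs, 100 bp reads, 5 kb inserts, 3.1 Gb genome.
sequence_coverage 31000000 100  3100000000   # prints 2.0
span_coverage     31000000 5000 3100000000   # prints 50.0
```

With these assumed numbers, only 200 of every 5000 bridged bases carry sequence, which is why breakpoint evidence is sparse even though the bridged coverage looks high.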

I don't have a Mac on hand, but the challenge on any system will be in getting the prerequisite software packages installed ... if you can get those, TEBreak should work.
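
One way to see which prerequisites are already on the PATH, on a Mac or elsewhere, is a small check like the sketch below. The check_tools function is hypothetical, and the two tool names in the example are simply the ones mentioned in this thread, not TEBreak's actual requirement list, which should be taken from its own documentation.

```shell
#!/bin/sh
# Hypothetical helper: report which of the named command-line tools are on the PATH.
# Returns 0 if every tool was found and 1 if any was missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example using tools named in this thread (an illustrative list only).
check_tools bwa samtools || echo "install the missing tools first"
```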