glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

Best parameters to detect insertions and deletions #29

Open HussainAther opened 9 years ago

HussainAther commented 9 years ago

I'm using the progressive Cactus alignment tool to measure indels between different Drosophila species. When converting the hal file to a maf file, I run a command similar to this:

source ./environment && hal2mafMP.py ./folder/b00.hal: ./folder/b00.maf --noDupes --maxRefGap 8000

However, the output may not capture all true insertions and deletions. What parameters for the hal2maf.py command would you recommend to address this?

glennhickey commented 9 years ago

could swear I replied to this before but I guess gmail ate it...

Anyway, a few things:

Try to avoid the --noDupes option. It probably doesn't do what you want. I've been meaning to make this more clear in the documentation.

hal2maf is limited in that it only writes alignment columns that can be given coordinates in the reference genome (the root ancestor by default). If base x of genome A aligns to base y of genome B but neither aligns to the root, nor can either be placed in a deletion in the root, then this column won't be in the MAF.

So if you are interested in looking at mutations with respect to a given reference genome, add the --refGenome flag to specify that genome.

If you want to count all mutations on all branches, you can either make a MAF for each branch, or run halSummarizeMutations to print a table (the --maxRefGap 0 option is recommended).

If you want coordinates of all mutations, you can use halBranchMutations on each branch, or the script halTreeMutations.py to run it all at once (again with --maxRefGap 0)
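
The suggestions above can be sketched as a small Python helper that assembles the command lines. This is a sketch under stated assumptions rather than a tested pipeline: the genome name `dsim`, the output file names, and the helper functions are hypothetical, and only flags that appear in this thread (`--refGenome`, `--refFile`, `--parentFile`) are used. The gap-threshold option is left out here; add it as recommended above after checking each tool's `--help`.

```python
import shlex
import subprocess


def hal2maf_cmd(hal_path, maf_path, ref_genome):
    """Build a hal2maf command that uses ref_genome for MAF coordinates.

    --noDupes is deliberately left out, following the advice above.
    """
    return ["hal2maf", hal_path, maf_path, "--refGenome", ref_genome]


def branch_mutation_cmds(hal_path, genomes):
    """Build one halBranchMutations command per genome.

    Output names (<genome>_ins.bed, <genome>_del.bed) are hypothetical.
    """
    cmds = []
    for genome in genomes:
        cmds.append([
            "halBranchMutations", hal_path, genome,
            "--refFile", f"{genome}_ins.bed",
            "--parentFile", f"{genome}_del.bed",
        ])
    return cmds


def run_all(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Genome names must match those stored in the HAL file; these are assumed.
    genomes = ["dmel", "dsim"]
    cmds = [hal2maf_cmd("b00.hal", "b00_dmel.maf", "dmel")]
    cmds += branch_mutation_cmds("b00.hal", genomes)
    run_all(cmds, dry_run=True)
```

With `dry_run=True` the script only prints the commands, so they can be reviewed before anything is run against the HAL file.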

HussainAther commented 9 years ago

Thank you for the response!

We are aligning Drosophila melanogaster as a reference to Drosophila simulans, Drosophila erecta, and Drosophila yakuba. We want to polarize melanogaster and simulans with respect to erecta and yakuba.

Is there a way to polarize these two species from the other two?

HussainAther commented 9 years ago

I ran this command:

halBranchMutations b00.hal: dmel --refFile ins_test.bed --parentFile del_test.bed

to get the ins_test.bed file with the insertions and the del_test.bed file with the deletions. However, the numbers from the del_test.bed file do not match the numbers from halSummarizeMutations. Is this expected? The insertions from the ins_test.bed file do match.
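
One way to see where the deletion counts diverge is to tally the BED files directly and compare both the number of records and the total bases against the summary table. The sketch below assumes tab-separated BED with the sequence name, zero-based start, and end in the first three columns, and makes no assumption about any further columns. `summarize_bed` is a hypothetical helper, not part of HAL.

```python
import sys


def summarize_bed(path):
    """Return (record_count, total_bases) for a tab-separated BED file.

    Blank lines and '#', 'track', or 'browser' header lines are skipped.
    Interval length is end - start (BED is zero-based, half-open).
    """
    records = 0
    bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(("#", "track ", "browser ")):
                continue
            fields = line.split("\t")
            start, end = int(fields[1]), int(fields[2])
            records += 1
            bases += end - start
    return records, bases


if __name__ == "__main__":
    # Usage: python summarize_bed.py ins_test.bed del_test.bed
    for bed_path in sys.argv[1:]:
        n, total = summarize_bed(bed_path)
        print(f"{bed_path}\t{n} records\t{total} bases")
```

Comparing both numbers may help distinguish a difference in how events are counted from a difference in which bases are covered.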

HussainAther commented 9 years ago

In addition, in the alignathon paper (http://genome.cshlp.org/content/early/2014/10/01/gr.174920.114.full.pdf), what parameters and methods of counting insertions and deletions did you use to participate in that competition?

glennhickey commented 9 years ago

Hi Syed,

For the Alignathon, the official submission format was MAF, and that's what was used for the analyses throughout the project.

http://compbio.soe.ucsc.edu/alignathon/details.html

In the case of cactus, we submitted an alignment generated from hal2maf.

cheers -Glenn
