broadinstitute / viral-ngs

Viral genomics analysis pipelines

VPhaser 2.0 support #568

Open chukrello opened 7 years ago

chukrello commented 7 years ago

Hello!

I would just like to know whether Vphaser is still technically supported. If so, where can I find that support?

Sincerely, Konstantin.

dpark01 commented 7 years ago

Hi @chukrello, vphaser2 still has some very limited support from our team -- it is a part of the current viral-ngs suite and we have a pretty good understanding of how we like to use it, but at this point, we have limited expertise with the internal mechanics of it when problems arise.

Do you have a specific question about it?

chukrello commented 7 years ago

Dear @dpark01, thank you very much for your answer!

My question is not quite specific, but I didn't find any similar problem on the Vphaser Google Group.

I downloaded a set of sequences from one author's article (about 3500). After using samtools mpileup I got 3500 .bam files and ran Vphaser on each .bam. Some of the runs were successful, but others (a rather large number of them) failed with the same error:

[EXIT]: prep_aln_file: SC failed

Given that all these .bam files were generated from one source, this looks quite strange, so I wonder whether it is a Vphaser problem or not.

It would be great if you know anything about this issue.

Kind regards, Konstantin.

tomkinsc commented 7 years ago

It seems the relevant section of the V-Phaser-2 source is here.

A few questions:

chukrello commented 7 years ago

Dear @tomkinsc,

Thank you for your reply!

1) I run vphaser in isolation.
2) Actually, I don't clearly understand what you mean by "pass particular parameters".
3) I looked into the pipeline and found that the .bam was made with "samtools view -bS", and the .sam file was made with "smalt map".

Hope it helps.

tomkinsc commented 7 years ago

Hi @chukrello,

  1. I was just curious which options you used when you called vphaser; i.e. which of the following:
    Usage: vphaser2
    -i  [input.bam] -- input sorted bam file
    -o  [output DIR] -- output directory
    -e  [1 or 2] -- default 1; 1: pileup + phasing; 2: pileup
    -w  -- default 500; alignment window size
    -ig -- default 0; # of bases to ignore on both end of a read
    -delta  -- default 2; constrain PE distance by delta x fragsize_variation (auto measured by program)
    -ps (0, 100] -- default 30; percentage of reads to sample to get stats.
    -dt [0 or 1] -- default 1; 1: dinucleotide for err prob measure; 0: not
    -cy [0 or 1] -- default 1; 1: read cycle for err calibr; 0: not
    -mp [0 or 1] -- default 1; 1: mate-pair for err calibr; 0: not
    -qual [0, 40] -- default 20; quantile of qual for err calibr
    -a  -- default 0.05; significance value for stat test

    I'd guess the failures are a product of the options used, and/or something specific to the alignment. Did you specify the window size to be quite large, or does the .bam file have any very short reads (<<window size)?
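
One way to check the short-read question is to summarise read lengths and compare them with the window size (500 by default, per the usage text above). The function below is only a sketch: `read_length_summary` is a hypothetical helper name, it reads SAM records on stdin, and whether short reads actually trigger the "SC failed" exit is a guess, not something confirmed from the V-Phaser 2 source.

```shell
# Summarise read lengths from SAM records on stdin.
# Prints three numbers: <reads counted> <shortest read> <longest read>
read_length_summary() {
    awk -F'\t' '
        /^@/ { next }              # skip SAM header lines
        $10 == "*" { next }        # skip records with no stored sequence
        {
            len = length($10)      # column 10 of a SAM record is the read sequence
            if (n == 0 || len < min) min = len
            if (len > max) max = len
            n++
        }
        END { print n + 0, min + 0, max + 0 }
    '
}

# Usage (assumes samtools is on PATH and input.bam is the file in question):
#   samtools view input.bam | read_length_summary
```

Running this on one failing file and one successful file would show whether the two differ in their shortest reads.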

chukrello commented 7 years ago

Hi @tomkinsc,

Thank you for your answer! Yes, it was quite obvious, I don't know why I didn't get it :) I use this command:

OMP_NUM_THREADS=2 variant_caller -a 0.01 -i id.bam -o id

So the window size is not specified.
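
With about 3500 files, it may help to record exactly which inputs hit the error so that failing and successful files can be compared. The loop below is a rough sketch that reuses the command above: `list_sc_failures` is a hypothetical helper name, and it assumes that the error message ends up on stdout or stderr and that creating the output directory beforehand is harmless.

```shell
# Run the caller on every .bam in a directory and print the inputs whose log
# contains the "SC failed" message.
# Assumptions: the binary is named variant_caller (as in the command above)
# and the message is written to stdout or stderr.
list_sc_failures() {
    bam_dir=$1
    out_root=$2
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue              # the glob matched nothing
        id=$(basename "$bam" .bam)
        mkdir -p "$out_root/$id"
        # Keep the full log next to the output directory for later inspection.
        OMP_NUM_THREADS=2 variant_caller -a 0.01 -i "$bam" -o "$out_root/$id" \
            > "$out_root/$id.log" 2>&1 || true
        if grep -q 'prep_aln_file: SC failed' "$out_root/$id.log"; then
            echo "$bam"
        fi
    done
}

# Usage: list_sc_failures bams/ vphaser_out/ > failed_inputs.txt
```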

tomkinsc commented 7 years ago

Ok, I wonder if something is off about the read alignment coordinates. A few more comments:

  1. Are there any unmapped reads in the bam file? samtools view -c -f 4 input.bam

From the vphaser2 docs:

  2. Have you ensured each read/read-pair is aligned to only one reference genome? This can be achieved via samtools view -b -f 2 input.bam > output.bam
  3. Is the input bam file sorted by coordinate? samtools sort input.bam > output.bam
  4. In the same folder where the input bam file resides, there should not be any corresponding .bti files (index files used by Bamtools).
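
The checks above could be chained into a single preparation step, roughly as follows. This is a sketch, not a tested recipe: `prepare_bam` is a hypothetical helper name, `samtools sort -o` assumes samtools 1.3 or newer, and the `.bti` file names are guessed from the output file name.

```shell
# Prepare one BAM file following the checklist above.
# Assumes samtools 1.3 or newer is on PATH (for "sort -o").
# Prints the unmapped-read count of the input and writes the prepared file.
prepare_bam() {
    in_bam=$1
    out_bam=$2
    tmp_bam="$out_bam.paired.tmp"

    # Report unmapped reads (SAM flag 4).
    echo "unmapped reads: $(samtools view -c -f 4 "$in_bam")"

    # Keep only reads flagged as properly paired (SAM flag 2).
    samtools view -b -f 2 "$in_bam" > "$tmp_bam" || return 1

    # Sort by coordinate, then drop the intermediate file.
    samtools sort -o "$out_bam" "$tmp_bam" || return 1
    rm -f "$tmp_bam"

    # Remove any stale Bamtools index files next to the prepared file
    # (file names here are an assumption).
    rm -f "$out_bam.bti" "${out_bam%.bam}.bti"
}

# Usage: prepare_bam input.bam prepared.bam
```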

chukrello commented 7 years ago

Hi @tomkinsc!

Yes, there is quite a large number of unmapped reads (~33000). However, the files that Vphaser ran on successfully have the same number of unmapped reads.

I ran all the commands (items 2, 3 and 4) and ran Vphaser again, and I got the same error once more :(