cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0

IndexError: list index out of range #12

Closed. WashingtondaSilva closed this issue 10 years ago.

WashingtondaSilva commented 10 years ago

I ran shorah.py on my data and got the following error. I also got a bunch of empty .fas files as output. How can I resolve this issue? Thanks.

wld28@william:~/SG1/2$ python ~/shorah_0.8/shorah.py -b trimmed.bam.sorted.bam -f PVY.fa
Traceback (most recent call last):
  File "/home/wld28/shorah_0.8/shorah.py", line 142, in
    keep_files=options.k, alpha=options.a)
  File "/data/home/wld28/shorah_0.8/dec.py", line 347, in main
    r = aligned_reads.keys()[0]
IndexError: list index out of range
wld28@william:~/SG1/2$
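
For reference, the failing line in dec.py indexes the key list of the dictionary of reads kept for a window; under Python 2 (which shorah 0.8 targets) that raises exactly this IndexError when the dictionary is empty, i.e. when no reads survive the windowing/filtering step. A minimal sketch of that failure mode, with only the variable name taken from the traceback and everything else assumed:

# Python 2 behaviour: dict.keys() returns a list, so indexing the key list
# of an empty dict raises "IndexError: list index out of range".
aligned_reads = {}             # assumed: no reads were retained for the window
r = aligned_reads.keys()[0]    # -> IndexError: list index out of range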

ozagordi commented 10 years ago

Hi. Did you read issues #4 and #5?


WashingtondaSilva commented 10 years ago

Hi Dr. Ozagordi,

Thank you very much for the quick reply. It turned out that I had to reduce the window size to get the program running. However, I am still not able to get good SNV calls; the SNV output files in the snv directory are all empty.

I just need to call the SNVs and get their frequencies. I am working with PVY populations sequenced as 100 bp Illumina reads. I am not sure I am running the right commands. Can I just run amplian.py, or should I run shorah.py on my data?

Cheers,

-Washington

ozagordi commented 10 years ago

amplian.py should be enough, and it seems that it did run the SNV analysis. Are you sure you have an SNV at all? Did you inspect the alignment? You can try with samtools tview.
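
A scriptable complement to tview is to check how many reads actually mapped; zero or very few mapped reads would explain empty SNV output. A minimal sketch, assuming pysam is installed and using the sorted, indexed BAM from earlier in the thread:

import pysam

# Open the sorted BAM; .mapped / .unmapped read the .bai index created
# with `samtools index`.
bam = pysam.AlignmentFile("trimmed.bam.sorted.bam", "rb")
print("mapped reads:  ", bam.mapped)
print("unmapped reads:", bam.unmapped)
bam.close()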


Ciao. Osvaldo

WashingtondaSilva commented 10 years ago

Thanks for the response, Dr. Ozagordi,

I ran VarScan and also bcftools on the same data and got lots of SNVs. I also checked the alignment in samtools tview and it seems to be correct. I will keep trying, as I would really like to use ShoRAH in my studies.

###########

Here is my workflow; perhaps I am missing something.

ref fasta file is PVY.fa

Illumina reads file is trimmed.fastq

Building index

bowtie2-build PVY.fa PVY

Aligning sequences to the reference genome

bowtie2 -p 30 --sensitive -x PVY trimmed.fastq > trimmed.sam

Index ref fa

samtools faidx PVY.fa

SAM to BAM (-b: output BAM; -S: input is SAM)

samtools view -bS trimmed.sam > trimmed.bam

Sorting the BAM file

samtools sort trimmed.bam trimmed.bam.sorted

Creating the BAM index

samtools index trimmed.bam.sorted.bam

Generating mpileup file

samtools mpileup -f PVY.fa trimmed.bam.sorted.bam > PVY.mpileup

Running shorah

python ~/shorah_0.8/shorah.py -b trimmed.bam.sorted.bam -f PVY.fa -w 90 -s 1 -r PVY_O:25-9500

Thanks again,

Ciao,

-Washington
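
One thing that may be worth checking before rerunning with -w 90: if ShoRAH's local windows only use reads that cover most of the window (which would be consistent with the earlier observation that shrinking the window got the program running), then ~100 bp reads leave very little slack for a 90 bp window. A hedged pysam sketch to look at coverage of the -r region and the aligned read lengths; file names and coordinates are taken from the commands above, the rest is an assumption:

import pysam

bam = pysam.AlignmentFile("trimmed.bam.sorted.bam", "rb")

# Reads overlapping the region passed to shorah.py via -r PVY_O:25-9500
# (pysam coordinates are 0-based; the off-by-one is irrelevant for a count).
print("reads overlapping PVY_O:25-9500:", bam.count("PVY_O", 25, 9500))

# Aligned read lengths, to compare against the -w 90 window size.
lengths = sorted(read.query_alignment_length
                 for read in bam.fetch("PVY_O", 25, 9500)
                 if not read.is_unmapped)
if lengths:
    print("aligned length min/median/max: %d / %d / %d"
          % (lengths[0], lengths[len(lengths) // 2], lengths[-1]))
bam.close()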

ozagordi commented 10 years ago

Hard to tell. First, just to be on the safe side, I would avoid double dots in file names, although I don't think that's the problem. So samtools sort trimmed.bam trimmed_sorted will give you trimmed_sorted.bam.

That said, why are you giving the option -s 1? Try running without it, and if you still have problems, show me the log files.

WashingtondaSilva commented 10 years ago

Dear Dr. Ozagordi,

I did what you suggested and it seems the run worked. I am just a little confused by the SNV.txt output header. Which frequency do I use: Frq1, 2 or 3? Again, for now, I just need the SNV frequencies and their locations in the genome for plotting and extracting information. Here is the head of the SNV.txt file in the snv dir. I am assuming that is the file I need.

wld28@william:~/SG1/2/snv$ ls
SNVs_0.010000_final.csv  SNVs_0.010000.txt  SNV.txt
wld28@william:~/SG1/2/snv$ head SNV.txt
Chromosome  Pos  Ref  Var  Frq1    Frq2    Frq3    Pst1    Pst2    Pst3
PVY_O       70   C    T    *       1.0000  1.0000  *       1.1026  1.2917
PVY_O       71   C    A    *       1.0000  1.0000  *       1.1026  1.2917
PVY_O       95   G    T    1.0000  1.0000  1.0000  1.1026  1.2917  1.0077
PVY_O       96   G    A    1.0000  1.0000  1.0000  1.1026  1.2917  1.0077
PVY_O       102  G    A    1.0000  1.0000  1.0000  1.1026  1.2917  1.0077
PVY_O       134  G    A    1.0000  1.0000  1.0000  1.2917  1.0077  1.1136
PVY_O       136  T    C    1.0000  1.0000  1.0000  1.2917  1.0077  1.1136
PVY_O       160  C    T    1.0000  1.0000  -       1.0077  1.1136  -
PVY_O       218  T    A    -       1.0000  1.0000  -       0.9987  1.0000
wld28@william:~/SG1/2/snv$

Thank you very much for your prompt help.

Ciao,

-Washington

ozagordi commented 10 years ago

You should look at SNVs_...final.csv. The different frequencies refer to the fact that the same position is covered by different windows (up to three with default options); in your case they are all the same. A bit more puzzling is the consistently high posterior (probabilities should be <= 1.0; it's a known issue). Run for longer and/or with a higher alpha.
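
If all that is needed for plotting is position and frequency, the SNV.txt columns shown above can be reduced to one frequency per position by averaging Frq1-Frq3 and skipping the '*' / '-' placeholders for windows that did not cover the site. A sketch under those assumptions (the file name and column order come from the head output above; the averaging rule is only one reasonable choice, and the _final.csv may use different columns):

# Extract per-position SNV frequencies from SNV.txt, averaging the
# up-to-three per-window estimates and skipping '*' / '-' placeholders.
with open("SNV.txt") as fh:
    fh.readline()  # skip header: Chromosome Pos Ref Var Frq1 Frq2 Frq3 Pst1 Pst2 Pst3
    for line in fh:
        fields = line.split()
        if len(fields) < 7:
            continue
        chrom, pos, ref, var = fields[0], int(fields[1]), fields[2], fields[3]
        freqs = [float(x) for x in fields[4:7] if x not in ("*", "-")]
        if freqs:
            print(chrom, pos, ref, var, sum(freqs) / len(freqs))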

WashingtondaSilva commented 10 years ago

Great,

Thank you very much, Dr. Ozagordi.

-Washington