cbg-ethz / shorah

Repo for the software suite ShoRAH (Short Reads Assembly into Haplotypes)
GNU General Public License v3.0

Applying ShoRAH to metagenomic samples #40

Closed: yorickdevries closed this issue 6 years ago

yorickdevries commented 6 years ago

I am investigating whether ShoRAH can be used to estimate genetic diversity in mixed metagenomic samples of bacteria. I generated a synthetic paired-end Illumina read set of a mixed E. coli culture and converted it to a sorted BAM file. However, when I run ShoRAH with the FNA file of one of the strains as the reference, the program crashes with the error "b2w run not succesful". I did manage to get ShoRAH working on the test data. Could you advise me on how to get the program to work?

Properties of the BAM file: 0.1 coverage of E_coli_536 and E_coli_C1, aligned against E_coli_536 and sorted according to http://cbg-ethz.github.io/shorah/input.html

Command I used to run ShoRAH: python shorah.py -b sortedReads.bam -f E_coli_536.fna -w 140 -s 5

ozagordi commented 6 years ago

Hi Yorick. ShoRAH only works when you have high coverage and relatively high genetic diversity. It was initially developed for HIV, which shows high diversity even over a very small region (even within the length of a single read). Under what conditions are you running it?
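To put the 0.1 coverage from the first post in perspective, here is a rough back-of-the-envelope sketch (not part of ShoRAH). It assumes read start positions are uniform along the genome, as in the Lander-Waterman model where mean coverage is C = N·L/G for N reads of length L on a genome of length G. It also assumes, purely for illustration, that a read is only useful for a window if it spans the whole window; ShoRAH's actual read-selection criterion may differ. The helper names, the 150 bp read length and the target of 100 reads per window are hypothetical and not taken from the issue.

```python
def expected_spanning_reads(coverage: float, read_len: int, window: int) -> float:
    """Expected number of reads that span a whole window (hypothetical helper).

    Assumes uniformly distributed read starts (Lander-Waterman) and that only
    reads covering the entire window count; ShoRAH's real criterion may differ.
    """
    if window > read_len:
        return 0.0  # no single read can span a window longer than itself
    # From C = N * L / G, reads start at a rate of N / G = C / L per base.
    # A read spans [p, p + window) only if it starts in one of L - window + 1 positions.
    return coverage / read_len * (read_len - window + 1)


def coverage_needed(reads_per_window: float, read_len: int, window: int) -> float:
    """Mean coverage needed to expect `reads_per_window` spanning reads (hypothetical helper)."""
    if window > read_len:
        raise ValueError("window is longer than the reads")
    return reads_per_window * read_len / (read_len - window + 1)


# 0.1 coverage and -w 140 come from the issue; 150 bp reads are an assumption.
print(f"{expected_spanning_reads(0.1, 150, 140):.4f}")  # 0.0073
print(f"{coverage_needed(100, 150, 140):.0f}")  # 1364
```

Under these assumptions, a window at 0.1 coverage almost never contains even one read, while a modest 100 reads per window would need a coverage in the thousands, which is consistent with the point about high coverage above.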

yorickdevries commented 6 years ago

Ah, I assume the diversity between the strains in the sample is too low for ShoRAH to work. What do you mean by conditions? I described the input in my first post.
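As a rough illustration of what low versus high diversity looks like inside a single window, the sketch below computes the mean per-column Shannon entropy of equal-length reads aligned to one window. This is a simple proxy chosen for illustration, not the model ShoRAH uses; the helper names and toy haplotypes are hypothetical. In the toy data, two strains that differ at one position in a 10-base window give a mean entropy of 0.1 bits, whereas strains that differ at every position give 1 bit.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the bases observed at one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum of p * log2(1/p); a fully conserved column gives exactly 0.0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def mean_window_entropy(reads: list[str]) -> float:
    """Average per-column entropy over equal-length reads aligned to one window."""
    if not reads:
        raise ValueError("no reads in window")
    columns = zip(*reads)  # transpose: one tuple of bases per window position
    entropies = [column_entropy("".join(col)) for col in columns]
    return sum(entropies) / len(entropies)


# Hypothetical toy haplotypes for a 10-base window.
hap_a = "ACGTACGTAC"
hap_b = "ACGTTCGTAC"  # differs from hap_a at a single position
hap_c = "TGCATGCATG"  # differs from hap_a at every position

print(f"{mean_window_entropy([hap_a] * 5 + [hap_b] * 5):.2f}")  # 0.10
print(f"{mean_window_entropy([hap_a] * 5 + [hap_c] * 5):.2f}")  # 1.00
```

When the strains are closely related, most windows look like the first case, and the few informative positions can be hard to separate from sequencing errors.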

ozagordi commented 6 years ago

Yorick, I was referring to coverage, length of the sequenced region, diversity and so on. I will close this for now.