cbg-ethz / haploclique

Viral quasispecies assembly via maximal clique finding. A method to reconstruct viral haplotypes and detect large insertions and deletions from NGS data.
GNU General Public License v3.0
25 stars, 33 forks

What's new in the 1.0 release? #38

seoanezonjic closed this issue 6 years ago

seoanezonjic commented 7 years ago

I have been a Haploclique user for a long time, and I see that you have released a major version. I couldn't find a changelog, but I would like to know the new features of this version. Could you please give a brief description? Thank you in advance.

MaryamZaheri commented 7 years ago

Please let me know whether you were using the master branch, because in that case the differences are significant. If you were using the development branch, the new version only fixes a few bugs and adds test cases.

seoanezonjic commented 7 years ago

Until last month I was using Armin Topfer's version of the repository. Now, while writing a paper, I checked haploclique on GitHub and found the cbg-ethz version (master branch). I would like to know the new features and bug fixes compared with the Topfer version. By the way, I have a couple of problems. First, the short flags don't work for me. Second, I can't find a way to set the number of threads in haploclique, and it runs very slowly with a single thread. Thank you in advance.

MaryamZaheri commented 7 years ago

Main differences:

There were no major bug fixes. The bugs that were fixed were not related to the algorithm but to input file validation. Furthermore, the current implementation is single-threaded.

seoanezonjic commented 7 years ago

Thank you for your detailed answer, Maryam. With the implementation changes you describe, I guess execution times should be lower than with the previous version. However, I am running the following command:

haploclique --log=log_file --allel_frequencies=freq_file reads.bam

The BAM file is about 5 MB (you can download it at https://www.dropbox.com/s/rja3azyxqwvmy2v/reads.bam?dl=0 because I can't attach it to this message), but the execution doesn't finish (it literally takes days). I don't know where the problem is. Maybe I should set the max_cliques parameter; if so, which value do you recommend? Also, what is the biggest run you have done with haploclique (number of reads or BAM size), and how long did it take? Thanks in advance.

MaryamZaheri commented 7 years ago

The default max_clique is 5000; you can also try smaller values. If you try short sequences (<2000 nt) with 200x to 500x coverage, you may get the result faster. However, the execution time is high, and we are optimizing the code to reduce it. Can you please also share the allel_frequencies file?
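To act on the 200x to 500x coverage suggestion, you first need to know how far a dataset is above that range. The thread does not give a formula, so the following is only a minimal sketch assuming the usual mean-depth estimate C = N * L / G (N reads of average length L over a region of G nt) and uniform random subsampling. The helper names `mean_coverage` and `keep_fraction` are hypothetical, and the example numbers are made up for illustration, not taken from the reads.bam discussed here.

```python
def mean_coverage(num_reads: int, read_length: float, region_length: int) -> float:
    """Estimate mean depth as C = N * L / G (assumes reads spread evenly over the region)."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return num_reads * read_length / region_length


def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so the mean depth drops to target_coverage.

    Returns 1.0 when the data is already at or below the target depth.
    """
    if current_coverage <= 0:
        raise ValueError("current_coverage must be positive")
    return min(1.0, target_coverage / current_coverage)


# Hypothetical numbers for illustration only (not from this issue):
c = mean_coverage(num_reads=100_000, read_length=250, region_length=10_000)
print(c)                      # 2500.0
print(keep_fraction(c, 500))  # 0.2
```

The resulting fraction could then be passed to a read subsampler such as `samtools view -s`. Real coverage is rarely uniform, so this only gives a rough starting point for choosing a subsample size.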

seoanezonjic commented 7 years ago

Sorry, I assumed that the allel_frequencies file was an optional output file, not an input file. Maybe this option is what freezes my haploclique runs? I'll try again without setting this option and post the results here.

seoanezonjic commented 7 years ago

I have performed several tests using your latest commit, and the problem persists: the execution doesn't finish. Do you have any recommendation for getting results with this version of Haploclique? The command I used to run haploclique was:

haploclique reads.bam

MaryamZaheri commented 7 years ago

Hi Pedro,

It is strange that you don't get a result even with the default parameters. Can you send me (maryam.zaheri@bsse.ethz.ch) a sample of your data (reads.bam)?