epruesse / SINA

SINA - Reference based multiple sequence alignment
https://sina.readthedocs.io
GNU General Public License v3.0

Updating the version in Bioconda #69

Closed Finesim97 closed 5 years ago

Finesim97 commented 5 years ago

Hi, I am really excited to take the new search engine for a spin, but I saw that the Bioconda recipe hasn't been updated yet. I am sorry if this is the wrong repository for posting this. Is it enough to update the hash and version variables?

Best wishes for your Easter weekend

Finesim97 commented 5 years ago

I just noticed that it is still a pre-release.

epruesse commented 5 years ago

@Finesim97 I indeed do need to update the package.

Any chance you could briefly try the pre-release binaries I have on the releases page?

I was just waiting for some more people to run this on their computers. I've tested and tested, but I know from experience that no amount of testing "by the author" replaces "actual use in the field".

Finesim97 commented 5 years ago

Sure, I have my workflow running with the pre-release right now. The classification completes much faster than before. I will rerun it with the old version and compare the results.

epruesse commented 5 years ago

Thanks! Tell me if all is OK. I'll tag 1.6.0 as soon as you do and update the Bioconda package.

epruesse commented 5 years ago

@Finesim97 any news?

Finesim97 commented 5 years ago

Yes, everything finished fine the second time I ran it. To test the performance, I added a repeat tag to my Snakemake workflow. During one of the repeats, SINA crashed with the following log:

18:14:38 [log] Loglevel set to info
18:14:38 [SINA] This is SINA 1.6.0-rc.1.
18:14:38 [libARBDB] ARB: no FastLoad File 'output/refdbs/LVA_132_SSURef_NR99.ARM' found => loading entire DB
18:14:46 [ARB I/O] Loading names map... (for "output/refdbs/LVA_132_SSURef_NR99.arb")
18:14:47 [Search (internal)] Index contains 695171 sequences (2688376 refs)
18:14:47 [alignment_stats] alignment stats for subset ssuref:archaea
18:14:47 [alignment_stats] weighted/unweighted columns = 1450/48550
18:14:47 [alignment_stats] average weight = 4.87686
18:14:47 [alignment_stats] minimum weight = 2.13805
18:14:47 [alignment_stats] maximum weight = 7.29296
18:14:47 [alignment_stats] ntaxa = 25025
18:14:47 [alignment_stats] base frequencies: na=0.243357 nu=0.320127 nc=0.238212 ng=0.198304
18:14:47 [alignment_stats] mutation frequencies: any=0.0212039 transversions=0.00822385
18:14:47 [alignment_stats] first/last weighted column=1005/43115
18:14:47 [alignment_stats] alignment stats for subset ssuref:bacteria
18:14:47 [alignment_stats] weighted/unweighted columns = 1532/48468
18:14:47 [alignment_stats] average weight = 5.2707
18:14:47 [alignment_stats] minimum weight = 2.03587
18:14:47 [alignment_stats] maximum weight = 7.67329
18:14:47 [alignment_stats] ntaxa = 592559
18:14:47 [alignment_stats] base frequencies: na=0.252611 nu=0.31454 nc=0.229764 ng=0.203085
18:14:47 [alignment_stats] mutation frequencies: any=0.0167975 transversions=0.00775564
18:14:47 [alignment_stats] first/last weighted column=1006/43241
18:14:47 [alignment_stats] alignment stats for subset ssuref:eukarya
18:14:47 [alignment_stats] weighted/unweighted columns = 1836/48164
18:14:47 [alignment_stats] average weight = 4.74158
18:14:47 [alignment_stats] minimum weight = 2.09547
18:14:47 [alignment_stats] maximum weight = 6.74351
18:14:47 [alignment_stats] ntaxa = 77585
18:14:47 [alignment_stats] base frequencies: na=0.257375 nu=0.268129 nc=0.211923 ng=0.262573
18:14:47 [alignment_stats] mutation frequencies: any=0.0292276 transversions=0.0148656
18:14:47 [alignment_stats] first/last weighted column=1006/43273
18:14:48 [SINA] Aligner ready. Processing sequences

-------------------- ARB-backtrace 'received signal 11':
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libCORE.so(GBK_dump_backtrace(_IO_FILE*, char const*)+0x26)[0x7f33c6645f36]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libCORE.so(+0xff34)[0x7f33c6647f34]
/lib/x86_64-linux-gnu/libc.so.6(+0x33060)[0x7f33c616e060]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libsina.so.0(std::__detail::_Map_base<std::thread::id, std::pair<std::thread::id const, sina::timer>, std::allocator<std::pair<std::thread::id const, sina::timer> >, std::__detail::_Select1st, std::equal_to<std::thread::id>, std::hash<std::thread::id>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](std::thread::id&&)+0x4d)[0x7f33c6d7ba3d]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libsina.so.0(sina::kmer_search::impl::find(sina::annotated_cseq const&, std::vector<sina::search::result_item, std::allocator<sina::search::result_item> >&, unsigned int)+0x33b)[0x7f33c6d75bcb]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libsina.so.0(sina::famfinder::impl::match(std::vector<sina::search::result_item, std::allocator<sina::search::result_item> >&, sina::annotated_cseq const&)+0x3f9)[0x7f33c6d16469]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libsina.so.0(sina::famfinder::impl::operator()(sina::tray)+0x13f8)[0x7f33c6d1b638]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libsina.so.0(sina::famfinder::operator()(sina::tray const&)+0x42)[0x7f33c6d1bed2]
sina-1.6.0-rc.1-linux/bin/sina(+0x269b2)[0x55b0ce3b59b2]
sina-1.6.0-rc.1-linux/bin/sina(+0x5003c)[0x55b0ce3df03c]
sina-1.6.0-rc.1-linux/bin/sina(+0x50131)[0x55b0ce3df131]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libtbb.so.2(+0x294a9)[0x7f33c6b7e4a9]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libtbb.so.2(+0x22af8)[0x7f33c6b77af8]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libtbb.so.2(+0x21384)[0x7f33c6b76384]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libtbb.so.2(+0x1d1e4)[0x7f33c6b721e4]
/nfs2/shared/lukas_jansen_research_data/mibiNGS/sina-1.6.0-rc.1-linux/bin/../lib/libtbb.so.2(+0x1d45a)[0x7f33c6b7245a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f33c5f054a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f33c6223d0f]
-------------------- End of backtrace
[Terminating with signal 11]
18:14:48 [ARB I/O] Closing ARB database '"output/refdbs/LVA_132_SSURef_NR99.arb"' ...

I wasn't able to reproduce it; later runs finished multiple times without crashing. The files were on an NFS drive, and I reused the generated index (from the same command).

My Snakemake rule:

# Prepare database:
rule sinaalignprep:
    input:
        db=rules.downlaodnrSSU.output
    output:
        outerdir+"/refdbs/LVA_132_SSURef_NR99.sidx"
    log:
        outerdir+"/logs/sina_silva_prep.log"
    conda: "envs/tooling.yml"
    shell:
        "echo \">Testing\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\n\" |sina-1.6.0-rc.1-linux/bin/sina -r {input.db} --fs-engine internal > {log}"

# Align the OTUs to the silva nr db using SINA
rule sinaalign:
    input:
        rules.sinaalignprep.output,
        db=rules.downlaodnrSSU.output,
        toalign=rules.deblur.output.referencehitseqs, # Remember to change the BIOM file as well!,
    output:
        fasta=outdir+"/sina_silva_aligned_otus.fasta",
        csv=outdir+"/sina_silva_aligned_otus.csv"
    params:
        minsim=0.7
    log:
        outdir+"/logs/sina_silva.log"
    benchmark:
        repeat(outdir+"/benchmark/sina_silva.txt",5)
    conda: "envs/tooling.yml"
    threads: 32
    shell:
        "sina-1.6.0-rc.1-linux/bin/sina -i {input.toalign} -o {output.fasta} -r {input.db} -S --meta-fmt csv  -v --search-min-sim={params.minsim} --lca-fields tax_slv,tax_embl,tax_gg,tax_rdp,tax_gg -p {threads} --fs-engine internal > {log} 2>&1"

rule benchSina:
    input:
        expand(rules.sinaalign.benchmark,group=groups)

epruesse commented 5 years ago

Thanks! I'll try to look into that. At least it gave a stack trace, which helps a little. It looks like a concurrency issue, so it will be really hard to reproduce. It seems to have happened in the timer code I use to figure out where SINA spends its time, which helps me optimize the important bits. I might just take that out for the release binaries.

Cleaned trace:

std::__detail::_Map_base<std::thread::id, std::pair<std::thread::id const, sina::timer>>::operator[]
sina::kmer_search::impl::find()
sina::famfinder::impl::match(std::vector<sina::search::result_item>&, sina::annotated_cseq const&)
sina::famfinder::impl::operator()(sina::tray)
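
For illustration, here is a minimal sketch of how a crash of this kind can arise and one way to avoid it. The top frame of the trace is `operator[]` on an `std::unordered_map` keyed by `std::thread::id`. Calling `operator[]` from several threads at once is a data race, because it may insert a new entry and rehash the table. The sketch below guards every access with a mutex. The names `timer`, `timer_registry`, and `run_workers` are hypothetical and are not taken from SINA's source; this is not necessarily how the problem was fixed in 1.6.0.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-thread stopwatch (not SINA's sina::timer).
struct timer {
    std::chrono::nanoseconds total{0};
    unsigned calls = 0;
};

// Hypothetical registry with one timer per thread id.
// operator[] may insert and rehash, so unsynchronized concurrent calls
// are a data race; every access here takes the mutex first.
class timer_registry {
public:
    void record(std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        timer& t = timers_[std::this_thread::get_id()];
        t.total += elapsed;
        ++t.calls;
    }

    std::size_t thread_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return timers_.size();
    }

    unsigned total_calls() const {
        std::lock_guard<std::mutex> lock(mutex_);
        unsigned sum = 0;
        for (const auto& entry : timers_) {
            sum += entry.second.calls;
        }
        return sum;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::thread::id, timer> timers_;
};

// Starts n_threads workers that each record n_calls timings,
// waits for all of them, and returns the total number of recorded calls.
unsigned run_workers(timer_registry& registry, unsigned n_threads, unsigned n_calls) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_threads; ++i) {
        workers.emplace_back([&registry, n_calls] {
            for (unsigned j = 0; j < n_calls; ++j) {
                registry.record(std::chrono::microseconds(1));
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return registry.total_calls();
}
```

A single lock is the simplest option but serializes all timing calls. A `thread_local` timer per thread would avoid the shared map entirely, at the cost of having to collect the per-thread results separately.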

epruesse commented 5 years ago

Ok. I hope that's fixed. 1.6.0 is out.

Finesim97 commented 5 years ago

Thank you very much, glad I could help a little bit.

On 27.04.2019 at 06:05, Elmar Pruesse notifications@github.com wrote:

Closed #69.


epruesse commented 5 years ago

Yes, you did. Thanks. I was able to fix the above problem and get the fix into 1.6.0. Perhaps I can get away without a 1.6.1 again, but you never know.