cmks / DAS_Tool

Fasta_to_Contigs2Bin.sh will fail to create correct contig2bin file when inputs are gzipped #102

Open ElderMedic opened 9 months ago

ElderMedic commented 9 months ago

We are upgrading the SemiBin binner from v1.5.x to v2.0.2, and our binning pipeline no longer works: it fails at the DAS Tool step, which evaluates bins from MetaBAT2, MaxBin2 and SemiBin2. The cause is that SemiBin2 changed its default output format to .fa.gz, as documented here:

https://semibin.readthedocs.io/en/latest/subcommands/

--compression: Whether to compress outputs to save space. Should be one of none (default if using SemiBin) / gz (default if using SemiBin2) / xz / bz2.

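Feeding gzipped FASTA files to Fasta_to_Contigs2Bin.sh produces an incorrect contig2bin file like the one below. Instead of contig names, each line holds a "Binary file ... matches" message (which looks like grep's notice for binary input), followed by the bin ID: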

Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_0.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_0
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_1.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_1
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_11.fa.gz matches    PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_11
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_12.fa.gz matches    PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_12
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_17.fa.gz matches    PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_17
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_2.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_2
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_3.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_3
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_6.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_6
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_7.fa.gz matches PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_7

DAS Tool then fails with the error "Contigs of contig2bin files not found in assembly":

DAS Tool 1.1.5 
2024-02-17 00:41:35 

Parameters: 
--bins  /var/lib/cwl/stgeeb5eb7f-b4cf-421a-b633-bf7fdfb64303/MetaBAT2_Contig2Bin.tsv,/var/lib/cwl/stg29178e7f-a75d-4a54-ba6a-314640298d30/MaxBin2_Contig2Bin.tsv,/var/lib/cwl/stgb4f6f652-c605-48ff-a600-75a749118655/SemiBin_Contig2Bin.tsv
--contigs   /var/lib/cwl/stgea745465-ae42-40a0-a09a-871eb848c139/PRJEB29504_ZYMO_TEST_0.33_repeat_2_scaffolds_pilon_polished.fasta
--outputbasename    PRJEB29504_ZYMO_TEST_0.33_repeat_2
--labels    MetaBAT2,MaxBin2,SemiBin
--search_engine diamond
--proteins  NULL
--write_bin_evals   FALSE
--write_bins    TRUE
--write_unbinned    TRUE
--threads   8
--score_threshold   0.5
--duplicate_penalty 0.6
--megabin_penalty   0.5
--dbDirectory   db
--resume    FALSE
--debug FALSE
--version   FALSE
--help  FALSE

Dependencies: 
prodigal    /venv/bin/prodigal
diamond /venv/bin/diamond
pullseq /venv/bin/pullseq
ruby    /venv/bin/ruby
usearch 
blastp  /venv/bin/blastp

Analyzing assembly 
Error: Contigs of contig2bin files not found in assembly: 9 
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_0.fa.gz matches,
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_1.fa.gz matches,
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_11.fa.gz matches,
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_12.fa.gz matches,
Binary file /var/lib/cwl/stg78c13a74-23d3-405f-be15-3ba3ffae1c85/output_bins/PRJEB29504_ZYMO_TEST_0.33_repeat_2_SemiBin_17.fa.gz matches,... 

Please support .fa.gz bins as input if possible; it would be very handy and helpful. Thanks!
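
In the meantime, a possible workaround is to build the contig2bin table outside Fasta_to_Contigs2Bin.sh and decompress each bin before reading its headers. The sketch below is not part of DAS Tool; the function name `contigs2bin_gz` is hypothetical, and it assumes GNU gzip (where `gzip -dcf` passes uncompressed input through unchanged) and that the bin ID should be the file name without its extension, as in the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DAS Tool): print "contig<TAB>bin" lines for
# every bin FASTA in a directory, handling both plain and gzipped files.
contigs2bin_gz() {
    local dir="$1" ext="$2"   # e.g. dir=output_bins, ext=fa.gz (or ext=fa)
    local f bin
    for f in "$dir"/*."$ext"; do
        [ -e "$f" ] || continue              # skip if the glob matched nothing
        bin=$(basename "$f" ".$ext")         # bin ID = file name without extension
        # Assumption: GNU gzip. With -f and -c, input that is not gzipped is
        # copied to stdout unchanged, so plain FASTA files also work.
        gzip -dcf "$f" \
            | awk -v bin="$bin" '/^>/ { sub(/^>/, "", $1); print $1 "\t" bin }'
    done
}
```

It could be used as `contigs2bin_gz output_bins fa.gz > SemiBin_Contig2Bin.tsv`. Only the first whitespace-delimited word of each header is kept as the contig name, which may need adjusting if the assembly uses a different naming scheme. Alternatively, according to the SemiBin documentation quoted above, setting `--compression` to `none` when running SemiBin2 should avoid gzipped bins in the first place.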