cmks / DAS_Tool

DAS Tool

DAS Tool run suddenly ending with no output #17

Closed: rotoscan closed this issue 6 years ago

rotoscan commented 6 years ago

Hello,

I am running DAS Tool but the following happens:

Begins: Fr 19. Jan 16:10:54 CET 2018

predicting genes using prodigal
identifying single copy genes using usearch
running DAS Tool with 20 threads
evaluating bin-sets
starting bin selection from 40 bins
||||||||||||||||||||||||||

Ends: Fr 19. Jan 16:15:34 CET 2018

My command:

/home/brizolat/rodolfo/softwares/dastool/DAS_Tool/DAS_Tool -t ${NSLOTS:-1} -i /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins/dastool/A_1_O_60-Empirical_maxbin2_wrapup.tsv -l maxbin2 -c /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/assembly_out/contig.fa -o .

Do you know what is happening?

rotoscan commented 6 years ago

To add more information: I ran it again, this time on my local machine.

Here is my command:

/home/administrator/storage/pipelines_mcgr/das_tool/DAS_Tool/DAS_Tool -i A_2_R_10-Empirical_abawaca_wrapup.tsv -l abawaca -c scaffold.fa -o testing -t 10

Here is what happens:

predicting genes using prodigal
identifying single copy genes using usearch
running DAS Tool with 10 threads
evaluating bin-sets
starting bin selection from 17 bins
||||||||||| (here it ends)

These are the files that get generated:

-rw-rw-r-- 1 administrator administrator 3,1K Jan 22 13:02 testing_abawaca.eval
-rw-rw-r-- 1 administrator administrator 4,8K Jan 22 13:02 testing_DASTool_hqBins.pdf
-rw-rw-r-- 1 administrator administrator 194K Jan 22 13:02 testing_DASTool_scaffolds2bin.txt
-rw-rw-r-- 1 administrator administrator 17K Jan 22 13:02 testing_DASTool_scores.pdf
-rw-rw-r-- 1 administrator administrator 2,0K Jan 22 13:02 testing_DASTool_summary.txt
-rw-rw-r-- 1 administrator administrator 38M Jan 22 13:01 testing_proteins.faa
-rw-rw-r-- 1 administrator administrator 21K Jan 22 13:02 testing_proteins.faa.archaea.scg
-rw-rw-r-- 1 administrator administrator 43K Jan 22 13:01 testing_proteins.faa.bacteria.scg
-rw-rw-r-- 1 administrator administrator 766K Jan 22 13:02 testing.seqlength

The structure of my input files:

$ head A_2_R_10-Empirical_abawaca_wrapup.tsv
scaffold_35 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_50 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_52 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_55 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_84 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
...

$ grep 'scaff' scaffold.fa | head

scaffold_0
scaffold_1
scaffold_2

I hope I have attached enough information. I really don't know what the problem is. Any help would be much appreciated.

Thank you very much, Rodolfo

cmks commented 6 years ago

Hi Rodolfo,

Thanks for using DAS Tool! First of all, judging from the command-line output you provided, DAS Tool seems to be installed correctly and runs without crashing.

However, you have to modify your input. To get good results, you need to provide the outputs of three or more different binning tools; in both of your commands you provide only a single binning prediction. A modified version of your command combining the results of abawaca and maxbin2 could look like this:

/home/brizolat/rodolfo/softwares/dastool/DAS_Tool/DAS_Tool -t 10 -i A_2_R_10-Empirical_abawaca_wrapup.tsv,/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins/dastool/A_1_O_60-Empirical_maxbin2_wrapup.tsv -l abawaca,maxbin2 -c /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/assembly_out/contig.fa -o ./testing
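Because -i and -l take comma-separated lists that are paired by position, it can help to build the command programmatically when combining several binning tools. The following is a minimal Python sketch, not part of DAS Tool: the helper name `build_dastool_command` and the file names in the example are hypothetical, and it only emits the flags already used in this thread (-t, -i, -l, -c, -o).

```python
import shlex
from typing import Dict, List


def build_dastool_command(
    dastool: str,
    bin_tables: Dict[str, str],
    assembly: str,
    out_prefix: str,
    threads: int = 1,
) -> List[str]:
    """Hypothetical helper: build a DAS_Tool argv from {label: scaffolds2bin table}."""
    for label, table in bin_tables.items():
        # A comma inside a label or path would break the comma-separated lists.
        if "," in label or "," in table:
            raise ValueError(f"comma in label or path: {label!r} / {table!r}")
    labels = list(bin_tables)  # dicts keep insertion order, so -i and -l stay aligned
    tables = [bin_tables[label] for label in labels]
    return [
        dastool,
        "-t", str(threads),
        "-i", ",".join(tables),
        "-l", ",".join(labels),
        "-c", assembly,
        "-o", out_prefix,
    ]


cmd = build_dastool_command(
    "DAS_Tool",
    {"abawaca": "abawaca_wrapup.tsv", "maxbin2": "maxbin2_wrapup.tsv"},
    "scaffold.fa",
    "./testing",
    threads=10,
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy
```

The list can also be passed directly to `subprocess.run(cmd, check=True)` instead of being printed.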

In the scaffold2bin files, the bin name does not need to be the full path. To improve the readability of your output, you could reformat A_2_R_10-Empirical_abawaca_wrapup.tsv to:

scaffold_35 Bin_10
scaffold_50 Bin_10
scaffold_52 Bin_10
scaffold_55 Bin_10
scaffold_84 Bin_10
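This reformatting can be scripted. Below is a minimal Python sketch, assuming the table is tab-separated with the scaffold ID in the first column and the bin's FASTA path in the second, and that each bin file is named like `10.fasta`. The function name and the `Bin_` prefix are illustrative choices, not something DAS Tool requires.

```python
import os
from typing import List


def shorten_bin_names(lines: List[str], prefix: str = "Bin_") -> List[str]:
    """Turn 'scaffold<TAB>/path/to/10.fasta' into 'scaffold<TAB>Bin_10'."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        scaffold, bin_path = line.split("\t", 1)
        # '/.../final-clusters/10.fasta' -> '10.fasta' -> '10'
        stem = os.path.splitext(os.path.basename(bin_path))[0]
        out.append(f"{scaffold}\t{prefix}{stem}")
    return out


path = "/data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta"
example = [f"scaffold_35\t{path}\n", f"scaffold_50\t{path}\n"]
for row in shorten_bin_names(example):
    print(row)
```

To rewrite a whole file, pass in the lines read from the original table and write the returned rows to a new file, one per line.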

Examples of the input and output file formats are in the sample_data and sample_output folders, both in the DAS Tool installation directory and in this GitHub repository.

One last thing: if you pass only a dot as the output name (-o .), all output files will be hidden by your operating system, because their names will start with a dot. If you want to write the output to your current directory, use something like -o ./study_01 instead.
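The hidden-file effect follows from how output names are built: judging from the file listing earlier in this thread, DAS Tool names its outputs `<prefix>_DASTool_summary.txt`, `<prefix>_DASTool_scaffolds2bin.txt` and so on. A small Python check, assuming that naming pattern; the helper `output_files_hidden` is hypothetical:

```python
import os


def output_files_hidden(out_prefix: str, suffix: str = "_DASTool_summary.txt") -> bool:
    """Return True if the output file '<prefix><suffix>' would be a dotfile."""
    # '.' + suffix -> '._DASTool_summary.txt', whose basename starts with '.'
    name = os.path.basename(out_prefix + suffix)
    return name.startswith(".")


for prefix in [".", "./study_01", "testing"]:
    print(prefix, output_files_hidden(prefix))
```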

Thanks, Christian

rotoscan commented 6 years ago

Dear Christian,

thank you very much for your answer!

I got things working by reformatting the second column of my .tsv file. The slashes ("/") confused the script, which interpreted them as folders/subfolders. I simplified the bin names and now it works (with better readability, too). Thanks!

Also, regarding the .tsv files: ABAWACA generates bins based on scaffolds, whereas the other binning tools use contigs. How did you deal with that? The .tsv files for concoct, metabin and maxbin2 have contigs_* in the first column, which leads to header mismatches when mapping between scaffolds and contigs (and vice versa).

Thank you very much again. Best, Rodolfo


--
Rodolfo Brizola Toscan
Technician at Microbial Systems Bioinformatics Group
Department of Environmental Microbiology
Helmholtz Centre for Environmental Research - UFZ

cmks commented 6 years ago

Hi Rodolfo,

The binning tools you mentioned here and in https://github.com/cmks/DAS_Tool/issues/16 can handle both scaffolds and contigs as input. It can be confusing that the two terms are sometimes used interchangeably to describe an assembly (you can find more details about the definitions of contigs and scaffolds here: https://genome.jgi.doe.gov/help/scaffolds.jsf).

Assemblers like SPAdes/metaSPAdes and IDBA_UD create both contig and scaffold files, but it is advisable to use only one of them for downstream analysis. To compare and combine binning results with DAS Tool, it is also important to stick to either contigs or scaffolds. In your case, you could repeat the ABAWACA run using contigs as input (or repeat concoct, metabin and maxbin2 using scaffolds) and then feed the results into DAS Tool.
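Before running DAS Tool, one quick way to catch a contigs/scaffolds mix-up is to check that every ID in the first column of each scaffolds2bin table appears among the FASTA headers of the assembly passed with -c. A minimal Python sketch, assuming tab-separated tables and that a sequence ID is the header text after `>` up to the first whitespace; the helper names and the `contig_7` ID in the example are hypothetical.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs: header text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # ignore a bare '>' line
                ids.add(parts[0])
    return ids


def missing_ids(bin_table: Iterable[str], assembly_ids: Set[str]) -> Set[str]:
    """Return first-column IDs of a scaffolds2bin table that are absent from the assembly."""
    missing = set()
    for line in bin_table:
        line = line.strip()
        if line:
            seq_id = line.split("\t", 1)[0]
            if seq_id not in assembly_ids:
                missing.add(seq_id)
    return missing


assembly = [">scaffold_0", "ACGT", ">scaffold_35 length=1000", "GGCC"]
tables = {
    "abawaca": ["scaffold_35\tBin_10"],
    "maxbin2": ["contig_7\tmaxbin.001"],  # hypothetical contig-based ID
}
ids = fasta_ids(assembly)
for label, table in tables.items():
    print(label, sorted(missing_ids(table, ids)))
```

A non-empty result for a table suggests it was built from a different assembly file than the one given to DAS Tool.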

Cheers, Christian

rotoscan commented 6 years ago

Hi Christian,

thank you very much for your reply!

I figured that I should stick to either scaffolds or contigs when generating bins, since DAS Tool later needs a single reference assembly (multi-FASTA) file.

Anyway, as a heads-up for other users: I wasn't able to generate bins from contigs with ABAWACA. The final clusters were simply empty (at least for contigs generated with IDBA-UD). I would therefore advise using scaffolds for every binning tool.

Thanks again!

Best, Rodolfo
