Closed rotoscan closed 6 years ago
Just to add more information, I have run it again, this time on my local machine.
/home/administrator/storage/pipelines_mcgr/das_tool/DAS_Tool/DAS_Tool -i A_2_R_10-Empirical_abawaca_wrapup.tsv -l abawaca -c scaffold.fa -o testing -t 10
predicting genes using prodigal
identifying single copy genes using usearch
running DAS Tool with 10 threads
evaluating bin-sets
starting bin selection from 17 bins
|||||||||||
(here it ends)
-rw-rw-r-- 1 administrator administrator 3,1K Jan 22 13:02 testing_abawaca.eval
-rw-rw-r-- 1 administrator administrator 4,8K Jan 22 13:02 testing_DASTool_hqBins.pdf
-rw-rw-r-- 1 administrator administrator 194K Jan 22 13:02 testing_DASTool_scaffolds2bin.txt
-rw-rw-r-- 1 administrator administrator 17K Jan 22 13:02 testing_DASTool_scores.pdf
-rw-rw-r-- 1 administrator administrator 2,0K Jan 22 13:02 testing_DASTool_summary.txt
-rw-rw-r-- 1 administrator administrator 38M Jan 22 13:01 testing_proteins.faa
-rw-rw-r-- 1 administrator administrator 21K Jan 22 13:02 testing_proteins.faa.archaea.scg
-rw-rw-r-- 1 administrator administrator 43K Jan 22 13:01 testing_proteins.faa.bacteria.scg
-rw-rw-r-- 1 administrator administrator 766K Jan 22 13:02 testing.seqlength
$ head A_2_R_10-Empirical_abawaca_wrapup.tsv
scaffold_35 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_50 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_52 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_55 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
scaffold_84 /data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/bins/abawaca_bins/bins/final-clusters/10.fasta
...
$ grep 'scaff' scaffold.fa | head
scaffold_0
scaffold_1
scaffold_2
I hope I have attached enough information. I really don't know what the problem is. Any help would be much appreciated.
Thank you very much, Rodolfo
Hi Rodolfo, Thanks for using DAS Tool! First of all, according to the command line output you provided, DAS Tool seems to be installed correctly and also runs without crashing.
However, you have to modify your input. To get good results, you have to provide the outputs of at least three different binning tools. Both of your commands provide only one binning prediction. A modified version of your command that combines the results of abawaca and maxbin could look like:
/home/brizolat/rodolfo/softwares/dastool/DAS_Tool/DAS_Tool -t 10 -i A_2_R_10-Empirical_abawaca_wrapup.tsv,/data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins/dastool/A_1_O_60-Empirical_maxbin2_wrapup.tsv -l abawaca,maxbin2 -c /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/assembly_out/contig.fa -o ./testing
In the scaffold2bin files it is not necessary to have the full path as bin name. To improve readability of your output, you could reformat A_2_R_10-Empirical_abawaca_wrapup.tsv to:
scaffold_35 Bin_10
scaffold_50 Bin_10
scaffold_52 Bin_10
scaffold_55 Bin_10
scaffold_84 Bin_10
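A minimal Python sketch of that reformatting step, not part of DAS Tool itself. It assumes the scaffold2bin file is tab-separated with the bin name in the second column; the function name `simplify_bin_names` and the `Bin_` prefix are illustrative choices.

```python
import os

def simplify_bin_names(lines, prefix="Bin_"):
    """Replace a path-like bin name in column 2 with a short label.

    Assumes each line is '<scaffold>\t<bin name or path>'.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        scaffold, bin_name = line.split("\t")[:2]
        # '/some/dir/10.fasta' -> '10'
        stem = os.path.splitext(os.path.basename(bin_name))[0]
        out.append(f"{scaffold}\t{prefix}{stem}")
    return out

example = [
    "scaffold_35\t/data/msb/rodolfo/mockc/MCs/illu_MCs/A_2_R_10-Empirical/"
    "bins/abawaca_bins/bins/final-clusters/10.fasta\n",
]
print("\n".join(simplify_bin_names(example)))
```

To convert a whole file, pass an open file handle as `lines` and write the returned rows to a new .tsv.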
An example of the input and output file formats is in the DAS Tool installation directory and in this GitHub repository, in the sample_data and sample_output folders.
One last thing: if you write only a . as the output name, as in -o ., all your output files will be hidden by your operating system because their names start with a dot. So if you want to write the output to your current directory, use something like -o ./study_01
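To illustrate why the files disappear from a plain directory listing, here is a small Python sketch. It assumes the output name is used as a plain prefix in front of each file suffix, which is what the listing earlier in this thread suggests (`-o testing` gave `testing_DASTool_summary.txt`); the helper names are hypothetical.

```python
import os

# Suffixes copied from the file listing earlier in this thread.
SUFFIXES = ["_DASTool_summary.txt", "_DASTool_scaffolds2bin.txt", "_proteins.faa"]

def output_names(prefix):
    """Assumed naming scheme: output prefix followed directly by the suffix."""
    return [prefix + suffix for suffix in SUFFIXES]

def is_hidden(path):
    """On Unix-like systems, a file whose name starts with '.' is hidden."""
    return os.path.basename(path).startswith(".")

for prefix in (".", "./study_01"):
    for name in output_names(prefix):
        print(prefix, name, "hidden" if is_hidden(name) else "visible")
```

Files written with `-o .` are still on disk and show up with `ls -a`.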
Thanks, Christian
Dear Christian,
thank you very much for your answer!
I was able to make things work by reformatting the second column of my .tsv file. The slashes ("/") were confusing the script, which interpreted them as folders and subfolders. I have simplified the bin names and now it works (which also improves readability). Thanks!
Also, regarding the .tsv files: ABAWACA generates bins based on scaffolds. The other binning tools, however, use contigs. How did you deal with that? The .tsv files for concoct, metabin and maxbin2 have contigs_* in the first column. This causes header mismatches when referring from scaffolds to contigs (and vice versa).
Thank you very much again. Best, Rodolfo
-- Rodolfo Brizola Toscan Technician at Microbial Systems Bioinformatics Group Department of Environmental Microbiology Helmholtz Centre for Environmental Research - UFZ
Hi Rodolfo,
The binning tools you mentioned here and in https://github.com/cmks/DAS_Tool/issues/16 can handle scaffolds and contigs as input. It may be confusing that both terms are sometimes used interchangeably to describe an assembly (you can find more details about the definition of contigs/scaffolds here: https://genome.jgi.doe.gov/help/scaffolds.jsf)
Assemblers like Spades/Metaspades and IDBA_UD create both contig and scaffold files, but it is advisable to use only one of them for downstream analysis. Also, to be able to compare and combine binning results using DAS Tool, it is important to stick to either contigs or scaffolds. In your case you could repeat the ABAWACA run using contigs as input (or repeat concoct, metabin and maxbin2 using scaffolds) and then feed the result into DAS Tool.
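A quick way to check that a scaffold2bin file and the assembly agree is to compare their sequence IDs. The following is a rough Python sketch rather than anything DAS Tool provides; it assumes a tab-separated scaffold2bin file and that the sequence ID is the first whitespace-delimited token of each FASTA header. The function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first token after '>' on each header line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids

def missing_from_assembly(scaffold2bin_lines, assembly_lines):
    """Return IDs from column 1 of the table that are absent from the FASTA."""
    known = fasta_ids(assembly_lines)
    missing = []
    for line in scaffold2bin_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] and fields[0] not in known:
            missing.append(fields[0])
    return missing

# Toy data: one ID matches the assembly, one does not.
assembly = [">scaffold_0 len=10\n", "ACGT\n", ">scaffold_1\n", "GGCC\n"]
table = ["scaffold_0\tBin_1\n", "contig_7\tBin_2\n"]
print(missing_from_assembly(table, assembly))
```

With real data, pass open file handles for the .tsv and the assembly FASTA; a non-empty result means the binning was done on a different sequence set than the one given with -c.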
Cheers, Christian
Hi Christian,
thank you very much for your reply!
I figured that I should stick to either scaffolds or contigs when generating the bins, since DAS Tool later needs a reference assembly multi-FASTA file.
Anyway, as a heads-up for other users: I wasn't able to generate bins from contigs using ABAWACA. The final clusters are simply empty (at least for contigs generated with IDBA-UD). Therefore I would advise using scaffolds for every binning tool.
Thanks again!
Best, Rodolfo
Hello,
I am running DAS Tool but the following happens:
Begins: Fr 19. Jan 16:10:54 CET 2018
predicting genes using prodigal
identifying single copy genes using usearch
running DAS Tool with 20 threads
evaluating bin-sets
starting bin selection from 40 bins
||||||||||||||||||||||||||
Ends: Fr 19. Jan 16:15:34 CET 2018
My command: /home/brizolat/rodolfo/softwares/dastool/DAS_Tool/DAS_Tool -t ${NSLOTS:-1} -i /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/bins/dastool/A_1_O_60-Empirical_maxbin2_wrapup.tsv -l maxbin2 -c /data/msb/rodolfo/mockc/MCs/illu_MCs/A_1_O_60-Empirical/assembly_out/contig.fa -o .
Do you know what is happening?