cmks / DAS_Tool


Effect of "Assembly" from pooled bins #85

Open Somebodyatthdoor opened 2 years ago

Somebodyatthdoor commented 2 years ago

Hi,

I have a slightly unusual problem, caused by a collaborator's misunderstanding of how das_tool works. Could you give your opinion on what effect the method they used might have had? We have done several analyses on the bins output by das_tool, and it would be a shame to throw those analyses away if the effect of the mistake was minimal. But if there may be a negative effect, we would rather know.

Method:

1) Bins were created from multiple samples using four different pipelines.
2) These bins were checked for quality using CheckM, and anything with >5% contamination or <80% completeness was discarded.
3) The remaining bins, which originated from multiple assemblies, were all used as input for das_tool. Instead of an assembly being used as the input for option -c, the fasta file supplied was a concatenation of all of the bin fasta files.
4) The das_tool output bins were then dereplicated using dRep.
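For readers following along, the filter in step 2 can be written as a small predicate. This is only a sketch of the thresholds quoted above, not the collaborator's actual script; the function name, the example bin names and the example values are all hypothetical.

```python
def passes_quality_filter(completeness, contamination,
                          min_completeness=80.0, max_contamination=5.0):
    """Return True if a bin survives the step 2 filter.

    A bin is discarded when contamination is above 5% or completeness is
    below 80%, so it is kept when both conditions hold.
    """
    return completeness >= min_completeness and contamination <= max_contamination


# Hypothetical CheckM-style estimates: bin name -> (completeness %, contamination %).
example_bins = {
    "bin_a": (95.2, 1.1),   # kept
    "bin_b": (78.0, 0.5),   # discarded: completeness below 80%
    "bin_c": (91.0, 7.3),   # discarded: contamination above 5%
}

kept = sorted(
    name for name, (comp, cont) in example_bins.items()
    if passes_quality_filter(comp, cont)
)
print(kept)
```

Bins sitting exactly on a threshold (80% completeness, 5% contamination) are kept here, since the description only discards values strictly beyond the cut-offs.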

I am aware that this is not the usual way of running das_tool and that it is designed to use an assembly as the input fasta. However, I can't work out from the documentation whether any actual harm would come from doing this. We did actually see an improvement in the bins after running das_tool (see image below).

image

Thanks for your help, Laura

cmks commented 2 years ago

Hi Laura,

Based on what you've described, I don't see a big issue with your approach. You might have gotten more high-quality bins if you had skipped the filtering in step 2 and only filtered at the end, during your dereplication step. DAS Tool works better on the full set of bins and is able to 'decontaminate' bins in certain cases. Step 3 is not a problem, because DAS Tool can implicitly handle multiple assemblies, as long as all contigs/bins have unique identifiers across assemblies/binning-pipelines.
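One way to check the unique-identifier condition on a pooled fasta file is to count how often each contig identifier occurs. The following is a minimal sketch and not part of DAS Tool; the function names and the example identifiers are made up for illustration, and the identifier is assumed to be the first whitespace-delimited token of each header line.

```python
from collections import Counter
import io


def fasta_ids(handle):
    """Yield the identifier of each FASTA record read from a line iterator."""
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def duplicated_ids(handle):
    """Return, sorted, every identifier that occurs more than once."""
    counts = Counter(fasta_ids(handle))
    return sorted(ident for ident, n in counts.items() if n > 1)


# Hypothetical pooled fasta: one identifier appears twice.
example = io.StringIO(
    ">s1_contig_1 len=4\nACGT\n"
    ">s2_contig_1\nGGCC\n"
    ">s1_contig_1\nACGT\n"
)
print(duplicated_ids(example))
```

For a real file, `duplicated_ids(open("pooled_bins.fa"))` would do the same with a hypothetical file name. A reported duplicate is not necessarily a problem: it may be the same contig placed in bins by several pipelines, or it may be a name collision between different assemblies. Only the second case conflicts with the condition described above, so any hits would need to be inspected by hand.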

I hope this is helpful.

Cheers, Christian

Somebodyatthdoor commented 2 years ago

Hi Christian,

Brilliant, thanks for the very quick reply.

Cheers, Laura


