The new version of this workflow is "completeness-aware":
Long contigs (>500 kb) are identified, and each is written to its own FASTA file.
Each long contig is then evaluated with CheckM2 to estimate its percent completeness.
All long contigs that are >93% complete are moved directly to the final MAG set.
The remaining, less complete long contigs are pooled with the shorter contigs from the starting set, and this combined contig set is binned.
Binning is performed with MetaBAT2 and with SemiBin2 (the latter using its long-read settings).
The two bin sets are merged using DAS_Tool.
The dereplicated bin set consists of the DAS_Tool-merged bins plus all complete long contigs identified above.
This dereplicated bin set is evaluated with CheckM2 and then filtered on several quality criteria (defaults: >70% completeness, <10% contamination, <20 contigs).
All bins/MAGs that pass filtering undergo taxonomic assignment with GTDB-Tk. The final MAGs are written as a set of FASTA files, several figures are produced, and a summary file of metadata is generated.
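The routing and filtering rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's own code: the `Contig` and `Bin` records, the function names, and the assumption that CheckM2 completeness/contamination values have already been parsed into numbers are all hypothetical. Only the thresholds (500 kb, 93%, and the default filter values) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from the description above.
LONG_CONTIG_MIN_BP = 500_000   # "long" means > 500 kb
COMPLETE_MIN_PCT = 93.0        # long contigs > 93% complete skip binning


@dataclass
class Contig:
    # Hypothetical record; completeness is the CheckM2 estimate (%),
    # or None for contigs that were not evaluated (e.g. short contigs).
    name: str
    length_bp: int
    completeness: Optional[float] = None


@dataclass
class Bin:
    # Hypothetical record holding CheckM2 results for one bin/MAG.
    name: str
    completeness: float
    contamination: float
    n_contigs: int


def split_contigs(contigs: List[Contig]) -> Tuple[List[Contig], List[Contig]]:
    """Separate complete long contigs (final MAGs) from contigs sent to binning."""
    complete_long: List[Contig] = []
    to_bin: List[Contig] = []
    for c in contigs:
        is_long = c.length_bp > LONG_CONTIG_MIN_BP
        is_complete = c.completeness is not None and c.completeness > COMPLETE_MIN_PCT
        if is_long and is_complete:
            complete_long.append(c)
        else:
            # Less complete long contigs are pooled with the shorter contigs.
            to_bin.append(c)
    return complete_long, to_bin


def passes_filter(b: Bin, min_completeness: float = 70.0,
                  max_contamination: float = 10.0, max_contigs: int = 20) -> bool:
    """Default quality filter: >70% complete, <10% contamination, <20 contigs."""
    return (b.completeness > min_completeness
            and b.contamination < max_contamination
            and b.n_contigs < max_contigs)


if __name__ == "__main__":
    contigs = [
        Contig("ctg1", 2_100_000, 98.2),
        Contig("ctg2", 750_000, 41.0),
        Contig("ctg3", 35_000),
    ]
    final, pool = split_contigs(contigs)
    print([c.name for c in final], [c.name for c in pool])  # ['ctg1'] ['ctg2', 'ctg3']
```

The sketch uses strict inequalities because the text does; a contig at exactly 93% lands in the binning pool here, which may or may not match the pipeline's boundary handling.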
The new "completeness-aware" strategy is highly effective at preventing improper binning of complete contigs.
It is more effective than the previous "circular-aware" binning used in v1.5 and v1.6.
Compared to a standard binning pipeline (e.g., MetaBAT2), it results in a 14-67% increase in total MAGs (average 36%) and a 13-186% increase in single-contig MAGs (average 87%).
Compared to the "circular-aware" binning in v1.5, it results in a 14-39% increase in total MAGs (average 27%) and a 10-28% increase in single-contig MAGs (average 20%).
Beyond the "completeness-aware" strategy, there are several other important updates:
It now uses CheckM2 instead of CheckM and no longer requires manually downloading the CheckM database.
For binning, CONCOCT and MaxBin2 have been retired, and SemiBin2 is used in conjunction with MetaBAT2. SemiBin2 is highly effective at binning contigs from long-read assemblies and yields better results.
This version also introduces checkpoints to create forked workflows depending on the properties of the sample, thereby preventing crashes when no bins pass filtering. This applies to the long contig completeness evaluation stage and the binning of incomplete contigs.
New figures are produced as part of the long contig evaluations and final summary steps.
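The checkpoint-based forking can be illustrated with plain Python. Assuming a Snakemake-style engine (an assumption; the text only says "checkpoints"), a checkpoint lets the workflow inspect a rule's output before deciding which downstream jobs to create. The functions and step names below are hypothetical and only show the two branching decisions mentioned above: skip a branch whose input is empty instead of letting a tool fail on it.

```python
from typing import List

# Hypothetical step names, used only for illustration.
BINNING_STEPS = ["metabat2", "semibin2", "das_tool"]
FINAL_STEPS = ["gtdbtk", "figures", "summary"]


def plan_after_long_contig_check(n_complete_long: int, n_to_bin: int) -> List[str]:
    """Choose which branches to run once the long-contig evaluation has finished."""
    steps: List[str] = []
    if n_complete_long > 0:
        # Only collect complete long contigs if any were found.
        steps.append("collect_complete_long_contigs")
    if n_to_bin > 0:
        # Only bin if there are contigs left to bin.
        steps.extend(BINNING_STEPS)
    return steps


def plan_after_filtering(n_passing: int) -> List[str]:
    """Avoid running downstream tools on an empty bin set."""
    if n_passing == 0:
        return ["write_empty_summary"]
    return list(FINAL_STEPS)
```

In a Snakemake workflow, the equivalent decision would typically live in an input function that reads the checkpoint's output, so the job graph is only expanded once the counts are known.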
HiFi-MAG-Pipeline received major improvements.